How to submit forecasts

Forecasts will be made in two stages. First, each team should use the training data to develop and select the best model for each prediction target at each location: San Juan, Puerto Rico and Iquitos, Peru. Once this has been done, the team should write a brief description of the model and the data used. If different models are used for different targets or locations, each model should be described. The team should also prepare forecasts for the years 2005-2009 using the selected model(s). For each of these four transmission seasons, forecasts should be made every 4 weeks (weeks 0, 4, 8, 12, 16, 20, 24, 28, 32, 36, 40, 44, 48) for each target. Each forecast should include a point estimate and a probability distribution. Note that forecasts of peak incidence should be made even after peak incidence has occurred. These late-season forecasts should reflect the probability that peak incidence has already occurred (e.g. they should retain a non-zero probability of a peak higher than the one already observed if there is some chance of a second, higher peak).
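The forecast schedule described above can be sketched in a few lines of Python. This is only an illustration, not part of the official templates: the names FORECAST_WEEKS, SEASONS, forecast_schedule and is_valid_distribution are hypothetical, and the season labels assume that "2005-2009" denotes the four seasons 2005/2006 through 2008/2009. The binning of the probability distribution is defined by the supplied templates (Appendix 2) and is not reproduced here.

```python
# Forecast weeks listed in the text: 0, 4, 8, ..., 48 (13 forecasts per season).
FORECAST_WEEKS = list(range(0, 49, 4))

# Assumption: "2005-2009" is read as these four transmission seasons.
SEASONS = ["2005/2006", "2006/2007", "2007/2008", "2008/2009"]


def forecast_schedule(seasons=SEASONS, weeks=FORECAST_WEEKS):
    """Return every (season, week) pair at which a forecast is due."""
    return [(season, week) for season in seasons for week in weeks]


def is_valid_distribution(probs, tol=1e-6):
    """Check that binned probabilities are non-negative and sum to 1."""
    return all(p >= 0 for p in probs) and abs(sum(probs) - 1.0) <= tol
```

Under these assumptions the schedule holds 4 seasons x 13 weeks = 52 forecasts per target and location, and a check such as is_valid_distribution can be run on each forecast before it is written to the CSV file.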

One “csv” file should be prepared for each location and each target using the supplied templates (Appendix 2). The initial model description and forecasts should be submitted to predict@cdc.gov by August 12, 2015. These forecasts will be used to verify that the format is correct and to provide metrics on fit to the training data.

All teams with verified submissions by August 12, 2015, will receive the testing data by email in the same format as the training data on August 19, 2015. They will have two weeks to submit forecasts for the 2009-2013 testing period using the already selected model. These forecasts should use exactly the same model and same format as the first submission and must be submitted to predict@cdc.gov by September 2, 2015.

IMPORTANT NOTE: Much of the data for 2009-2013 is currently accessible to researchers; it is therefore incumbent upon the researcher NOT to use these data for model development or evaluation. The data are supplied only for “future” forecasts within the testing period. For example, forecasts made for the 2011/2012 season at Week 4 may use data from any date up to Week 4 of the 2011/2012 season, but no data of any type from later dates. The data may be used to dynamically update coefficients or covariates, but there should be no structural changes to the model and no consideration of data from Week 5 of that season or later.
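One way to enforce this cutoff in code is to filter the data before every forecast. The sketch below is a minimal illustration under stated assumptions: the Observation record and the data_available_at function are hypothetical and do not correspond to the format of the supplied data files, and each season is identified by its starting year (e.g. 2011 for the 2011/2012 season).

```python
from typing import NamedTuple


class Observation(NamedTuple):
    """Hypothetical record layout; the real data files may differ."""
    season_start: int  # e.g. 2011 for the 2011/2012 season
    season_week: int   # week number within the season
    cases: int


def data_available_at(observations, season_start, forecast_week):
    """Keep only observations dated at or before the forecast week.

    Tuples compare element by element, so every earlier season is kept
    in full and the forecast season is kept up to forecast_week inclusive.
    """
    cutoff = (season_start, forecast_week)
    return [
        obs for obs in observations
        if (obs.season_start, obs.season_week) <= cutoff
    ]
```

For a Week 4 forecast in the 2011/2012 season, data_available_at(observations, 2011, 4) keeps everything through Week 4 of that season and drops Week 5 onward, matching the example above. The same filter should be applied to covariates such as weather data, since no data of any type from later dates may be used.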

San Juan, Puerto Rico - Training templates

Peak Week, Peak Incidence, Season Incidence

Iquitos, Peru - Training templates

Peak Week, Peak Incidence, Season Incidence

Model description

Once model development has been finished, each team should select their best model for future forecasts. Note again that there may be different models for different targets and locations, but only one for each target and location (though that may be an ensemble model). If different models are selected for different targets/locations, the description should include each of those models. The description should include the following components:

  1. Team name: This should match the name used in the submission file names.
  2. Team members: List every person involved with the forecasting effort and their institution. Include the email address of the team leader.
  3. Agreement: Include the following statement: “By submitting these forecasts, I (we) indicate my (our) full and unconditional agreement to abide by the project's official rules and data use agreements.”
  4. Model description: Is the model mechanistic or statistical? Is it an instance of a known class of models? The description should include sufficient detail for another modeler to understand the approach being applied. It may include equations, but that is not necessary. If multiple models are used, describe each model and state which target each model was used to predict.
  5. Variables: What data is used in the model? Historical dengue data? Weather data? Other data? List every variable used and its temporal relationship to the forecast (e.g. lag or order of autocorrelation). If multiple models are used, specify which variables enter into each model.
  6. Computational resources: What programming languages/software tools were used to write and execute the forecasts?
  7. Publications: Does the model derive directly from previously published work? If so, please include references.