State FluSight 2017-18

State FluSight: Seasonal Influenza Forecasting for US States

Influenza (flu) is a respiratory virus that can result in illness ranging from mild to severe. Each year, millions of people get sick with influenza, hundreds of thousands are hospitalized and thousands of people die from flu. Tracking flu activity to inform prevention measures is an important public health function that is currently performed by CDC’s flu surveillance system, which can lag behind real-time flu activity. But what if it were possible to predict flu activity accurately weeks or months in advance for multiple locations? While this is not currently possible, the goal of flu forecasting is to provide a more timely and forward-looking tool that health officials can use to target medical interventions, inform earlier public health actions, and allocate resources for communications, disease prevention and control. The potential benefits of flu forecasting are significant.

Since 2013, the Influenza Division at the Centers for Disease Control and Prevention has worked with external researchers to improve the science and usability of influenza forecasts by coordinating seasonal influenza prediction challenges. This work includes defining prediction targets, facilitating data access, establishing evaluation metrics to assess accuracy, and developing forecast visualizations. Starting in the 2017/18 influenza season, CDC will be sponsoring the State FluSight challenge to forecast influenza-like illness (ILI) for participating US states.

This beta website houses the weekly state-level influenza activity forecasts provided by the various research teams. It’s important to note that these are not CDC forecasts and that the forecasts on this website are not endorsed by CDC. These forecasts are based on different models, can vary significantly, and may be inaccurate.

Interested in participating in the State FluSight challenge? Please email flucontest@cdc.gov for more information.

Submitted forecasts

Use the interactive tool below to explore submitted forecasts for the 2017-18 influenza season. Click on any week of the season to examine the forecasts received during that week. To see the most recent forecasts, click the forecast week immediately preceding the dotted "Today" line.

Peak week and intensity predictions are visualized by the stand-alone dots with confidence intervals, and week-ahead forecasts are visualized as the connected dots with confidence bands. More information on interpreting forecasts can be found in the FAQs.

There is currently a known issue affecting the forecasts for Louisiana and Mississippi. We are working to resolve it as quickly as possible and appreciate your patience.

Forecast Targets

For each week during the season, participants will be asked to provide state-level probabilistic forecasts for the entire influenza season (seasonal targets) and for the next four weeks (four-week ahead targets). The seasonal targets are the peak week and the peak intensity of the 2017-2018 influenza season for each state being forecast. The four-week ahead targets are the percent of outpatient visits experiencing influenza-like illness (ILI) one week, two weeks, three weeks, and four weeks ahead from the date of the forecast.

Seasonal Peak Week

Definition The peak week will be defined as the MMWR surveillance week in which the weighted ILINet percentage is highest in a given state for the 2017-2018 influenza season.
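As an illustration, the peak-week definition can be sketched in R, the language of this page's other examples. The helper `peak_weeks()` and the sample values are hypothetical and not part of any CDC tool. Ties are returned together because the scoring rules allow for more than one peak week; values are compared exactly as supplied, so whether weekly percentages are rounded first is left to the caller.

```r
# Hypothetical helper: return the MMWR week(s) in which weighted ILI is highest.
# All tied weeks are returned, since a season can have more than one peak week.
peak_weeks <- function(week, wili) {
  stopifnot(length(week) == length(wili))
  top <- max(wili, na.rm = TRUE)
  week[!is.na(wili) & wili == top]
}

# Hypothetical season fragment: MMWR weeks 50-52 followed by weeks 1-3
week <- c(50, 51, 52, 1, 2, 3)
wili <- c(2.1, 3.4, 4.8, 5.2, 5.2, 4.1)
peak_weeks(week, wili)  # weeks 1 and 2 tie for the peak
```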

Motivation Accurate and timely forecasts for the peak week can be useful for planning and promoting activities to increase influenza vaccination prior to the bulk of influenza illness. For healthcare, pharmacy, and public health authorities, a forecast for the peak week can guide efficient staff and resource allocation.

Seasonal Peak Intensity

Definition The intensity will be defined as the highest numeric value, rounded to one decimal place, that the weighted ILINet percentage reaches during the 2017-2018 influenza season.
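A matching sketch for peak intensity follows, again with a hypothetical helper and made-up values. R's `round()` does not always round a trailing 5 upward (see `?round`), so values lying exactly on such a boundary are worth checking against the challenge's own rounding.

```r
# Hypothetical helper: peak intensity is the season maximum of weighted ILI,
# rounded to one decimal place.
peak_intensity <- function(wili) {
  round(max(wili, na.rm = TRUE), 1)
}

wili <- c(2.13, 3.41, 4.87, 5.24, 5.19, 4.06)  # hypothetical weekly values
peak_intensity(wili)  # 5.2
```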

Motivation Accurate and timely forecasts for the peak week and intensity of the influenza season can be useful for influenza prevention and control, including the planning and promotion of activities to increase influenza vaccination prior to the bulk of influenza illness. For healthcare, pharmacy, and public health authorities, a forecast for the peak week and intensity can help with appropriate staff and resource allocation since a surge of patients with influenza illness can be expected to seek care and receive treatment in the weeks surrounding the peak.

Short Term Forecasts

Definition One- to four-week ahead forecasts will be defined as the weighted ILINet percentage for the target week, rounded to one decimal place.
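The observed week-ahead targets can be sketched as below. The helper is hypothetical, and counting horizons from `last_obs`, the index of the most recent week of data when the forecast is made, is an assumption about how the horizons line up with the weekly series. Horizons that run past the available data are returned as `NA`.

```r
# Hypothetical helper: observed 1- to 4-week-ahead targets.
# Assumption: `wili` is ordered by week and `last_obs` indexes the most recent
# week of data available at forecast time.
week_ahead_targets <- function(wili, last_obs, horizons = 1:4) {
  idx <- last_obs + horizons
  out <- rep(NA_real_, length(horizons))
  ok <- idx <= length(wili)               # horizons not yet observed stay NA
  out[ok] <- round(wili[idx[ok]], 1)      # targets are rounded to one decimal
  setNames(out, paste(horizons, "wk ahead"))
}

wili <- c(1.84, 2.31, 2.96, 3.72, 4.47, 5.08)  # hypothetical weekly values
week_ahead_targets(wili, last_obs = 2)  # 3.0 3.7 4.5 5.1
```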

Motivation Forecasts capable of providing reliable estimates of influenza activity over the next month are critical because they allow healthcare and public health officials to prepare for and respond to near-term changes in influenza activity and bridge the gap between reported incidence data and long-term seasonal forecasts.

State ILI and laboratory data

Data on the weekly proportion of people seeing their healthcare provider for influenza-like illness (ILI) are reported through the ILINet system for the United States as a whole, for each HHS region, and for most individual US states. These data can be accessed directly from CDC. Alternatively, the R package cdcfluview (available from CRAN or GitHub) can be used to access the data, as shown in the following example:

# Option 1: Install from CRAN
install.packages("cdcfluview")

# Option 2: Install from GitHub (most up-to-date version)
devtools::install_github("hrbrmstr/cdcfluview")

library(cdcfluview)

# National ILINet data for 1997/98 - 2017/18 seasons
usflu <- get_flu_data(region = "national", data_source = "ilinet", years = 1997:2017)

# HHS Regional ILINet data for 1997/98 - 2017/18 seasons
regionflu <- get_flu_data(region = "HHS", sub_region = 1:10, data_source = "ilinet", years = 1997:2017)

# State ILINet data for 1997/98 - 2017/18 seasons  --  only available via GitHub version
stateflu <- get_flu_data(region = "state", sub_region = "all", data_source = "ilinet", years = 1997:2017)

Please note that while cdcfluview accesses publicly available CDC data, it is not produced, maintained, or endorsed by CDC.

For states that have not publicly released their data, participating teams can access state-level reports of influenza-like illness as well as clinical and public health laboratory data through the "State ILI data" tab at left. This tab is only accessible to participating teams. If you would like to participate, please email flucontest@cdc.gov for more information!

Additional Data

Teams are welcome to use data sources beyond the provided data for model development. Possible additional data sources include, but are not limited to:

Forecast Evaluation

All forecasts will be evaluated using the weighted observations pulled from the ILINet system in week 28, and the logarithmic score will be used to measure the accuracy of the probability distribution of a forecast. Logarithmic scores will be averaged across different time periods, the seasonal targets, the four-week ahead targets, and locations to provide both specific and generalized measures of model accuracy. Forecast accuracy will be measured by log score only. Nonetheless, forecasters are requested to continue to submit point predictions, which should aim to minimize the absolute error (AE).
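The averaging step can be sketched with base R's `aggregate()`. The scores below are invented for illustration only and are not challenge results; the column names are hypothetical.

```r
# Hypothetical per-forecast log scores (illustrative values, not real results)
scores <- data.frame(
  location = rep(c("State A", "State B"), each = 3),
  target   = rep(c("Season peak week", "1 wk ahead", "2 wk ahead"), times = 2),
  score    = c(-0.51, -1.20, -2.30, -0.92, -0.36, -10)
)

# Mean log score for each target, averaged over locations
aggregate(score ~ target, data = scores, FUN = mean)

# Mean log score for each location, averaged over targets
aggregate(score ~ location, data = scores, FUN = mean)

# Overall mean across all targets and locations
mean(scores$score)
```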

Logarithmic Score

If $\mathbf{p}$ is the set of probabilities for a given forecast, and $p_i$ is the probability assigned to the observed outcome $i$, the logarithmic score is:

$$S(\mathbf{p},i) = \ln(p_i)$$
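A direct R translation of this formula for a single bin is sketched below. The named-vector input format is hypothetical, and returning -10 for a zero probability mirrors the rule for undefined natural logs given in this section. The multi-bin rules that follow change which probability enters the log, not the log itself.

```r
# Log score of a binned forecast: natural log of the probability assigned to
# the observed bin. `probs` is a named numeric vector (one entry per bin) and
# `observed` is the name of the bin that occurred (hypothetical input format).
log_score <- function(probs, observed) {
  p <- unname(probs[observed])
  if (is.na(p) || p <= 0) return(-10)  # ln(0) is undefined: scored as -10
  log(p)
}

probs <- c("1.0" = 0.1, "1.1" = 0.6, "1.2" = 0.3, "1.3" = 0)
log_score(probs, "1.1")  # log(0.6), about -0.51
log_score(probs, "1.3")  # -10
```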

For peak week, the probability assigned to the correct bin (based on the weighted ILINet value) plus the probabilities assigned to the preceding and following bins will be summed to determine the probability assigned to the observed outcome. In the case of multiple peak weeks, the probabilities assigned to the bins containing the peak weeks and to their preceding and following bins will be summed. For peak percentage and four-week ahead forecasts, the probability assigned to the correct bin plus the probabilities assigned to the five preceding and five following bins will be summed to determine the probability assigned to the observed outcome. For example, if the correct peak ILINet value is 6.5%, the probabilities assigned to all bins ranging from 6.0% to 7.0% will be summed to determine the probability assigned to the observed outcome.
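The peak-week part of this rule, including multiple peak weeks, can be sketched as follows. Assumptions: `probs` lists peak-week probabilities in season order (so week 52 is followed by week 1), `bins` holds the matching week labels, and a bin adjacent to two peak weeks is counted only once. The function name and sample values are hypothetical.

```r
# Probability credited to the observed peak week(s): the bin of each peak week
# plus `window` bins on either side, with each bin counted once and the window
# clipped at the first and last bins.
peak_week_prob <- function(probs, bins, peaks, window = 1) {
  pos <- match(peaks, bins)
  stopifnot(!anyNA(pos))
  keep <- unique(unlist(lapply(pos, function(k) (k - window):(k + window))))
  keep <- keep[keep >= 1 & keep <= length(probs)]
  sum(probs[keep])
}

bins  <- c(52, 1, 2, 3, 4, 5)                   # hypothetical subset of week bins
probs <- c(0.05, 0.10, 0.20, 0.30, 0.10, 0.25)  # hypothetical forecast
log(peak_week_prob(probs, bins, peaks = 3))        # weeks 2-4 credited: log(0.6)
log(peak_week_prob(probs, bins, peaks = c(2, 3)))  # weeks 1-4 credited: log(0.7)
```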

For all targets, if the correct bin is near the first or last bin, the number of bins summed will be reduced accordingly. No bin more than one bin (peak week) or five bins (percentage forecasts) away from the correct bin will contribute to the score. For example, if the correct ILINet percentage for a given week is 0.3%, the probabilities assigned to bins ranging from 0% to 0.8% will be summed. Undefined natural logs (which occur when the probability assigned to the observed outcome is 0) will be assigned a value of -10. Forecasts that are not submitted (e.g., if a week is missed) or that are incomplete (e.g., if the probabilities sum to more than 1.1) will also be assigned a value of -10.
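For the percentage targets, the window, edge truncation, and -10 rules can be combined into one sketch. It assumes bins 0.1 percentage points wide starting at 0.0%, assumes the observed value falls inside the bin range, and treats a missing forecast (`NULL`) or one whose probabilities sum above 1.1 as scoring -10; these input conventions are hypothetical.

```r
# Multi-bin log score for a percentage target (peak intensity or week-ahead ILI).
# Assumption: bins are 0.1 points wide starting at 0.0%, and `probs` is in bin order.
pct_log_score <- function(probs, observed, window = 5, bin_width = 0.1) {
  if (is.null(probs) || sum(probs) > 1.1) return(-10)  # missing or incomplete
  k <- round(observed / bin_width) + 1                  # position of observed bin
  stopifnot(k >= 1, k <= length(probs))
  keep <- max(1, k - window):min(length(probs), k + window)  # clip at the edges
  p <- sum(probs[keep])
  if (p <= 0) -10 else log(p)
}

# Hypothetical forecast over bins 0.0%, 0.1%, ..., 2.0%
probs <- c(rep(0.1, 10), rep(0, 11))
pct_log_score(probs, 0.3)  # bins 0.0%-0.8% credited: log(0.9)
pct_log_score(probs, 1.8)  # no probability in bins 1.3%-2.0%: -10
```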

Example: A forecast predicts a probability of 0.2 (i.e., a 20% chance) that the influenza season for Texas peaks in week 2, a 0.3 probability that it peaks in week 3, and a 0.1 probability that it peaks in week 4, with the remaining 0.4 (40%) distributed across other weeks. Once the season is over, the prediction can be evaluated, and the Texas ILI data show that the flu season peaked in week 3. The probabilities for weeks 2, 3, and 4 would be summed, and the forecast would receive a score of $\ln(0.6) = -0.51$. If the season had peaked in another week, the score would be calculated from the probability assigned to that week plus the probabilities assigned to the preceding and following weeks.
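The arithmetic of this example can be checked in a few lines of R; only the three probabilities named in the example are needed. Adding -1, 0 and +1 to the week number works here but would not handle the wrap from week 52 to week 1.

```r
# Texas example: 0.2 on week 2, 0.3 on week 3, 0.1 on week 4
p_week <- c("2" = 0.2, "3" = 0.3, "4" = 0.1)
observed_peak <- 3

# Sum the observed week and its two neighbours, then take the natural log
credited <- sum(p_week[as.character(observed_peak + c(-1, 0, 1))])
round(log(credited), 2)  # -0.51
```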

References

FluSight Package

The FluSight R package contains functions to help create and format forecasts, read and verify forecast CSVs, and score forecasts. These are the functions that will be used at CDC to verify and score submitted forecasts. Teams are welcome to use these tools to ensure their forecasts fit the required template and to score their forecasts prior to receiving official scores from CDC.

The package can be downloaded from GitHub.

# Install and load package
devtools::install_github("jarad/FluSight")

library(FluSight)

# Read in state forecast entry CSV
entry <- read_entry("your_csv.csv", challenge = "state_ili")

# Verify entry
verify_entry(entry, challenge = "state_ili")
verify_entry_file("your_csv.csv", challenge = "state_ili")

# Create file of observed truth
truth <- create_truth(fluview = TRUE, year = 2017, challenge = "state_ili")

# Expand observed truth to take into account additional bins - 1 bin for weeks, 5 bins for percentage
exp_truth <- expand_truth(truth, week_expand = 1, percent_expand = 5, challenge = "state_ili")

# Score a weekly entry against the observed truth
exact_scores <- score_entry(entry, truth, challenge = "state_ili")
expand_scores <- score_entry(entry, exp_truth, challenge = "state_ili")

Guidance Documents

Guidance for the 2017-18 State FluSight challenge is available here.

An empty copy of the official submission template is available here.

Frequently Asked Questions

How do I see the most recently received forecasts?

To see the most recent forecasts on the visualization page, click in the visualization field on the week immediately preceding the vertical dashed line marked "Today".

How do I view forecasts for a particular state or territory?

To see forecasts for a particular state or territory, use the dropdown menu in the top right corner of the visualization pane.


How do I view forecasts for the entire United States or a particular HHS Region?

These forecasts are hosted on the "FluSight 2017-18" challenge. Please go to the main EPI page by clicking the "Epidemic Prediction Initiative" logo in the top left corner and select the "FluSight 2017-18" challenge.

What is the "FluSight Avg" forecast?

The FluSight average is an ensemble forecast generated by taking the arithmetic mean of all submitted forecasts. Ensemble forecasts have a record of success in both weather and infectious disease forecasting, and taking the mean of all forecasts reduces the likelihood of basing a decision on a poor individual forecast.
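One way to read "arithmetic mean of all submitted forecasts" for binned forecasts is a bin-by-bin average of the models' probabilities, sketched below with hypothetical models and values. Because every input distribution sums to 1, so does the average.

```r
# Each row is one model's probability distribution over the same ordered bins
# (hypothetical models and values).
forecasts <- rbind(
  model_a = c(0.1, 0.6, 0.3),
  model_b = c(0.3, 0.4, 0.3),
  model_c = c(0.2, 0.2, 0.6)
)
colnames(forecasts) <- c("2.0", "2.1", "2.2")

# Ensemble: arithmetic mean of the probabilities in each bin
ensemble <- colMeans(forecasts)
ensemble       # 0.2 0.4 0.4
sum(ensemble)  # 1
```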

How do I interpret the forecasts shown?

For all of the following explanations, assume that the confidence intervals have been set to 50%. You can choose either 50% or 90% confidence intervals by clicking the corresponding number in the top right corner of the visualizations.

Forecasts for each target are displayed in different sections of the visualizations.

Peak week and intensity forecasts are shown by the stand-alone dots in the main graph. They represent a model's point forecast for the timing and intensity of the season peak. Mousing over a model's point forecast will pop up a display box with the model name and specific prediction values. The confidence intervals represent the range within which the model is 50% confident the peak week or peak intensity will occur.

Week-ahead forecasts are shown by the connected points. The dots represent a model's point forecast for the value of the ILINet curve at a given week. Mousing over one of the dots will bring up a confidence band surrounding the point forecast. This band represents the range within which the model is 50% confident the observed ILINet values will fall.
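How the site derives its intervals from the submitted distributions is not described here. The sketch below shows one common approach, a central interval read off the cumulative distribution of a binned forecast; the function name, bins, and probabilities are all hypothetical.

```r
# Central interval from a binned forecast distribution (assumed method):
# the interval runs from the first bin where cumulative probability reaches
# (1 - level)/2 to the first bin where it reaches 1 - (1 - level)/2.
binned_interval <- function(bins, probs, level = 0.5) {
  cdf <- cumsum(probs) / sum(probs)
  tail <- (1 - level) / 2
  lo <- bins[which(cdf >= tail)[1]]
  hi <- bins[which(cdf >= 1 - tail)[1]]
  c(lower = lo, upper = hi)
}

bins  <- c(2.0, 2.1, 2.2, 2.3, 2.4, 2.5, 2.6)        # hypothetical ILI% bins
probs <- c(0.04, 0.11, 0.25, 0.30, 0.15, 0.11, 0.04)  # hypothetical forecast
binned_interval(bins, probs, level = 0.5)  # 2.2 to 2.4
binned_interval(bins, probs, level = 0.9)  # 2.1 to 2.5
```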

Why don't the point forecasts and confidence intervals for peak forecasts always line up?

The visualizations pull data directly from the forecast files submitted by teams. Depending on a team's methodology, point forecasts may be generated in a different way than the underlying probability distribution the confidence intervals are calculated from.

If you have additional questions, please email flucontest@cdc.gov.

Participating teams will be able to submit their forecasts here beginning in November 2017.