Accessing Hub Data

Accessing hub data from GitHub

All model-output, target, and configuration files for this hub are hosted on GitHub. You can access the data directly from the repository at dailypartita/China-COVID-19-Forecast-Hub.

GitHub serves as the primary interface for operating the hub and collecting forecasts from modelers. You can access hub data by cloning the repository or downloading files directly from GitHub.

Data Access Methods

The sections below provide examples for accessing hub data depending on your goals and preferred tools:

Access method       Description
Git/GitHub          Clone the repository for full local access
GitHub raw files    Direct HTTP access to individual files
GitHub API          Programmatic access to repository contents

Cloning the Repository

To get a complete local copy of all hub data:

git clone https://github.com/dailypartita/China-COVID-19-Forecast-Hub.git
cd China-COVID-19-Forecast-Hub

This gives you access to:

- model-output/: All model forecasts
- target-data/: Observed data (time-series.csv)
- hub-config/: Hub configuration files
- model-metadata/: Model metadata files

Repository Structure

China-COVID-19-Forecast-Hub/
├── model-output/          # Model forecasts by team
│   ├── GZNL-test_001/
│   ├── GZNL-test_002/
│   ├── GZNL-test_003/
│   └── GZNL-test_004/
├── target-data/           # Observed/target data
│   └── time-series.csv
├── hub-config/            # Configuration files
├── model-metadata/        # Model descriptions
└── README.md
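If you work from a local clone, a short script can list which forecast files each model has submitted. The sketch below is in Python and assumes only the layout shown above (one sub-directory per model under model-output/, with CSV files directly inside it); the function name list_forecast_files is illustrative and not part of any hub tooling.

```python
from pathlib import Path


def list_forecast_files(hub_root):
    """Map each model folder under model-output/ to its sorted CSV file names.

    Assumes one sub-directory per model with forecast CSVs directly inside it.
    Non-directory entries in model-output/ (such as a README) are skipped.
    """
    model_output = Path(hub_root) / "model-output"
    files_by_model = {}
    for model_dir in sorted(p for p in model_output.iterdir() if p.is_dir()):
        files_by_model[model_dir.name] = sorted(f.name for f in model_dir.glob("*.csv"))
    return files_by_model
```

For example, calling list_forecast_files("China-COVID-19-Forecast-Hub") from the directory where you cloned the repository returns a dictionary keyed by model folder name.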

Accessing Individual Files

You can directly download individual files using GitHub’s raw file URLs:

# R example: Load target data
library(readr)
target_data <- read_csv("https://raw.githubusercontent.com/dailypartita/China-COVID-19-Forecast-Hub/main/target-data/time-series.csv")

# Load a specific model's forecast
model_data <- read_csv("https://raw.githubusercontent.com/dailypartita/China-COVID-19-Forecast-Hub/main/model-output/GZNL-test_001/2025-08-28-GZNL-test_001.csv")

# Python example: Load target data
import pandas as pd
target_data = pd.read_csv("https://raw.githubusercontent.com/dailypartita/China-COVID-19-Forecast-Hub/main/target-data/time-series.csv")

# Load a specific model's forecast  
model_data = pd.read_csv("https://raw.githubusercontent.com/dailypartita/China-COVID-19-Forecast-Hub/main/model-output/GZNL-test_001/2025-08-28-GZNL-test_001.csv")
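Because every forecast URL above follows the same pattern, you can build it from a model ID and a date instead of typing it out. The helper below is a hypothetical sketch that assumes the `<date>-<model_id>.csv` naming seen in the example URLs holds for every file. Passing a commit SHA as ref instead of main pins the download to a fixed version of the repository.

```python
from datetime import date

RAW_BASE = "https://raw.githubusercontent.com/dailypartita/China-COVID-19-Forecast-Hub"


def forecast_url(model_id, forecast_date, ref="main"):
    """Build the raw-file URL for one model's forecast (hypothetical helper).

    Pattern assumed from the examples: model-output/<model_id>/<YYYY-MM-DD>-<model_id>.csv
    `ref` may be a branch name or a commit SHA.
    """
    if isinstance(forecast_date, date):
        forecast_date = forecast_date.isoformat()  # e.g. "2025-08-28"
    return f"{RAW_BASE}/{ref}/model-output/{model_id}/{forecast_date}-{model_id}.csv"
```

For instance, pd.read_csv(forecast_url("GZNL-test_001", "2025-08-28")) reads the same file as the example above.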

Programmatic Access

Use the GitHub API to programmatically explore repository contents:

# R example using httr
library(httr)
library(jsonlite)

# List all model teams (the sub-directories of model-output/)
response <- GET("https://api.github.com/repos/dailypartita/China-COVID-19-Forecast-Hub/contents/model-output")
stop_for_status(response)
contents <- fromJSON(content(response, as = "text", encoding = "UTF-8"))
teams <- contents$name[contents$type == "dir"]
print(teams)

# Python example using requests
import requests

# List all model teams (the sub-directories of model-output/)
response = requests.get(
    "https://api.github.com/repos/dailypartita/China-COVID-19-Forecast-Hub/contents/model-output",
    timeout=30,
)
response.raise_for_status()
teams = [item["name"] for item in response.json() if item["type"] == "dir"]
print(teams)
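Each entry returned by the GitHub contents API includes a type field and, for files, a download_url pointing at the raw file, so a single listing call is enough to find every CSV a model has submitted. The Python sketch below uses only the standard library; the function names are hypothetical, and the requests are unauthenticated, so they are subject to GitHub's rate limit for anonymous clients (60 requests per hour).

```python
import json
import urllib.request

API_BASE = "https://api.github.com/repos/dailypartita/China-COVID-19-Forecast-Hub/contents"


def csv_download_urls(entries):
    """Return {file name: download_url} for the CSV files in a contents-API listing."""
    return {
        e["name"]: e["download_url"]
        for e in entries
        if e.get("type") == "file" and e["name"].endswith(".csv")
    }


def list_model_files(model_id):
    """Fetch the listing of model-output/<model_id> and keep only its CSV files."""
    url = f"{API_BASE}/model-output/{model_id}"
    req = urllib.request.Request(url, headers={"Accept": "application/vnd.github+json"})
    with urllib.request.urlopen(req, timeout=30) as resp:
        return csv_download_urls(json.load(resp))
```

list_model_files("GZNL-test_001") returns a dictionary mapping file names to raw-file URLs that can be passed directly to pd.read_csv.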

Batch Download Script

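#!/bin/bash
# Download all model outputs for a specific date
DATE="2025-08-28"
BASE="https://raw.githubusercontent.com/dailypartita/China-COVID-19-Forecast-Hub/main/model-output"
mkdir -p "model-outputs-${DATE}"

for team in GZNL-test_001 GZNL-test_002 GZNL-test_003 GZNL-test_004; do
    out="model-outputs-${DATE}/${team}.csv"
    # wget -O leaves an empty file behind when the download fails, so remove it
    if ! wget -q "${BASE}/${team}/${DATE}-${team}.csv" -O "${out}"; then
        echo "No file for ${team} on ${DATE}" >&2
        rm -f "${out}"
    fi
done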
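The same batch download can be written in Python if you prefer not to depend on wget. This sketch assumes the four model IDs and the URL pattern used in the shell script above. It skips models that have no file for the requested date (HTTP 404) instead of writing an empty file, and returns their IDs. The fetch parameter is a hypothetical hook that lets you swap in another HTTP client.

```python
import urllib.error
import urllib.request
from pathlib import Path

RAW_BASE = "https://raw.githubusercontent.com/dailypartita/China-COVID-19-Forecast-Hub/main/model-output"
MODELS = ["GZNL-test_001", "GZNL-test_002", "GZNL-test_003", "GZNL-test_004"]


def fetch_bytes(url):
    """Return the body at `url`, or None if the server answers 404 Not Found."""
    try:
        with urllib.request.urlopen(url, timeout=30) as resp:
            return resp.read()
    except urllib.error.HTTPError as err:
        if err.code == 404:
            return None
        raise  # other HTTP errors are not "missing file", so surface them


def download_round(forecast_date, models=MODELS, out_dir=None, fetch=fetch_bytes):
    """Save each model's file for one date; return the models that had no file."""
    out = Path(out_dir or f"model-outputs-{forecast_date}")
    out.mkdir(parents=True, exist_ok=True)
    missing = []
    for model in models:
        body = fetch(f"{RAW_BASE}/{model}/{forecast_date}-{model}.csv")
        if body is None:
            missing.append(model)  # no submission for this date; write nothing
        else:
            (out / f"{model}.csv").write_bytes(body)
    return missing
```

download_round("2025-08-28") writes the files to model-outputs-2025-08-28/ and returns the list of models with no submission for that date.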

Data Format

All model output files in this hub are stored in CSV format and follow the Hubverse data format standards.

Target Data

The target data (target-data/time-series.csv) contains observed SARS-CoV-2 positivity rates among outpatient and emergency department influenza-like illness (ILI) cases, as reported in China CDC's weekly National Sentinel Surveillance of Acute Respiratory Infectious Diseases.

Model Output Data

Each model output file contains quantile forecasts with the following columns:

- reference_date: The date the forecast was made
- target: The forecasting target (wk inc covid prop ili)
- target_end_date: The date for which the prediction is made
- location: Geographic location code (CN for China)
- output_type: Type of prediction (quantile)
- output_type_id: Quantile level (0.01, 0.025, 0.05, …, 0.975, 0.99)
- value: The predicted value
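Before analysing a forecast file it can help to check it against the column list above. The sketch below, assuming pandas and the seven columns listed, reports missing columns, non-quantile rows, and forecasts whose values decrease as the quantile level increases (a valid quantile forecast is non-decreasing in the quantile level). It is an illustrative check with a hypothetical function name, not the hub's own submission validation.

```python
import pandas as pd

REQUIRED_COLUMNS = [
    "reference_date", "target", "target_end_date", "location",
    "output_type", "output_type_id", "value",
]
# Columns that together identify one forecast (one set of quantiles).
TASK_COLUMNS = ["reference_date", "target", "target_end_date", "location"]


def check_model_output(df):
    """Return a list of problems found in a model-output DataFrame (empty if none)."""
    missing = [c for c in REQUIRED_COLUMNS if c not in df.columns]
    if missing:
        return [f"missing columns: {missing}"]
    problems = []
    if not (df["output_type"] == "quantile").all():
        problems.append("output_type other than 'quantile' present")
    levels = pd.to_numeric(df["output_type_id"], errors="coerce")
    if levels.isna().any() or not levels.between(0, 1, inclusive="neither").all():
        problems.append("output_type_id must be quantile levels strictly between 0 and 1")
        return problems
    # Sort by quantile level within each forecast; values must not decrease.
    ordered = df.assign(level=levels).sort_values(TASK_COLUMNS + ["level"])
    for key, group in ordered.groupby(TASK_COLUMNS):
        if not group["value"].is_monotonic_increasing:
            problems.append(f"quantiles not monotone for {key}")
    return problems
```

For example, check_model_output(pd.read_csv(url)) returns an empty list when none of these problems are found.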

For more details on the data format and hub structure, see the repository README.