Dataset Description
To obtain the data, please follow the instructions under this link. After approval of your request, you will be granted access to the Data Download page to download the data.
Dataset Structure
    hecktor2025_training/
    ├── Task 1
    │   ├── CHUM-001
    │   │   ├── CHUM-001__CT.nii.gz
    │   │   ├── CHUM-001__PT.nii.gz
    │   │   └── CHUM-001.nii.gz  # Label file (GTVp=1, GTVn=2)
    │   ├── CHUM-002
    │   ├── ...
    │   └── HECKTOR_2025_Training_Task_1.csv  # Clinical data
    ├── Task 2
    │   ├── CHUM-001
    │   │   ├── CHUM-001__CT.nii.gz
    │   │   ├── CHUM-001__PT.nii.gz
    │   │   ├── CHUM-001__CTPlanning.nii.gz*  # Subset only
    │   │   └── CHUM-001__RTDOSE.nii.gz*  # Subset only
    │   ├── CHUM-002
    │   ├── ...
    │   └── HECKTOR_2025_Training_Task_2.csv  # RFS endpoint data
    └── Task 3
        ├── CHUM-001
        │   ├── CHUM-001__CT.nii.gz
        │   └── CHUM-001__PT.nii.gz
        ├── CHUM-002
        ├── ...
        └── HECKTOR_2025_Training_Task_1.csv  # HPV Status data
Planning CT scans and radiotherapy dose maps are available for a subset of the training dataset only.
The available modalities and EHR features for each task container will mimic exactly what is available in the training data.
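The directory layout above can be indexed with a few lines of standard-library Python. The following is a minimal sketch, not an official loader: the function name `index_cases` and the returned dictionary shape are illustrative assumptions, and the file suffixes are taken directly from the listing.

```python
from pathlib import Path

# Suffixes taken from the directory listing; the label file has no modality suffix.
MODALITIES = ("CT", "PT", "CTPlanning", "RTDOSE")


def index_cases(task_dir):
    """Map each patient folder name to the files found for it.

    Returns {patient_id: {"CT": Path, "PT": Path, ..., "label": Path}}.
    A key is present only when the corresponding file exists on disk.
    """
    cases = {}
    for patient_dir in sorted(Path(task_dir).iterdir()):
        if not patient_dir.is_dir():
            continue  # skips the clinical CSV stored next to the patient folders
        pid = patient_dir.name
        files = {}
        for modality in MODALITIES:
            candidate = patient_dir / f"{pid}__{modality}.nii.gz"
            if candidate.is_file():
                files[modality] = candidate
        label = patient_dir / f"{pid}.nii.gz"
        if label.is_file():
            files["label"] = label
        cases[pid] = files
    return cases
```

Calling `index_cases("hecktor2025_training/Task 2")` would then make it easy to see which patients have the optional `CTPlanning` and `RTDOSE` files, since those keys are simply absent for the other cases.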
Dataset Description
- Image Data (PET/CT):
  - All tasks include PET and CT scans for each patient, using the naming convention:
    - CenterName_PatientID__Modality.nii.gz
    - __CT.nii.gz: Computed tomography image
    - __PT.nii.gz: Positron emission tomography image
- Segmentations (Task 1 only):
  - Each patient has a single label file: PatientID.nii.gz
    - Label 1 = Primary tumor (GTVp)
    - Label 2 = Lymph nodes (GTVn)
- Radiotherapy Dose Data (Task 2 only):
  - For a subset of patients:
    - __CTPlanning.nii.gz: CT planning scan
    - __RTDOSE.nii.gz: RT dose map
- Clinical Information:
  - Provided in HECKTOR_2025_Training_Task_#.csv, includes:
    - Center, gender, age, tobacco and alcohol use, performance status, treatment (radiotherapy only or chemoradiotherapy), M-stage (metastasis)
    - Relapse indicator and RFS value (used as the target for Task 2)
    - HPV status (used as the target for Task 3)
  - Some entries may contain missing data, but the 2025 edition includes significant updates.
The data originates from FDG-PET and low-dose non-contrast-enhanced CT images (acquired with combined PET/CT scanners) of the H&N region.
Data were collected from 10 centers:

| Center | Acronym | PET/CT scanner | HECKTOR 2022 |
| --- | --- | --- | --- |
| Hôpital général juif, Montréal, CA | HGJ | Discovery ST, GE Healthcare | Yes |
| Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA | CHUS | GeminiGXL 16, Philips | Yes |
| Hôpital Maisonneuve-Rosemont, Montréal, CA | HMR | Discovery STE, GE Healthcare | Yes |
| Centre hospitalier de l’Université de Montréal, Montréal, CA | CHUM | Discovery STE, GE Healthcare | Yes |
| Centre Hospitalier Universitaire de Poitiers, FR | CHUP | Biograph mCT 40 ToF, Siemens | Yes |
| MD Anderson Cancer Center, Houston, Texas, USA | MDA | Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare) | Yes |
| UniversitätsSpital Zürich, CH | USZ | Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare) | Yes |
| Centre Henri Becquerel, Rouen, FR | CHB | GE710, GE Healthcare | Yes |
| Centre Hospitalier Universitaire de Brest, FR | CHUB | Philips GEMINI, Siemens Biograph, Siemens Biograph Vision | No |
| Centre Hospitalier Universitaire de Nantes, FR | CHUN | Siemens mCT 64 vision | No |
Each training and test case consists of one 3D FDG-PET volume registered with a 3D CT volume of the head and neck region. For Task 1, contours of the annotated ground-truth lesions are provided (available to participating teams for training cases only). The labels take three values: 0 for background, 1 for primary Gross Tumor Volumes (GTVp), and 2 for nodal Gross Tumor Volumes (GTVn); when several lymph nodes are present, they all share the same label. For Task 2, the cases also include the patient outcome information for RFS (time-to-event in days and censoring; available to participating teams for training cases only), as well as the dosimetry CT and the corresponding radiotherapy dose map for some of the patients.
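As a quick sanity check on the label convention described above, the voxels of a segmentation can be tallied per structure. This is an illustrative sketch only: `count_label_voxels` is a hypothetical helper, and the toy list stands in for a flattened label volume that would in practice come from a NIfTI reader.

```python
from collections import Counter

# Label values as stated in the dataset description.
LABEL_NAMES = {0: "background", 1: "GTVp", 2: "GTVn"}


def count_label_voxels(voxels):
    """Count voxels per structure in a flat iterable of integer labels."""
    counts = Counter(int(v) for v in voxels)
    unexpected = set(counts) - set(LABEL_NAMES)
    if unexpected:
        # Any value other than 0, 1, or 2 contradicts the documented convention.
        raise ValueError(f"unexpected label values: {sorted(unexpected)}")
    return {name: counts.get(value, 0) for value, name in LABEL_NAMES.items()}


# Toy label map standing in for a flattened segmentation volume.
toy = [0, 0, 1, 1, 1, 2, 0, 2]
print(count_label_voxels(toy))  # {'background': 3, 'GTVp': 3, 'GTVn': 2}
```

A case with a GTVn count of zero simply has no annotated nodal disease; several separate lymph nodes would all contribute to the same GTVn count.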
The total number of cases is more than 1200 from 11 centers. The total number of training cases is approximately 700 from 8 different centers. Part of the test cases of the 2022 challenge were moved to the 2025 training set. The total number of test cases is approximately 450 from 3 centers, consisting of new and previously unseen cases. The test set is estimated to consist of 80% HPV-positive cases and 20% HPV-negative cases.
Training and test cohorts are representative of the distribution of the real-world population of patients accepted for initial staging of oropharyngeal cancer.
The preprocessing of PET/CT images involves (for both the training and test cases): (i) computation of the Standardized Uptake Value (SUV) for the PET images and (ii) conversion of the DICOM file format to NIfTI format.
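For readers unfamiliar with step (i), the sketch below shows one common SUV definition, the body-weight SUV with decay correction of the injected dose. The text does not state which normalization or decay-correction reference the organizers used, so the formula choice, the function name `suv_bw`, and its parameters are assumptions for illustration.

```python
F18_HALF_LIFE_S = 6586.2  # physical half-life of fluorine-18, about 109.77 minutes


def suv_bw(activity_bq_per_ml, injected_dose_bq, weight_kg,
           seconds_since_injection, half_life_s=F18_HALF_LIFE_S):
    """Body-weight SUV for a single voxel value.

    Assumes the PET image is decay-corrected to the scan start, so the
    injected dose is decayed from injection time to scan start.
    """
    if injected_dose_bq <= 0 or weight_kg <= 0:
        raise ValueError("dose and weight must be positive")
    decayed_dose = injected_dose_bq * 2.0 ** (-seconds_since_injection / half_life_s)
    # Weight is converted to grams; assuming a tissue density of about 1 g/mL,
    # the resulting ratio is dimensionless.
    return activity_bq_per_ml * (weight_kg * 1000.0) / decayed_dose
```

For example, a voxel at 5000 Bq/mL in a 70 kg patient injected with 350 MBq gives an SUV of 1.0 with no elapsed time, and 2.0 if exactly one half-life passed between injection and scan start.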
Validation and Testing Process
For the HECKTOR 2025 challenge, we have implemented a new evaluation approach. No test data will be shared directly with participants. Instead, evaluation will be conducted exclusively through Docker container submissions on the Grand Challenge platform.
