Dataset - HEad and neCK TumOR Lesion Segmentation, Diagnosis and Prognosis


Dataset Description

To obtain the data, please follow the instructions at this link. Once your request has been approved, you will be granted access to the Data Download page, from which the data can be downloaded.

Dataset Structure

hecktor2025_training/
├── Task 1
│   ├── CHUM-001
│   │   ├── CHUM-001__CT.nii.gz
│   │   ├── CHUM-001__PT.nii.gz
│   │   └── CHUM-001.nii.gz                  # Label file (GTVp=1, GTVn=2)
│   ├── CHUM-002
│   ├── ...
│   └── HECKTOR_2025_Training_Task_1.csv     # Clinical data
├── Task 2
│   ├── CHUM-001
│   │   ├── CHUM-001__CT.nii.gz
│   │   ├── CHUM-001__PT.nii.gz
│   │   ├── CHUM-001__CTPlanning.nii.gz*     # Subset only
│   │   └── CHUM-001__RTDOSE.nii.gz*         # Subset only
│   ├── CHUM-002
│   ├── ...
│   └── HECKTOR_2025_Training_Task_2.csv     # RFS endpoint data
└── Task 3
    ├── CHUM-001
    │   ├── CHUM-001__CT.nii.gz
    │   └── CHUM-001__PT.nii.gz
    ├── CHUM-002
    ├── ...
    └── HECKTOR_2025_Training_Task_3.csv     # HPV status data
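Because optional files (planning CT, dose map) exist only for some patients, it can help to index each task folder before training. The sketch below is a minimal illustration, assuming the folder layout and file names shown in the tree above; the function name `index_task` and the key `"label"` are hypothetical choices, not part of any official toolkit.

```python
from __future__ import annotations

from pathlib import Path

# Modality suffixes as listed in the naming convention of this section.
MODALITIES = ("CT", "PT", "CTPlanning", "RTDOSE")


def index_task(task_dir: str | Path) -> dict[str, dict[str, Path]]:
    """Map each patient folder (e.g. 'CHUM-001') to the image files it contains.

    Inner keys are modality names ('CT', 'PT', 'CTPlanning', 'RTDOSE') plus the
    hypothetical key 'label' for the Task 1 segmentation '<PatientID>.nii.gz'.
    Optional files are simply left out when they are absent.
    """
    task_dir = Path(task_dir)
    index: dict[str, dict[str, Path]] = {}
    # Only sub-directories are patients; the task CSV at this level is skipped.
    for patient_dir in sorted(p for p in task_dir.iterdir() if p.is_dir()):
        pid = patient_dir.name
        files: dict[str, Path] = {}
        for modality in MODALITIES:
            path = patient_dir / f"{pid}__{modality}.nii.gz"
            if path.is_file():
                files[modality] = path
        label = patient_dir / f"{pid}.nii.gz"
        if label.is_file():
            files["label"] = label
        index[pid] = files
    return index
```

For example, `[pid for pid, f in index_task("hecktor2025_training/Task 2").items() if "RTDOSE" in f]` would list the Task 2 patients that come with a dose map.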

Planning CT scans and radiotherapy dose maps (the files marked with * above) are available for only a subset of the training cases.

For each task, the evaluation container will receive exactly the same modalities and EHR (clinical) features as those provided in the training data.

Dataset Description

Image Data (PET/CT):
All tasks include PET and CT scans for each patient, using the naming convention PatientID__Modality.nii.gz, where PatientID combines the center acronym and a case number (e.g., CHUM-001):
- __CT.nii.gz: computed tomography image
- __PT.nii.gz: positron emission tomography image
Segmentations (Task 1 only):
Each patient has a single label file, PatientID.nii.gz:
- Label 1 = primary tumor (GTVp)
- Label 2 = lymph nodes (GTVn)
Radiotherapy Dose Data (Task 2 only):
For a subset of patients:
- __CTPlanning.nii.gz: planning CT scan
- __RTDOSE.nii.gz: RT dose map
Clinical Information:
Provided in HECKTOR_2025_Training_Task_#.csv (# = task number), including:
- Center, gender, age, tobacco and alcohol use, performance status, treatment (radiotherapy only or chemoradiotherapy), and M-stage (metastasis)
- Relapse indicator and RFS value (used as the target for Task 2)
- HPV status (used as the target for Task 3)
Some entries may contain missing values; note that the 2025 edition includes significant updates relative to previous editions.
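The naming convention above can be parsed mechanically, for instance to group files by center. The following is a minimal sketch whose regular expression is an assumption inferred only from example names such as CHUM-001__CT.nii.gz (upper-case center acronym, hyphen, digits, optional "__Modality"); the names `parse_name` and `ParsedName` are hypothetical.

```python
import re
from typing import NamedTuple, Optional

# Assumed pattern, inferred only from example names such as CHUM-001__CT.nii.gz:
# upper-case center acronym, hyphen, digits, then an optional "__<Modality>".
_NAME_RE = re.compile(
    r"^(?P<center>[A-Z]+)-(?P<number>\d+)(?:__(?P<modality>[A-Za-z]+))?\.nii\.gz$"
)


class ParsedName(NamedTuple):
    center: str              # e.g. "CHUM"
    patient_id: str          # e.g. "CHUM-001"
    modality: Optional[str]  # "CT", "PT", "CTPlanning", "RTDOSE", or None for a label file


def parse_name(filename: str) -> ParsedName:
    """Split a file name into center acronym, patient ID, and modality."""
    m = _NAME_RE.match(filename)
    if m is None:
        raise ValueError(f"unexpected file name: {filename!r}")
    center = m.group("center")
    return ParsedName(center, f"{center}-{m.group('number')}", m.group("modality"))
```

A `modality` of `None` identifies the Task 1 label file, which carries no modality suffix.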

The data consist of FDG-PET and low-dose, non-contrast-enhanced CT images of the head and neck (H&N) region, acquired on combined PET/CT scanners.

Data were collected from the following 10 centers:

Center	Acronym	PET/CT scanner	Included in HECKTOR 2022
Hôpital général juif, Montréal, CA	HGJ	Discovery ST, GE Healthcare	Yes
Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA	CHUS	Gemini GXL 16, Philips	Yes
Hôpital Maisonneuve-Rosemont, Montréal, CA	HMR	Discovery STE, GE Healthcare	Yes
Centre hospitalier de l’Université de Montréal, Montréal, CA	CHUM	Discovery STE, GE Healthcare	Yes
Centre Hospitalier Universitaire de Poitiers, FR	CHUP	Biograph mCT 40 ToF, Siemens	Yes
MD Anderson Cancer Center, Houston, Texas, USA	MDA	Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare)	Yes
UniversitätsSpital Zürich, CH	USZ	Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare)	Yes
Centre Henri Becquerel, Rouen, FR	CHB	GE710, GE Healthcare	Yes
Centre Hospitalier Universitaire de Brest, FR	CHUB	Philips GEMINI, Siemens Biograph, Siemens Biograph Vision	No
Centre Hospitalier Universitaire de Nantes, FR	CHUN	Siemens mCT 64 vision	No

Each training and test case consists of a 3D FDG-PET volume registered with a 3D CT volume of the head and neck region. For Task 1, ground-truth lesion contours are provided (to participating teams, for training cases only). The label maps take three values: 0 for background, 1 for the primary Gross Tumor Volume (GTVp), and 2 for the nodal Gross Tumor Volumes (GTVn); when a patient has several involved lymph nodes, all of them share label 2. For Task 2, the cases also include the patient's RFS outcome, given as time-to-event in days with a censoring indicator (again available to participants for training cases only), together with the planning (dosimetry) CT and the corresponding radiotherapy dose map for some patients.
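Given the label values above, the total GTVp and GTVn volumes of a case follow from voxel counts and voxel spacing. The sketch below assumes the NIfTI header spacing is in millimetres and uses the real nibabel calls `nib.load`, `get_fdata` and `header.get_zooms`; the function names `label_volumes_ml` and `load_label` are hypothetical.

```python
from __future__ import annotations

import numpy as np

# Label values from the dataset description: 0 = background, 1 = GTVp, 2 = GTVn.
LABELS = {1: "GTVp", 2: "GTVn"}


def label_volumes_ml(label: np.ndarray, spacing_mm: tuple[float, float, float]) -> dict[str, float]:
    """Return the total GTVp and GTVn volume in millilitres.

    All lymph nodes share label 2, so 'GTVn' is the combined nodal volume.
    """
    voxel_ml = float(np.prod(spacing_mm)) / 1000.0  # 1 mL = 1000 mm^3
    return {name: int(np.count_nonzero(label == value)) * voxel_ml
            for value, name in LABELS.items()}


def load_label(path: str) -> tuple[np.ndarray, tuple[float, ...]]:
    """Load a label file; nibabel is imported lazily so the helper above needs only NumPy."""
    import nibabel as nib

    img = nib.load(path)
    data = np.rint(img.get_fdata()).astype(np.uint8)  # labels are small integers
    spacing = tuple(float(z) for z in img.header.get_zooms()[:3])  # assumed to be mm
    return data, spacing
```

Usage might look like `label_volumes_ml(*load_label("hecktor2025_training/Task 1/CHUM-001/CHUM-001.nii.gz"))`. Because every node carries label 2, separating individual nodes would require a connected-component step such as `scipy.ndimage.label` on the `label == 2` mask.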

The full dataset comprises more than 1,200 cases from 11 centers. The training set contains approximately 700 cases from 8 centers; part of the 2022 challenge test set was moved into the 2025 training set. The test set contains approximately 450 new, previously unseen cases from 3 centers and is estimated to be about 80% HPV-positive and 20% HPV-negative.

The training and test cohorts are representative of the real-world population of patients accepted for initial staging of oropharyngeal cancer.

For both training and test cases, PET/CT preprocessing consists of (i) computing Standardized Uptake Values (SUV) for the PET images and (ii) converting the DICOM files to NIfTI format.
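The description does not state which SUV normalization was applied; the most common variant is body-weight SUV, which divides the measured activity concentration by the decay-corrected injected dose per gram of body weight. Since the PET images are already distributed in SUV, participants do not need this step; the sketch below only illustrates the conversion, assuming body-weight SUV, an F-18 tracer, and a tissue density of 1 g/mL. In DICOM the inputs usually come from attributes such as PatientWeight, RadionuclideTotalDose and RadionuclideHalfLife; the function name `suv_bw` is hypothetical.

```python
import math

# Physical half-life of fluorine-18 (109.77 min), in seconds.
F18_HALF_LIFE_S = 6586.2


def suv_bw(activity_bq_per_ml, injected_dose_bq, weight_kg, delay_s, half_life_s=F18_HALF_LIFE_S):
    """Body-weight SUV: activity concentration / (decay-corrected dose / body weight).

    activity_bq_per_ml may be a float or a NumPy array of PET voxel values.
    delay_s is the time between dose assay and scan start, used to decay the dose.
    Assumes a tissue density of 1 g/mL, so the result is dimensionless.
    """
    decayed_dose_bq = injected_dose_bq * math.exp(-math.log(2.0) * delay_s / half_life_s)
    weight_g = weight_kg * 1000.0
    return activity_bq_per_ml * weight_g / decayed_dose_bq
```

For instance, a voxel at 5,000 Bq/mL in a 70 kg patient injected with 350 MBq and scanned immediately has an SUV of 1.0; scanning one half-life later halves the remaining dose and doubles the SUV to 2.0.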

Validation and Testing Process

For the HECKTOR 2025 challenge, we have implemented a new evaluation approach. No test data will be shared directly with participants. Instead, evaluation will be conducted exclusively through Docker container submissions on the Grand Challenge platform.