Dataset Description

To obtain the data, please follow the instructions under this link. After approval of your request, you will be granted access to the Data Download page to download the data.

Dataset Structure

hecktor2025_training/
├── Task 1/
│   ├── CHUM-001/
│   │   ├── CHUM-001__CT.nii.gz
│   │   ├── CHUM-001__PT.nii.gz
│   │   └── CHUM-001.nii.gz # Label file (GTVp=1, GTVn=2)
│   ├── CHUM-002/
│   ├── ...
│   └── HECKTOR_2025_Training_Task_1.csv # Clinical data
├── Task 2/
│   ├── CHUM-001/
│   │   ├── CHUM-001__CT.nii.gz
│   │   ├── CHUM-001__PT.nii.gz
│   │   ├── CHUM-001__CTPlanning.nii.gz* # Subset only
│   │   └── CHUM-001__RTDOSE.nii.gz* # Subset only
│   ├── CHUM-002/
│   ├── ...
│   └── HECKTOR_2025_Training_Task_2.csv # RFS endpoint data
└── Task 3/
    ├── CHUM-001/
    │   ├── CHUM-001__CT.nii.gz
    │   └── CHUM-001__PT.nii.gz
    ├── CHUM-002/
    ├── ...
    └── HECKTOR_2025_Training_Task_3.csv # HPV status data

*Planning CT scans and radiotherapy dose maps are available for only a subset of the training dataset.
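The layout above can be walked programmatically. The following is a minimal sketch, not an official loader: the function name `collect_cases` and the choice to raise an error when CT or PT is missing are assumptions made for illustration, while the file-name patterns are taken from the tree above.

```python
from pathlib import Path

# Modality suffixes taken from the directory layout; the optional ones
# exist only for a subset of Task 2 patients.
REQUIRED = ("CT", "PT")
OPTIONAL = ("CTPlanning", "RTDOSE")


def collect_cases(task_dir):
    """Map each patient folder in task_dir to the image files found in it."""
    cases = {}
    for patient_dir in sorted(Path(task_dir).iterdir()):
        if not patient_dir.is_dir():
            continue  # skip the clinical CSV stored next to the patient folders
        pid = patient_dir.name
        files = {}
        for modality in REQUIRED + OPTIONAL:
            path = patient_dir / f"{pid}__{modality}.nii.gz"
            if path.exists():
                files[modality] = path
        label = patient_dir / f"{pid}.nii.gz"  # label file, Task 1 only
        if label.exists():
            files["label"] = label
        missing = [m for m in REQUIRED if m not in files]
        if missing:
            raise FileNotFoundError(f"{pid}: missing {missing}")
        cases[pid] = files
    return cases
```

Calling `collect_cases("hecktor2025_training/Task 2")` would return a dictionary keyed by patient folder name, in which the `CTPlanning` and `RTDOSE` keys are present only for patients that have those files.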

Dataset Description

  • Image Data (PET/CT):
    • All tasks include PET and CT scans for each patient, named PatientID__Modality.nii.gz, where PatientID combines the center acronym and a case number (e.g. CHUM-001):
      • __CT.nii.gz: computed tomography image
      • __PT.nii.gz: positron emission tomography image
  • Segmentations (Task 1 only):
    • Each patient has a single label file, PatientID.nii.gz:
      • Label 1 = Primary tumor (GTVp)
      • Label 2 = Lymph nodes (GTVn)
  • Radiotherapy Dose Data (Task 2 only):
    • For a subset of patients:
      • __CTPlanning.nii.gz: CT planning scan
      • __RTDOSE.nii.gz: RT dose map
  • Clinical Information:
    • Provided in HECKTOR_2025_Training_Task_#.csv, which includes:
      • Center, gender, age, tobacco and alcohol use, performance status, treatment (radiotherapy only or chemoradiotherapy), M-stage (metastasis)
      • Relapse indicator and RFS value (used as the target for Task 2)
      • HPV status (used as the target for Task 3)
    • Some entries may contain missing data, but the 2025 edition includes significant updates.
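Since the clinical files may contain missing entries, it can help to count them per column before modelling. The sketch below uses only the standard library. The column names (`PatientID`, `Age`, `HPV Status`), the sample rows, and the set of tokens treated as missing are all hypothetical; the real CSV headers and encoding should be checked after download.

```python
import csv
import io


def count_missing(csv_text, missing_tokens=("", "NA", "NaN", "nan")):
    """Count, per column, the cells that are empty or hold a missing-value token."""
    reader = csv.DictReader(io.StringIO(csv_text))
    counts = {name: 0 for name in reader.fieldnames}
    for row in reader:
        for name in reader.fieldnames:
            value = row[name]  # None when the row has fewer cells than the header
            if value is None or value.strip() in missing_tokens:
                counts[name] += 1
    return counts


# Made-up rows for illustration only; not taken from the challenge data.
sample = """PatientID,Age,HPV Status
CHUM-001,62,1
CHUM-002,,0
CHUM-003,57,
"""
missing_per_column = count_missing(sample)
```

For a file on disk, the text can be obtained with `Path(csv_path).read_text()` and passed to `count_missing` in the same way.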

The data originates from FDG-PET and low-dose non-contrast-enhanced CT images (acquired with combined PET/CT scanners) of the H&N region.

Data were collected from 10 centers:

| Center | Acronym | PET/CT scanner | Included in HECKTOR 2022 |
|---|---|---|---|
| Hôpital général juif, Montréal, CA | HGJ | Discovery ST, GE Healthcare | Yes |
| Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA | CHUS | GeminiGXL 16, Philips | Yes |
| Hôpital Maisonneuve-Rosemont, Montréal, CA | HMR | Discovery STE, GE Healthcare | Yes |
| Centre hospitalier de l’Université de Montréal, Montréal, CA | CHUM | Discovery STE, GE Healthcare | Yes |
| Centre Hospitalier Universitaire de Poitiers, FR | CHUP | Biograph mCT 40 ToF, Siemens | Yes |
| MD Anderson Cancer Center, Houston, Texas, USA | MDA | Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare) | Yes |
| UniversitätsSpital Zürich, CH | USZ | Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare) | Yes |
| Centre Henri Becquerel, Rouen, FR | CHB | GE710, GE Healthcare | Yes |
| Centre Hospitalier Universitaire de Brest, FR | CHUB | Philips GEMINI, Siemens Biograph, Siemens Biograph Vision | No |
| Centre Hospitalier Universitaire de Nantes, FR | CHUN | Siemens mCT 64 vision | No |

Each training and test case consists of one 3D FDG-PET volume registered with a 3D CT volume of the head and neck region. For Task 1, contours of the annotated ground-truth lesions are provided (to participating teams, for training cases only). The labels take three values: 0 for background, 1 for the primary Gross Tumor Volume (GTVp), and 2 for nodal Gross Tumor Volumes (GTVn); when a patient has several involved lymph nodes, they all share the same label. For Task 2, the cases also include patient outcome information (again for training cases only) in the form of RFS (time-to-event in days and censoring), as well as the dosimetry CT and the corresponding radiotherapy dose map for some of the patients.
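As a small illustration of the label convention (0 = background, 1 = GTVp, 2 = GTVn), the sketch below computes the volume of each structure from a flattened label volume. It is a standard-library example under stated assumptions: the function name `label_volumes_ml` is hypothetical, and the voxel values and spacing would in practice come from whichever NIfTI reader is used.

```python
from collections import Counter

# Label values as stated in the dataset description.
LABEL_NAMES = {0: "background", 1: "GTVp", 2: "GTVn"}


def label_volumes_ml(voxels, spacing_mm):
    """Return the volume in millilitres of GTVp and GTVn.

    voxels: iterable of integer label values (a flattened label volume).
    spacing_mm: (x, y, z) voxel spacing in millimetres.
    """
    counts = Counter(voxels)
    unknown = set(counts) - set(LABEL_NAMES)
    if unknown:
        raise ValueError(f"unexpected label values: {sorted(unknown)}")
    # 1 mL = 1000 mm^3
    voxel_ml = spacing_mm[0] * spacing_mm[1] * spacing_mm[2] / 1000.0
    return {
        name: counts.get(value, 0) * voxel_ml
        for value, name in LABEL_NAMES.items()
        if value != 0  # background is not reported
    }
```

Because all nodal lesions share label 2, the GTVn figure is the combined volume of every annotated lymph node in the case, not a per-node value.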

The total number of cases is more than 1200 from at least 11 centers. The total number of training cases is approximately 700 from 9 different centers. Part of the test cases of the 2022 challenge were moved to the 2025 training set. The total number of test cases is approximately 400 from at least 3 centers, consisting of new and previously unseen cases. The test set is estimated to consist of 80% HPV-positive cases and 20% HPV-negative cases.

Training and test cohorts are representative of the distribution of the real-world population of patients accepted for initial staging of oropharyngeal cancer.

The preprocessing of PET/CT images involves (for both the training and test cases): (i) computation of the Standardized Uptake Value (SUV) for the PET images and (ii) conversion of the DICOM file format to NIfTI format.
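For readers unfamiliar with step (i), the sketch below shows the commonly used body-weight SUV: the measured activity concentration divided by the decay-corrected injected dose per gram of body weight. This is an assumption about the variant, since the text does not say which SUV normalization or decay-correction convention the organizers applied; the function and parameter names are hypothetical, and the released images are already converted, so participants do not need to repeat this step.

```python
F18_HALF_LIFE_S = 109.77 * 60  # physical half-life of fluorine-18, in seconds


def suv_bw(activity_bq_per_ml, injected_dose_bq, weight_kg, delay_s,
           half_life_s=F18_HALF_LIFE_S):
    """Body-weight SUV for one voxel value.

    activity_bq_per_ml: measured activity concentration (Bq/mL).
    injected_dose_bq: dose at injection time (Bq).
    weight_kg: patient body weight (kg).
    delay_s: time between injection and the scan reference time (s).
    """
    if injected_dose_bq <= 0 or weight_kg <= 0:
        raise ValueError("injected dose and weight must be positive")
    # Radioactive decay of the injected dose up to the scan reference time.
    decayed_dose_bq = injected_dose_bq * 2 ** (-delay_s / half_life_s)
    # Weight is converted to grams so that SUV is in g/mL.
    return activity_bq_per_ml * (weight_kg * 1000.0) / decayed_dose_bq
```

As a worked check, with 5000 Bq/mL measured in a 70 kg patient one half-life after an injection of 200 MBq, the decayed dose is 100 MBq and `suv_bw(5000.0, 2e8, 70.0, F18_HALF_LIFE_S)` evaluates to 3.5.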

Validation and Testing Process

For the HECKTOR 2025 challenge, we have implemented a new evaluation approach. No test data will be shared directly with participants. Instead, evaluation will be conducted exclusively through Docker container submissions on the Grand Challenge platform.