Dataset Description

To obtain the data, please follow the instructions under this link. After approval of your request, you will be granted access to the Data Download page to download the data.

Dataset Structure

hecktor2025_training/
  ├── imagesTr
      ├── CHUM-001__CT.nii.gz
      ├── CHUM-001__PT.nii.gz
      ├── CHUM-001__dosimetry_CT.nii.gz*
      ├── CHUM-001__radiotherapy_dosemaps.nii.gz*
      └── ...
  ├── labelsTr
      ├── CHUM_001.nii.gz
      └── ...
  ├── hecktor2025_clinical_info_training.csv
  └── hecktor2025_endpoint_training.csv

*The radiotherapy planning dose maps and dosimetry CTs are available for only a subset of the dataset.

All PET/CT images are stored in the imagesTr folder. The naming convention is CenterName_PatientID__Modality.nii.gz. The segmentations of the primary tumor (GTVp) and lymph nodes (GTVn) are stored in the labelsTr folder, in a single .nii.gz file per patient. Label 1 denotes the GTVp and label 2 the GTVn. The new folder doseTr contains the dose maps and associated dosimetry CT for a subset of patients (approximately 650 cases). These can be used for Task 2 (RFS prediction).
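As an illustration of the naming convention, the following minimal Python sketch splits an image file name into its patient identifier and modality and groups the files by patient. It assumes only that the identifier and the modality are separated by a double underscore and that files end in .nii.gz, as in the tree above; the function names parse_image_name and group_by_patient are hypothetical and not part of any official tooling.

```python
from collections import defaultdict

SUFFIX = ".nii.gz"


def parse_image_name(filename):
    """Split '<patient>__<modality>.nii.gz' into (patient, modality)."""
    if not filename.endswith(SUFFIX):
        raise ValueError(f"not a .nii.gz file: {filename}")
    stem = filename[: -len(SUFFIX)]
    # Split on the first double underscore only, so modalities that
    # themselves contain underscores (e.g. 'dosimetry_CT') stay intact.
    patient_id, separator, modality = stem.partition("__")
    if not separator or not patient_id or not modality:
        raise ValueError(f"unexpected file name: {filename}")
    return patient_id, modality


def group_by_patient(filenames):
    """Map each patient identifier to a {modality: filename} dictionary."""
    cases = defaultdict(dict)
    for name in filenames:
        patient_id, modality = parse_image_name(name)
        cases[patient_id][modality] = name
    return dict(cases)


files = [
    "CHUM-001__CT.nii.gz",
    "CHUM-001__PT.nii.gz",
    "CHUM-001__dosimetry_CT.nii.gz",
]
cases = group_by_patient(files)
print(sorted(cases["CHUM-001"]))
```

Because the sketch splits only on the double underscore, it does not depend on whether the center name and patient number are joined by a hyphen or an underscore.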

The clinical information for each patient is provided in hecktor2025_clinical_info_training.csv and includes center, gender, age, weight, tobacco and alcohol consumption, performance status (Zubrod), HPV status, and treatment (surgery and/or chemotherapy in addition to the radiotherapy that all patients underwent). Some information may be missing for some patients, although an effort has been made to update and complete it as much as possible for the 2025 edition. The survival events and the times (in days) between the end of radiotherapy and the event or last follow-up are provided in hecktor2025_patient_endpoint_training.csv. For Task 3, the HPV status is included in the clinical information file.
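A minimal sketch of how the two CSV files might be combined per patient is given below, using only the Python standard library. The column names (PatientID, Age, HPV, Relapse, RFS) and the sample rows are assumptions made for illustration; the actual headers should be checked in the downloaded files. Empty cells are mapped to None, since some values may be missing.

```python
import csv
import io


def _clean(row):
    """Strip whitespace and turn empty cells into None."""
    cleaned = {}
    for column, value in row.items():
        if column is None:  # surplus cells beyond the header are ignored
            continue
        value = (value or "").strip()
        cleaned[column] = value or None
    return cleaned


def read_table(handle, key):
    """Read a CSV file object into a dictionary keyed by the `key` column."""
    return {row[key]: _clean(row) for row in csv.DictReader(handle)}


def join_tables(clinical_handle, endpoint_handle, key="PatientID"):
    """Inner join of clinical and endpoint rows on the patient key."""
    clinical = read_table(clinical_handle, key)
    endpoints = read_table(endpoint_handle, key)
    return {
        patient: {**clinical[patient], **endpoints[patient]}
        for patient in clinical
        if patient in endpoints
    }


# Illustrative data only; the column names and values are hypothetical.
clinical_csv = io.StringIO("PatientID,Age,HPV\nCHUM-001,62,1\nCHUM-002,,0\n")
endpoint_csv = io.StringIO("PatientID,Relapse,RFS\nCHUM-001,0,1704\n")
merged = join_tables(clinical_csv, endpoint_csv)
print(merged)
```

With real files, the io.StringIO objects would be replaced by handles from open(path, newline=""). The inner join keeps only patients present in both files, which may or may not be the desired behavior for a given task.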

Validation and Testing Process

For the HECKTOR 2025 challenge, we have implemented a new evaluation approach. No test data will be shared directly with participants. Instead, evaluation will be conducted exclusively through Docker container submissions on the Grand Challenge platform.

Dataset Description

Patients with histologically proven oropharyngeal H&N cancer who underwent radiotherapy and/or chemotherapy treatment planning were considered.

The data originates from FDG-PET and low-dose non-contrast-enhanced CT images (acquired with combined PET/CT scanners) of the H&N region.

Data were collected from 13 centers:

| Center | Acronym | PET/CT scanner | Included in HECKTOR 2022 |
|---|---|---|---|
| Hôpital général juif, Montréal, CA | HGJ | Discovery ST, GE Healthcare | Yes |
| Centre hospitalier universitaire de Sherbrooke, Sherbrooke, CA | CHUS | GeminiGXL 16, Philips | Yes |
| Hôpital Maisonneuve-Rosemont, Montréal, CA | HMR | Discovery STE, GE Healthcare | Yes |
| Centre hospitalier de l’Université de Montréal, Montréal, CA | CHUM | Discovery STE, GE Healthcare | Yes |
| Centre Hospitalier Universitaire Vaudois, CH | CHUV | Discovery D690 TOF, GE Healthcare | Yes |
| Centre Hospitalier Universitaire de Poitiers, FR | CHUP | Biograph mCT 40 ToF, Siemens | Yes |
| MD Anderson Cancer Center, Houston, Texas, USA | MDA | Discovery HR, Discovery RX, Discovery ST, Discovery STE (GE Healthcare) | Yes |
| UniversitätsSpital Zürich, CH | USZ | Discovery HR, Discovery RX, Discovery STE, Discovery LS, Discovery 690 (GE Healthcare) | Yes |
| Centre Henri Becquerel, Rouen, FR | CHB | GE710, GE Healthcare | Yes |
| Hôpitaux Universitaires de Genève, CH | HUG | Siemens Biograph 64 True Point scanner | No |
| Centre Hospitalier Universitaire de Brest, FR | CHUB | Philips GEMINI, Siemens Biograph, Siemens Biograph Vision | No |
| Centre Hospitalier Universitaire de Nantes, FR | CHUN | Siemens mCT 64 vision | No |
| Groupe d'Oncologie Radiothérapie Tête Et Cou, FR | GORTEC | Multiple hybrid PET/CT scanner devices | No |

The information on image data includes the clinical center, scanner information, and DICOM metadata, including acquisition parameters and reconstruction algorithms. For Task 2, additional metadata for the dosimetry CT and dose maps is provided.

The patient information includes center, age, gender, tobacco and alcohol consumption, performance status, HPV status, treatment (radiotherapy only or additional chemotherapy and/or surgery), and M stage. T and N stages are not provided because they carry information on lymph node status, which is part of the goal of Task 1. HPV status is provided for the training set but not for the test set, since it is the ground truth of Task 3. There may be missing values for some patients, although an effort has been made to update this information as much as possible.

Each training and test case consists of one 3D FDG-PET volume registered with a 3D CT volume of the head and neck region. For Task 1, contours of the annotated ground-truth lesions are provided (available to participating teams for training cases only). The labels take three values: 0 for background, 1 for primary Gross Tumor Volumes (GTVp), and 2 for nodal Gross Tumor Volumes (GTVn). When several lymph nodes are present, they all share the same label. For Task 2, the cases also include the patient outcome information for RFS (time-to-event in days and censoring), available to participating teams for training cases only, as well as the dosimetry CT and the corresponding radiotherapy dose map for some of the patients.
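To make the label convention concrete (0 for background, 1 for GTVp, 2 for GTVn), the sketch below computes the volume of each structure from a flattened label map and the voxel spacing. It is a minimal illustration in plain Python: the function name label_volumes_ml is hypothetical, and the way the label values and spacing are read from a NIfTI file is left out, since it depends on the imaging library used.

```python
from collections import Counter

# Label convention described in the text.
LABEL_NAMES = {0: "background", 1: "GTVp", 2: "GTVn"}


def label_volumes_ml(labels, spacing_mm):
    """Return the volume in mL of GTVp and GTVn.

    labels     -- iterable of integer voxel labels (a flattened label map)
    spacing_mm -- voxel size along the three axes, in millimetres
    """
    size_x, size_y, size_z = spacing_mm
    voxel_ml = (size_x * size_y * size_z) / 1000.0  # 1 mL = 1000 mm^3
    counts = Counter(labels)
    unexpected = set(counts) - set(LABEL_NAMES)
    if unexpected:
        raise ValueError(f"unexpected label values: {sorted(unexpected)}")
    return {
        name: counts.get(value, 0) * voxel_ml
        for value, name in LABEL_NAMES.items()
        if value != 0  # background volume is not reported
    }


# Toy example: three GTVp voxels and one GTVn voxel of 1 x 1 x 2 mm.
print(label_volumes_ml([0, 0, 1, 1, 1, 2], (1.0, 1.0, 2.0)))
```

All nodal lesions share label 2, so the GTVn value is the combined volume of every annotated lymph node in the case.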

The total number of cases is more than 1500 from at least 13 centers. The total number of training cases is approximately 883 from 9 different centers. The test cases of the 2022 challenge were moved to the 2025 training set. The total number of test cases is approximately 400 from at least 3 centers, consisting of new and previously unseen cases. The test set is estimated to consist of 80% HPV-positive cases and 20% HPV-negative cases.

Training and test cohorts are representative of the distribution of the real-world population of patients accepted for initial staging of oropharyngeal cancer.

The preprocessing of PET/CT images involves (for both the training and test cases): (i) computation of the Standardized Uptake Value (SUV) for the PET images and (ii) conversion of the DICOM file format to NIfTI format.
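The exact SUV pipeline used for the dataset is not described here. As a point of reference, the sketch below implements the commonly used body-weight SUV: the activity concentration divided by the decay-corrected injected dose per gram of body weight. The function names and the sample values are illustrative assumptions; the F-18 half-life of about 109.77 minutes is the standard physical constant.

```python
F18_HALF_LIFE_S = 109.77 * 60.0  # physical half-life of F-18, in seconds


def decayed_dose_bq(injected_dose_bq, elapsed_s, half_life_s=F18_HALF_LIFE_S):
    """Injected dose remaining after `elapsed_s` seconds of radioactive decay."""
    return injected_dose_bq * 2.0 ** (-elapsed_s / half_life_s)


def suv_bw(activity_bq_per_ml, weight_kg, injected_dose_bq, elapsed_s=0.0):
    """Body-weight SUV in g/mL.

    activity_bq_per_ml -- voxel activity concentration at scan time
    weight_kg          -- patient body weight
    injected_dose_bq   -- dose measured at injection time
    elapsed_s          -- delay between injection and scan
    """
    if weight_kg <= 0 or injected_dose_bq <= 0:
        raise ValueError("weight and injected dose must be positive")
    dose_at_scan = decayed_dose_bq(injected_dose_bq, elapsed_s)
    return activity_bq_per_ml * (weight_kg * 1000.0) / dose_at_scan


# Illustrative values: 5 kBq/mL, 70 kg patient, 350 MBq injected, 60 min delay.
print(round(suv_bw(5000.0, 70.0, 350e6, elapsed_s=3600.0), 3))
```

Since the provided PET volumes are already converted to SUV, this calculation is only needed when working from raw DICOM data outside the challenge files.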