Comparing different CT, PET and MRI multi-modality image combinations for deep learning-based head and neck tumor segmentation

Authors Ren J, Eriksen JG, Nijkamp J, Korreman SS

Source Acta Oncol. 2021 Nov;60(11):1399-1406 Publication date 15 Jul 2021

Abstract

Background: Manual delineation of gross tumor volume (GTV) is essential for radiotherapy treatment planning, but it is time-consuming and suffers from inter-observer variability (IOV). In clinics, CT, PET, and MRI are used together to inform delineation because of their complementary characteristics. This study aimed to investigate deep learning to assist GTV delineation in head and neck squamous cell carcinoma (HNSCC) by comparing various modality combinations.

Materials and methods: This retrospective study included 153 patients with multiple sites of HNSCC, each with planning CT, PET, and MRI (T1-weighted and T2-weighted). Clinical delineations of gross tumor volume (GTV-T) and involved lymph nodes (GTV-N) were collected as the ground truth. The dataset was randomly divided into 92 patients for training, 31 for validation, and 30 for testing. We applied a residual 3D UNet as the deep learning architecture. We independently trained the UNet with four different modality combinations (CT-PET-MRI, CT-MRI, CT-PET, and PET-MRI). Additionally, analogous to post-processing, an average fusion of three bi-modality combinations (CT-PET, CT-MRI, and PET-MRI) was produced as an ensemble. Segmentation accuracy was evaluated on the test set using the Dice similarity coefficient (Dice), 95th percentile Hausdorff distance (HD95), and mean surface distance (MSD).
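The average fusion and the Dice evaluation described above can be illustrated with a minimal NumPy sketch. It rests on assumptions not stated in the abstract: that fusion averages per-voxel foreground probabilities from the three bi-modality networks, and that the averaged map is binarized at 0.5. The function names `average_fusion` and `dice`, the toy volume, and the probability values are hypothetical and are not taken from the study.

```python
import numpy as np


def average_fusion(prob_maps, threshold=0.5):
    """Average per-voxel foreground probabilities, then binarize.

    The 0.5 threshold is an assumption; the abstract does not state one.
    """
    mean_prob = np.stack(prob_maps, axis=0).mean(axis=0)
    return mean_prob >= threshold


def dice(pred, truth):
    """Dice similarity coefficient: 2|A ∩ B| / (|A| + |B|) for binary masks."""
    pred = np.asarray(pred, dtype=bool)
    truth = np.asarray(truth, dtype=bool)
    denom = pred.sum() + truth.sum()
    if denom == 0:
        return 1.0  # convention chosen here: two empty masks agree perfectly
    return float(2.0 * np.logical_and(pred, truth).sum() / denom)


# Toy 4x4x4 volume with an 8-voxel "tumor" (illustrative data only).
truth = np.zeros((4, 4, 4), dtype=bool)
truth[1:3, 1:3, 1:3] = True

# Hypothetical probability maps from three bi-modality networks.
ct_pet = np.where(truth, 0.9, 0.1)
ct_mri = np.where(truth, 0.4, 0.3)   # a weak model that misses the tumor alone
pet_mri = np.where(truth, 0.8, 0.2)

ensemble_mask = average_fusion([ct_pet, ct_mri, pet_mri])

print("CT-MRI alone:", dice(ct_mri >= 0.5, truth))
print("Ensemble:    ", dice(ensemble_mask, truth))
```

In this toy case the weak model's low confidence is outvoted by the two stronger maps, which is the intuition behind averaging; it says nothing about the magnitude of the effect on real data.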

Results: All imaging combinations including PET provided similar average scores, in the range of Dice: 0.72-0.74, HD95: 8.8-9.5 mm, MSD: 2.6-2.8 mm. Only CT-MRI performed worse, with Dice: 0.58, HD95: 12.9 mm, MSD: 3.7 mm. The average of three bi-modality combinations reached Dice: 0.74, HD95: 7.9 mm, MSD: 2.4 mm.

Conclusion: Multimodal deep learning-based auto-segmentation of HNSCC GTV was demonstrated, and inclusion of the PET image was shown to be crucial. Training on combined MRI, PET, and CT data provided limited improvements over CT-PET and PET-MRI. However, combining three bimodally trained networks into an ensemble showed promising improvements.

Keywords: CNN; Deep learning; GTV; UNet; auto-segmentation; head and neck cancer.