Larynx cancer survival model developed through open-source federated learning

Authors Hansen CR, Price G, Field M, Sarup N, Zukauskaite R, Johansen J, Eriksen JG, Aly F, McPartlin A, Holloway L, Thwaites D, Brink C
Source Radiother Oncol. 2022 Nov;176:179-186 Publication date 05 Oct 2022
Abstract

Introduction: Federated learning has the potential to perform analysis on decentralised data; however, survival analyses face obstacles because of the risk of data leakage. This study demonstrates how to perform a stratified Cox regression survival analysis, specifically designed to avoid data leakage, using federated learning on larynx cancer patients from centres in three different countries.

Methods: Data were obtained from 1821 larynx cancer patients treated with radiotherapy in three centres. Tumour volume was available for 786 of the included patients. Parameter selection among eleven clinical and radiotherapy parameters was performed using best subset selection and cross-validation through the federated learning system, AusCAT. After parameter selection, the β regression coefficients were estimated using bootstrap. Calibration plots were generated for 2- and 5-year survival, and the Kaplan-Meier curves of the inner and outer risk groups were compared to the Cox model prediction.
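The idea of fitting a centre-stratified Cox model from aggregates alone can be illustrated with a minimal sketch. It is not the AusCAT implementation and not necessarily the leakage-avoiding scheme used in the study. It assumes a single covariate, Breslow-style risk sets, and a plain Newton-Raphson update; the function names and the toy data are hypothetical. Because the model is stratified by centre, each risk set contains only patients from one centre, so the partial-likelihood score and information are sums of per-centre terms that can be computed locally.

```python
import math

def local_score_and_information(times, events, x, beta):
    """Runs inside one centre; only the two returned sums leave the site."""
    score, information = 0.0, 0.0
    for i in range(len(times)):
        if not events[i]:
            continue  # censored patients contribute only through risk sets
        s0 = s1 = s2 = 0.0
        for j in range(len(times)):
            if times[j] >= times[i]:  # risk set is restricted to this centre (stratum)
                w = math.exp(beta * x[j])
                s0 += w
                s1 += w * x[j]
                s2 += w * x[j] ** 2
        mean = s1 / s0
        score += x[i] - mean                # first derivative of the partial log-likelihood
        information += s2 / s0 - mean ** 2  # negative second derivative
    return score, information

def federated_newton(centres, beta=0.0, iterations=25, tol=1e-10):
    """Runs at the coordinating site; sees per-centre aggregates only."""
    for _ in range(iterations):
        totals = [local_score_and_information(*c, beta) for c in centres]
        score = sum(t[0] for t in totals)
        information = sum(t[1] for t in totals)
        step = score / information
        beta += step
        if abs(step) < tol:
            break
    return beta

# Invented toy data (times, event indicators, covariate), one tuple per centre.
centre_a = ([5, 3, 8, 2, 7], [1, 1, 0, 1, 1], [0.5, 1.2, 0.1, 0.4, 0.9])
centre_b = ([4, 6, 1, 9], [1, 1, 1, 0], [1.0, 0.3, 0.8, 0.2])
beta_hat = federated_newton([centre_a, centre_b])
```

With several covariates the score becomes a vector and the information a matrix, and the Newton step requires solving a linear system. Sharing such aggregates can itself leak information when a centre has very few patients, so a production system needs additional safeguards.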

Results: The best-performing Cox model included log(GTV), performance status, age, smoking, haemoglobin and N-classification; however, the simplest model with similar statistical prediction power included log(GTV) and performance status only. The Harrell C-indices for the simplest model were 0.75 [0.71-0.78], 0.65 [0.59-0.71] and 0.69 [0.59-0.77] for Odense, Christie and Liverpool, respectively. The full model gave slightly higher C-indices of 0.77 [0.74-0.80], 0.67 [0.62-0.73] and 0.71 [0.61-0.80], respectively. A patient who smokes during treatment has the same hazard as a non-smoking patient who is ten years older.
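For readers unfamiliar with the Harrell C-index reported above, the following is a minimal sketch of how it can be computed for one centre. It is an illustration, not the code used in the study: the function name is hypothetical, tied survival times are simply skipped, and the bracketed intervals in the abstract would need a separate procedure that is not shown here.

```python
def harrell_c_index(times, events, risk_scores):
    """Share of comparable patient pairs in which the patient with the
    higher predicted risk is the one with the shorter observed survival."""
    concordant = 0.0
    comparable = 0
    n = len(times)
    for i in range(n):
        if not events[i]:
            continue  # the earlier patient of a pair must have an observed event
        for j in range(n):
            if times[i] < times[j]:  # patient j was still followed when i had the event
                comparable += 1
                if risk_scores[i] > risk_scores[j]:
                    concordant += 1.0
                elif risk_scores[i] == risk_scores[j]:
                    concordant += 0.5  # tied predictions count as half
    if comparable == 0:
        raise ValueError("no comparable pairs")
    return concordant / comparable
```

A value of 0.5 corresponds to random ranking and 1.0 to perfect ranking, so the reported values of 0.65 to 0.77 indicate moderate to good discrimination.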

Conclusion: Without any patient-specific data leaving the hospitals, a stratified Cox regression model based on data from centres in three countries was developed without data leakage risks. The overall survival model is primarily driven by tumour volume and performance status.