12 August 2021
Kerschke L, Weigel S, Rodriguez-Ruiz A, Karssemeijer N, Heindel W
Using deep learning to assist readers during the arbitration process: a lesion-based retrospective evaluation of breast cancer screening performance
This study evaluated the performance of Transpara in discriminating recalled benign from recalled malignant mammographic screening abnormalities, with the aim of improving screening performance.
A total of 2257 full-field digital mammography screening examinations of women aged 50-69 years, obtained between 2011 and 2013, were included in this retrospective study. All of these women had been recalled for further assessment after independent double-reading with arbitration; the recalled lesions comprised 295 malignant and 2289 benign lesions.
Transpara processed all cases and assigned every recalled case a score (0-95) representing the likelihood of breast cancer. The sensitivity on the lesion level and the proportion of women without false-positive ratings (non-FPR) with Transpara were estimated as a function of the classification cutoff and compared to the performance of human readers.
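As an illustration of how these two metrics depend on the classification cutoff, the following is a minimal Python sketch, not the study's analysis code. The `Lesion` record, its field names, and the toy data are hypothetical. The sketch also assumes that a lesion counts as positive when its score is at or above the cutoff, and that non-FPR is taken over all recalled women; the summary above does not state either detail.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Lesion:
    """Hypothetical record for one recalled lesion."""
    woman_id: int
    score: int        # AI score in the range 0-95
    malignant: bool   # ground truth after further assessment


def sensitivity(lesions, cutoff):
    """Lesion-level sensitivity: share of malignant lesions rated positive.

    Assumption: a lesion is rated positive when score >= cutoff.
    """
    malignant = [les for les in lesions if les.malignant]
    if not malignant:
        raise ValueError("no malignant lesions in the sample")
    detected = sum(1 for les in malignant if les.score >= cutoff)
    return detected / len(malignant)


def non_fpr(lesions, cutoff):
    """Share of women with no benign lesion rated positive.

    Assumption: the denominator is all recalled women in the sample.
    """
    women = {les.woman_id for les in lesions}
    if not women:
        raise ValueError("no lesions in the sample")
    with_false_positive = {
        les.woman_id
        for les in lesions
        if not les.malignant and les.score >= cutoff
    }
    return (len(women) - len(with_false_positive)) / len(women)


# Toy data, invented for illustration only.
toy_lesions = [
    Lesion(woman_id=1, score=80, malignant=True),
    Lesion(woman_id=1, score=0, malignant=False),
    Lesion(woman_id=2, score=0, malignant=False),
    Lesion(woman_id=3, score=12, malignant=False),
    Lesion(woman_id=4, score=0, malignant=True),
    Lesion(woman_id=5, score=40, malignant=False),
]

for cutoff in (0, 1):
    print(cutoff, sensitivity(toy_lesions, cutoff), non_fpr(toy_lesions, cutoff))
```

With a cutoff of 0 every recalled lesion stays positive, which in this toy setup mirrors the recall decisions themselves. Raising the cutoff to 1 removes false-positive ratings for some women while also missing one malignant lesion, the same kind of trade-off the study quantifies.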
Using a cutoff of 1, Transpara decreased the proportion of women with false-positive ratings from 89.9% to 62.0%. The non-FPR improved significantly, from 11.1% to 38.0%, preventing 30.1% of reader-induced false-positive recalls. At the same time, sensitivity was reduced from 96.7% to 91.1% compared to human reading. The positive predictive value of recall (PPV-1) increased from 12.8% to 16.5%.
In women with mass-related lesions (n = 900), the non-FPR was 14.2% for human reading compared to 36.7% for Transpara, at a sensitivity of 98.5% vs. 97.1%.
The application of Transpara during the consensus conference might especially help readers to reduce false-positive recalls of masses, at the expense of a small reduction in sensitivity. Prospective studies are needed to further evaluate the screening benefit of AI in practice.