Accuracy

European Radiology

12 August 2021

Authors

Kerschke L, Weigel S, Rodriguez-Ruiz A, Karssemeijer N, Heindel W

Using deep learning to assist readers during the arbitration process: a lesion-based retrospective evaluation of breast cancer screening performance

Objectives

This study evaluated the ability of Transpara to discriminate recalled benign from recalled malignant mammographic screening abnormalities, with the aim of improving screening performance.

Methods

This retrospective study included 2257 full-field digital mammography screening examinations of women aged 50-69 years, obtained between 2011 and 2013. All of these women had been recalled for further assessment after independent double reading with arbitration; the recalled lesions comprised 295 malignant and 2289 benign lesions.
Transpara processed all cases and assigned every recalled case a score (0-95) representing the likelihood of breast cancer. The lesion-level sensitivity and the proportion of women without false-positive ratings (non-FPR) achieved with Transpara were estimated as a function of the classification cutoff and compared with the performance of human readers.
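As an illustration of the cutoff-based evaluation described above, the following is a minimal sketch, not the authors' analysis code. It assumes that a lesion counts as flagged when its score is greater than or equal to the cutoff, and that non-FPR is computed over all recalled women; both conventions, the function name performance_at_cutoff, and the toy data are hypothetical.

```python
def performance_at_cutoff(lesions, cutoff):
    """Return (lesion-level sensitivity, non-FPR) at a given score cutoff.

    lesions: iterable of (woman_id, score, is_malignant) tuples.
    Assumption: a lesion is flagged when score >= cutoff.
    """
    malignant_total = 0
    malignant_flagged = 0
    has_false_positive = {}  # woman_id -> True if any benign lesion is flagged

    for woman_id, score, is_malignant in lesions:
        flagged = score >= cutoff
        has_false_positive.setdefault(woman_id, False)
        if is_malignant:
            malignant_total += 1
            if flagged:
                malignant_flagged += 1
        elif flagged:
            has_false_positive[woman_id] = True

    sensitivity = malignant_flagged / malignant_total
    women_without_fp = sum(1 for fp in has_false_positive.values() if not fp)
    non_fpr = women_without_fp / len(has_false_positive)
    return sensitivity, non_fpr


# Hypothetical toy data, not taken from the study.
toy_lesions = [
    ("A", 0, False),
    ("B", 40, False),
    ("C", 90, True),
    ("D", 0, True),
    ("E", 0, False),
    ("E", 5, False),
]

for cutoff in (0, 1, 50):
    sens, non_fpr = performance_at_cutoff(toy_lesions, cutoff)
    print(f"cutoff={cutoff}: sensitivity={sens:.2f}, non-FPR={non_fpr:.2f}")
```

Raising the cutoff in this sketch can only keep or lower sensitivity and keep or raise non-FPR, which is the trade-off the study quantifies.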

Results

Using a cutoff of 1, Transpara decreased the proportion of women with false-positive ratings from 89.9% to 62.0%. The non-FPR improved significantly from 11.1% to 38.0%, preventing 30.1% of reader-induced false-positive recalls. At the same time, sensitivity fell from 96.7% with human reading to 91.1%. The positive predictive value of recall (PPV-1) increased from 12.8% to 16.5%.
In women with mass-related lesions (n = 900), the non-FPR was 14.2% for human reading compared to 36.7% for Transpara, at a sensitivity of 98.5% vs. 97.1%.

Conclusion

Applying Transpara during the consensus conference might especially help readers to reduce false-positive recalls of masses, at the expense of a small reduction in sensitivity. Prospective studies are needed to further evaluate the screening benefit of AI in practice.

