1 December 2021
Romero-Martín S, Raya-Povedano JL, Elías-Cabot E, Gubern-Mérida A, Rodríguez-Ruíz A, Álvarez-Benito M
Can artificial intelligence (AI) completely replace human reader in mammography screening program? A retrospective evaluation with digital mammography (DM) and digital breast tomosynthesis (DBT)
AIM AND OBJECTIVE
To investigate whether AI alone, used as a screening tool, could achieve sensitivity similar to that of radiologists at an acceptable recall rate, across four screening scenarios: single reading of DM, single reading of DBT, double reading of DM, and double reading of DBT.
MATERIALS AND METHOD
A consecutive cohort of 15,999 DM/DBT screening exams (113 cancers, including 15 interval cancers) was retrospectively collected from the Tomosynthesis Cordoba Screening Trial, which compared DM and DBT in a paired cohort. Each screening exam was independently double read by radiologists, without consensus.
All exams were processed by an AI system (Transpara, ScreenPoint Medical). The most suspicious findings detected by Transpara were marked with a score between 1 and 95, with higher scores indicating a greater likelihood that a visible cancer is present in the mammogram. Only cancer lesions correctly localized by Transpara were counted as true positives. The stand-alone performance of Transpara was computed separately for DM and DBT exams as the area under the receiver operating characteristic (ROC) curve (AUC), with 95% confidence intervals. In addition, each original screening setting was compared with the stand-alone performance of Transpara in terms of recall rate and sensitivity at different cutoff points, using the McNemar test for paired data.
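The paired comparison described above can be illustrated with a minimal sketch of an exact (binomial) McNemar test. This is not the study's analysis code: the function names and the example counts are hypothetical, and the abstract does not state whether the exact or the chi-square form of the test was used.

```python
from math import comb


def discordant_counts(reader_a, reader_b):
    """Count discordant pairs between two paired lists of 0/1 decisions.

    Returns (b, c): b = cases positive for A only, c = cases positive for B only.
    """
    b = sum(1 for x, y in zip(reader_a, reader_b) if x == 1 and y == 0)
    c = sum(1 for x, y in zip(reader_a, reader_b) if x == 0 and y == 1)
    return b, c


def mcnemar_exact(b, c):
    """Two-sided exact McNemar p-value from the discordant counts b and c.

    Under the null hypothesis, each discordant pair is equally likely to
    favour either reader, so min(b, c) follows Binomial(b + c, 0.5).
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical illustration: detection of 12 cancers by a human reader vs. AI.
human = [1, 1, 0, 0, 1, 0, 1, 1, 0, 1, 0, 1]
ai = [1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0]
b, c = discordant_counts(human, ai)
p_value = mcnemar_exact(b, c)
```

Only the discordant pairs (cases detected by one reader but not the other) enter the test, which is why it suits a design where every exam is read under both conditions.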
RESULTS
For DM, Transpara achieved an AUC of 0.927 (95% CI: 0.894-0.957). Single human reading had a recall rate of 3.11% and a sensitivity of 58.41%, compared to 1.36% and 61% with AI as an autonomous reader, a difference in recall rate of -1.75 percentage points. Double human reading had a recall rate of 5.05% and a sensitivity of 67.26%, compared to 2.54% and 68.14% with AI, a difference in recall rate of -2.6 percentage points.
For DBT, Transpara achieved an AUC of 0.942 (95% CI: 0.914-0.965). Single human reading had a recall rate of 3.01% and a sensitivity of 76.99%, compared to 9.21% and 77.88% with AI as an autonomous reader, a difference in recall rate of +6.2 percentage points. Double human reading had a recall rate of 4.42% and a sensitivity of 81.42%, compared to 16.69% and 82.30% with AI, a difference in recall rate of +12 percentage points.
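The recall-rate differences quoted in the paragraph above are plain subtractions of the human recall rate from the AI recall rate, expressed in percentage points. A small sketch of that arithmetic, using the rates stated in that paragraph (the function name is a hypothetical helper, not part of the study):

```python
def recall_difference(ai_recall_pct, human_recall_pct):
    """Difference in recall rate (AI minus human), in percentage points."""
    return round(ai_recall_pct - human_recall_pct, 2)


# Rates taken from the paragraph above.
diff_single = recall_difference(9.21, 3.01)   # AI vs. single human reading
diff_double = recall_difference(16.69, 4.42)  # AI vs. double human reading
```

A positive value means the AI cutoff needed to match human sensitivity recalls more women than the human reading does.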
CONCLUSION
Transpara could be used alone in screening programs with DM, but with DBT it would be necessary to increase the recall rate to achieve similar sensitivity.