28 November 2020
Sara Romero Martín et al
Using autonomous AI to reduce the workload of breast cancer screening with breast tomosynthesis: a retrospective validation
Sara Romero Martín, Jose Luis Raya Povedano, Esperanza Elías Cabot, Albert Gubern-Merida, Alejandro Rodríguez-Ruiz, Marina Álvarez Benito
Purpose:
To determine the impact of using an artificial intelligence (AI) system to autonomously read a large fraction of digital breast tomosynthesis (DBT) screening exams, in terms of workload reduction, recall rate and screening sensitivity.
Materials and Methods:
A consecutive cohort of DBT screening exams was retrospectively collected from a previous trial (Cordoba Tomosynthesis Trial) comparing DBT with digital mammography (DM). Each DBT screening exam was single-read with access to the synthetic mammogram. The cohort included 12470 examinations with 87 cancers (77 detected at screening during the trial with either DBT or DM alone, and 10 interval cancers). All DBT exams were processed by an AI system (Transpara, ScreenPoint Medical), which assigns each exam a score on a scale of 1-10 representing the likelihood that it contains visible cancer; approximately 10% of the screening volume falls into each score category. The hypothesis was that DBT exams could be split into two groups based on the AI score: exams with scores 1-7 (the least suspicious) would be excluded from human reading and automatically labeled as normal, and exams with scores 8-10 (more suspicious) would be single-read. Additionally, exams that were not recalled by the single reader but had very high AI scores would be recalled automatically.
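The score-based routing rule described above can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the study's implementation: the function names are hypothetical, and because the abstract does not state what counts as a "very high" AI score, `auto_recall_min` is a hypothetical threshold expressed on the same 1-10 scale.

```python
from typing import Iterable, Optional

# From the abstract: scores 1-7 are labeled normal without human reading.
AUTO_NORMAL_MAX = 7


def needs_human_read(ai_score: int) -> bool:
    """Return True if the exam is routed to a single human reader (scores 8-10)."""
    if not 1 <= ai_score <= 10:
        raise ValueError(f"AI score must be in 1..10, got {ai_score}")
    return ai_score > AUTO_NORMAL_MAX


def ai_scenario_recall(ai_score: int, reader_recall: Optional[bool],
                       auto_recall_min: int) -> bool:
    """Return True if the exam is recalled under the AI-based scenario.

    reader_recall: the single reader's decision; may be None for exams
        that are never read (scores 1-7).
    auto_recall_min: HYPOTHETICAL cut-off for "very high" AI scores; the
        abstract does not report the value used.
    """
    if not needs_human_read(ai_score):
        return False  # automatically labeled normal, never recalled
    if reader_recall is None:
        raise ValueError("exams with scores 8-10 need a reader decision")
    # Safety net: a non-recalled exam with a very high AI score is recalled anyway.
    return reader_recall or ai_score >= auto_recall_min


def workload(ai_scores: Iterable[int]) -> int:
    """Number of exams that still require a human reading."""
    return sum(needs_human_read(s) for s in ai_scores)
```

Because roughly 10% of exams fall into each score category, routing only scores 8-10 to a reader implies reading about 30% of exams, which is consistent with the 3653/12470 (29.3%) exams read in the results.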
Sensitivity, recall rate and workload (number of human readings required) were compared between the original reading and the autonomous AI-based scenario using a McNemar test.
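The abstract does not specify whether the exact (binomial) or the chi-square form of the McNemar test was used. As a sketch, assuming the exact form, the paired comparison depends only on the discordant pairs: `b` cases positive (recalled, or cancer detected) under the original reading only, and `c` cases positive under the AI-based scenario only. The per-exam counts needed to fill in `b` and `c` are not reported, so no study values are plugged in here.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value from the discordant-pair counts.

    b: cases positive under the original reading only
    c: cases positive under the AI-based scenario only
    Concordant pairs (same outcome in both scenarios) do not enter the test.
    """
    if b < 0 or c < 0:
        raise ValueError("counts must be non-negative")
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs: no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant pair is equally likely to favor either scenario,
    # so min(b, c) follows Binomial(n, 0.5); double the lower tail (capped at 1).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)
```

For example, `mcnemar_exact(1, 9)` gives 22/1024 (about 0.021), and the result is symmetric in `b` and `c`.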
Results:
During the original reading with DBT, 362 women were recalled (recall rate 2.90%) and 67 cancers were detected (sensitivity 77.0%). Under the autonomous AI-based scenario, 368 women would have been recalled (recall rate 2.95%, 95% CI = 2.66-3.26%, P=0.81), 69 cancers would have been detected (sensitivity 79.3%, 95% CI = 69.3-87.3%, P=0.62), and the workload would have been reduced by 70.7% (only 3653 of 12470 screening DBT exams would have been read).
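The point estimates above follow directly from the reported counts, and the short check below recomputes them. Sensitivity uses all 87 cancers (including the 10 interval cancers) as the denominator, and workload reduction is the share of exams that no longer need a human reading. The sketch does not reproduce the 95% confidence intervals, since the abstract does not state which interval method was used.

```python
TOTAL_EXAMS = 12470
TOTAL_CANCERS = 87  # 77 screen-detected + 10 interval cancers


def pct(numerator: int, denominator: int) -> float:
    return 100 * numerator / denominator


def summarize(recalls: int, cancers_detected: int,
              exams_read: int = TOTAL_EXAMS) -> dict:
    """Recall rate, sensitivity and workload reduction, all in percent."""
    return {
        "recall_rate_pct": pct(recalls, TOTAL_EXAMS),
        "sensitivity_pct": pct(cancers_detected, TOTAL_CANCERS),
        "workload_reduction_pct": 100 - pct(exams_read, TOTAL_EXAMS),
    }


original = summarize(recalls=362, cancers_detected=67)  # every exam read
ai_scenario = summarize(recalls=368, cancers_detected=69, exams_read=3653)
```

Rounded, `original` gives a 2.90% recall rate and 77.0% sensitivity, and `ai_scenario` gives 2.95%, 79.3% and a 70.7% workload reduction.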
Conclusion:
Using AI to autonomously label a large fraction of DBT screening exams as normal, without the involvement of radiologists, could reduce the screening workload by about 70% with minimal impact on recall rate and sensitivity.
Clinical Relevance/Application:
Screening programs with DBT can be labor-intensive because DBT exams take longer to read than 2D mammograms. AI-based strategies could help reduce this workload without decreasing cancer detection rates.