Performance benchmarking


3 October 2019


Alejandro Rodriguez-Ruiz, Kristina Lång, Albert Gubern-Merida et al

Using AI as a pre-screening tool to replace double with single reading for likely normal mammography cases: a simulation study on the impact on sensitivity, specificity and workload

Alejandro Rodriguez-Ruiz, Kristina Lång, Albert Gubern-Merida, Mireille Broeders, Gisella Gennaro, Paola Clauser, Thomas Helbich, Margarita Chevalier, Tao Tan, Thomas Mertelmeier, Matthew G. Wallis, Ingvar Andersson, Sophia Zackrisson, Ritse M. Mann, Ioannis Sechopoulos

To analyze the impact of replacing double reading with single reading of the screening mammograms labeled as most likely normal by a deep learning-based artificial intelligence (AI) system.

A multi-vendor cancer-enriched cohort of mammograms was retrospectively collected. The cohort was composed of mammograms used in nine previously performed multi-reader multi-case studies in seven countries. The ground truth of each mammogram was verified by histopathological analysis or follow-up. In total, the cohort consisted of 2,629 mammograms (710 malignant) assessed for recall or no recall by 109 radiologists, resulting in 30,078 independent readings.
An AI system assigned each mammogram a score between 1 and 10, indicating the level of suspicion of cancer presence (10 representing the highest suspicion of malignancy).
Two screening reading strategies were simulated by bootstrapping the radiologists' original reads of each mammogram (n=500), and cohort outcomes (sensitivity, specificity, and workload, measured as the number of readings) were compared. In the first strategy, all mammograms were double-read blindly, with arbitration in cases of disagreement.
In the experimental strategy, all exams below a certain AI score were single-read, with a second opinion in case of recall, while exams with a higher AI score were still double-read, again with arbitration in cases of radiologist disagreement.
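The two reading strategies can be illustrated with a minimal Python sketch. This is not the authors' code: the function names, the toy cases, and several details are assumptions made for illustration. In particular, the sketch assumes that arbitration is done by a third, distinct reader and counts as one extra reading, that the second opinion on a single-read recall decides the outcome, and that n=500 refers to the number of resampling rounds.

```python
import random

def double_read(reads, rng):
    """Two blinded reads; a third reader arbitrates if they disagree (assumption)."""
    first, second, arbiter = rng.sample(reads, 3)  # three distinct recorded reads
    if first == second:
        return first, 2        # (recall decision, readings used)
    return arbiter, 3          # assumption: arbitration costs one extra reading

def triaged_read(reads, ai_score, threshold, rng):
    """Single-read exams scoring below `threshold`; double-read all others."""
    if ai_score >= threshold:
        return double_read(reads, rng)
    first, second = rng.sample(reads, 2)
    if not first:
        return False, 1        # single read, no recall
    return second, 2           # assumption: the second opinion decides on a recall

def simulate(cases, threshold, n_boot=500, seed=0):
    """Resample the recorded reads of every case n_boot times and pool outcomes."""
    rng = random.Random(seed)
    tp = tn = readings = 0
    for _ in range(n_boot):
        for case in cases:
            recall, used = triaged_read(case["reads"], case["ai_score"], threshold, rng)
            readings += used
            if case["malignant"] and recall:
                tp += 1
            elif not case["malignant"] and not recall:
                tn += 1
    n_pos = sum(case["malignant"] for case in cases)
    n_neg = len(cases) - n_pos
    return {
        "sensitivity": tp / (n_pos * n_boot),
        "specificity": tn / (n_neg * n_boot),
        "readings_per_round": readings / n_boot,
    }

# Invented toy cases (True = recall); not data from the study.
toy_cases = [
    {"malignant": True, "ai_score": 9, "reads": [True, True, True, False]},
    {"malignant": True, "ai_score": 4, "reads": [True, False, True, True]},
    {"malignant": False, "ai_score": 2, "reads": [False, False, False, True]},
    {"malignant": False, "ai_score": 7, "reads": [False, True, False, False]},
]

baseline = simulate(toy_cases, threshold=1)  # no score is below 1: all double-read
triaged = simulate(toy_cases, threshold=6)   # scores below 6 are single-read
print(baseline)
print(triaged)
```

Because AI scores range from 1 to 10, a threshold of 1 sends every exam to double reading and so reproduces the standard strategy in this sketch; raising the threshold moves more exams to single reading.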

Increasing the AI-score threshold that separates single-read from double-read exams decreases workload and increases specificity, at the expense of reduced sensitivity. The optimal threshold was an AI score of 6. At this threshold, workload decreased by 27% ±0.01% (P<0.001), specificity increased by 3.7% ±0.02% (71.5% vs 67.8%, P<0.001), and sensitivity decreased by 0.6% ±0.02% (73.1% vs 73.7%, P<0.001).

Replacing double reading in breast cancer screening with single reading of mammograms labeled as most likely normal by AI could be an alternative reading strategy to reduce workload, with minimal effect on sensitivity and a moderate increase in specificity.

AI systems could reduce radiologist workload in breast cancer screening by automatically identifying the mammograms that are most probably normal, which could then undergo a reading strategy other than standard double reading.
