1 December 2019
Halling Brown M et al
Artificial intelligence detecting breast cancer in a screening population: Accuracy, earlier detection on prior mammograms, and relation with cancer grade.
Halling Brown M, Rodriguez-Ruiz A, Karssemeijer N, Wallis M, Young K.
PURPOSE
To analyze the breast cancer detection accuracy of a deep learning-based artificial intelligence (AI) system on screening mammograms of screen-detected cancers and on their prior exams, and to study whether detection depends on cancer grade.
METHOD AND MATERIALS
A total of 2,683 screening mammograms with biopsy-proven screen-detected cancers were retrospectively collected from the OPTIMAM database; 1,212 of them had a prior mammogram available. OPTIMAM contains screening mammograms performed in the UK, where women are invited triennially and each mammogram is independently read by two radiologists, with an approximate recall rate of 4%.
Among the screen-detected cases with available histology, 1,969 were invasive cancers and 670 were DCIS only; 1,001 were high grade (G3), 1,186 intermediate grade (G2), and 314 low grade (G1).
Each mammogram was analyzed by an AI system (Transpara™, ScreenPoint Medical). The AI system produced a recall decision at three recall rates: 50%, 10%, and 4%. The recall rates were calibrated for a typical screening population using a separate, independent data set. The mammograms in this study had never been used to train, validate, or test the AI system.
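One way to picture the recall-rate calibration described above is as a quantile threshold on exam-level scores. The sketch below is only an illustration under that assumption: the abstract does not describe how the AI system is actually calibrated, and the function names and the idea of a single numeric score per exam are hypothetical.

```python
def threshold_for_recall_rate(calibration_scores, recall_rate):
    """Return the score at or above which about `recall_rate` of the
    calibration exams fall (hypothetical helper, not the vendor's method)."""
    if not 0 < recall_rate <= 1:
        raise ValueError("recall_rate must be in (0, 1]")
    if not calibration_scores:
        raise ValueError("calibration_scores must not be empty")
    ranked = sorted(calibration_scores, reverse=True)
    # Number of calibration exams that should be recalled at this rate.
    n_recalled = max(1, round(recall_rate * len(ranked)))
    return ranked[n_recalled - 1]


def recall_decisions(scores, threshold):
    """Recall every exam whose score reaches the calibrated threshold."""
    return [score >= threshold for score in scores]


# Made-up calibration scores 1..100, standing in for an independent
# screening population; these are not data from the study.
calibration = list(range(1, 101))
thresholds = {rate: threshold_for_recall_rate(calibration, rate)
              for rate in (0.50, 0.10, 0.04)}
```

With thresholds fixed on the independent set, the same cut-offs are then applied unchanged to the study mammograms, which is what allows sensitivity to be reported at a given recall rate.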
The distributions of recalled mammograms were compared using Pearson’s chi-squared test at a 5% significance level.
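For readers unfamiliar with the test, the following is a minimal sketch of Pearson's chi-squared test for the simplest case, a 2x2 table of recalled versus not-recalled exams in two groups. It assumes no continuity correction and one degree of freedom; the abstract does not state the exact table layout used, and the counts in the example are invented purely for illustration.

```python
import math


def chi2_test_2x2(a, b, c, d):
    """Pearson's chi-squared test (no continuity correction) for the
    table [[a, b], [c, d]]. Returns (statistic, p_value)."""
    n = a + b + c + d
    row1, row2 = a + b, c + d
    col1, col2 = a + c, b + d
    if 0 in (row1, row2, col1, col2):
        raise ValueError("every row and column must have a nonzero total")
    statistic = n * (a * d - b * c) ** 2 / (row1 * row2 * col1 * col2)
    # With 1 degree of freedom, the chi-squared survival function
    # reduces to erfc(sqrt(x / 2)).
    p_value = math.erfc(math.sqrt(statistic / 2))
    return statistic, p_value


# Invented counts (NOT study data): group A has 90 recalled and 10 not,
# group B has 70 recalled and 30 not.
statistic, p_value = chi2_test_2x2(90, 10, 70, 30)
significant = p_value < 0.05  # 5% significance level
```

Comparing more than two groups, for example all three cancer grades at once, needs the general r x c form of the test with more degrees of freedom, which this sketch does not cover.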
RESULTS
The AI system had a sensitivity for screen-detected cancers of 99.3%, 87.7%, and 76.1% at recall rates of 50%, 10%, and 4%, respectively.
Of the prior screening mammograms of screen-detected cancers, 16.8% would have been recalled by the AI system at a recall rate of 4%.
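The percentages above can be turned into approximate case counts. The sketch below assumes the denominators are the 2,683 screen-detected mammograms and the 1,212 available priors. The abstract reports only rounded percentages, so the resulting counts are estimates and not reported figures.

```python
def implied_count(total, percent):
    """Approximate number of cases implied by a rounded percentage."""
    return round(total * percent / 100)


N_CANCERS = 2683  # screen-detected mammograms (from the abstract)
N_PRIORS = 1212   # cases with a prior mammogram (from the abstract)

# Reported sensitivity (%) keyed by recall rate (%).
sensitivity_by_recall_rate = {50: 99.3, 10: 87.7, 4: 76.1}

# Estimated cancers flagged at each recall rate (assumed denominator).
detected = {rate: implied_count(N_CANCERS, sens)
            for rate, sens in sensitivity_by_recall_rate.items()}

# Estimated prior exams flagged at the 4% recall rate.
priors_recalled = implied_count(N_PRIORS, 16.8)

for rate, count in detected.items():
    print(f"recall rate {rate}%: about {count} of {N_CANCERS} flagged")
print(f"priors: about {priors_recalled} of {N_PRIORS} flagged at 4%")
```

Under these assumptions, roughly 2,042 of the screen-detected cancers and about 204 of the prior exams would be flagged at the 4% recall rate.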