28 November 2021
van Winkel S, Janssen N, Karssemeijer N, Mann R
Replacing a radiologist by AI in Dutch population based breast cancer screening and the impact of breast density on performance
AIM AND OBJECTIVE
To validate the performance of a commercially available artificial intelligence (AI) system in a population-based breast cancer screening program, considering the implementation of AI as a second reader, and to assess the effect of breast density on performance.
MATERIALS AND METHOD
2D digital screening mammograms from a consecutive cohort of 43666 women screened between 2013 and 2014 in one region of the Dutch breast screening program were retrospectively collected. All exams were assigned a level of suspicion on a continuous scale by an AI system (Transpara, ScreenPoint Medical). Presence of cancers was verified using data from the Netherlands Cancer Registry (NKR – IKNL) until 2019. For each woman, 4 to 5 years of complete follow-up data on screen-detected (SD), interval (IC) and future breast cancers (FBC) were available. Sensitivity, specificity and recall rate were compared for different reading scenarios: first reader (R1), stand-alone use of Transpara (using the recall rate of R1 as operating point), double reading (R1&R2), and first reader together with Transpara (R1&AI). Differences were tested using McNemar and Wald tests. To examine the effect of density on cancers identified by human reading versus AI, volumetric density scores were obtained (Volpara). Case-level differences were analyzed descriptively and tested (Wilcoxon signed-rank test).
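As an illustration of how paired reading strategies can be compared with a McNemar test, the following is a minimal sketch of the exact (binomial) form of the test. It is not the authors' analysis code: the function name `mcnemar_exact` and the discordant counts in the usage line are hypothetical, and the abstract does not state whether the exact or the chi-square variant was used.

```python
from math import comb


def mcnemar_exact(b: int, c: int) -> float:
    """Two-sided exact McNemar p-value.

    b: cases flagged by strategy A only (e.g. cancers recalled by A, missed by B)
    c: cases flagged by strategy B only
    Concordant cases do not enter the test.
    """
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    # Under H0 each discordant pair falls on either side with probability 0.5,
    # so the smaller count follows Binomial(n, 0.5).
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


# Hypothetical discordant counts, for illustration only (not study data).
p_value = mcnemar_exact(12, 30)
```

Only the discordant pairs carry information here, which is why the test suits a design in which every exam is read under both strategies.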
RESULTS
The cohort included 987 cancers: 312 SD, 222 IC (within two years after screening) and 453 FBC (screen detected in or after the next screening round). R1 recalled 1071 cases (2.5%, 95% CI = 2.3-2.6%), including 291 SD cancers, 3 IC and 16 FBC (sensitivity 31.4%, 95% CI = 28.5-34.4%). Transpara recalled 1071 cases with 244 SD cancers, 23 IC and 69 FBC (sensitivity 34.0%, 95% CI = 31.1-37.1%). R1&R2 recalled 1143 cases after consensus (2.6%, 95% CI = 2.5-2.8%), including 312 SD, 2 IC and 18 FBC (sensitivity 33.6%, 95% CI = 30.7-36.7%). Replacing the second reader by Transpara (R1&AI) resulted in 1796 recalls (4.1%, 95% CI = 3.9-4.3%), including 306 SD, 26 IC and 74 FBC (sensitivity 41.1%, 95% CI = 38.0-44.3%), improving sensitivity by 7.5 percentage points compared with double reading (95% CI = 5.6-9.4%, p<0.05). There was no significant difference in the relative performance of R1 and Transpara across density categories (p=0.4857). However, the additional cancers (n=26) identified by Transpara but not by R1 were mostly observed in category A (+33.3%, n=5) and category D (+32.1%, n=9) breasts.
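The sensitivities and recall rates above follow directly from the reported counts. The sketch below reproduces the point estimates only; the variable and function names are illustrative, and confidence intervals are left out because the abstract does not state which interval method was used.

```python
TOTAL_WOMEN = 43666   # screened women in the cohort
TOTAL_CANCERS = 987   # 312 SD + 222 IC + 453 FBC

# Recalls and recalled cancers per reading scenario, as reported in the abstract.
scenarios = {
    "R1":    {"recalls": 1071, "sd": 291, "ic": 3,  "fbc": 16},
    "AI":    {"recalls": 1071, "sd": 244, "ic": 23, "fbc": 69},
    "R1&R2": {"recalls": 1143, "sd": 312, "ic": 2,  "fbc": 18},
    "R1&AI": {"recalls": 1796, "sd": 306, "ic": 26, "fbc": 74},
}


def summarize(counts: dict) -> dict:
    """Return detected cancers, sensitivity (%) and recall rate (%)."""
    detected = counts["sd"] + counts["ic"] + counts["fbc"]
    return {
        "detected": detected,
        "sensitivity_pct": 100 * detected / TOTAL_CANCERS,
        "recall_rate_pct": 100 * counts["recalls"] / TOTAL_WOMEN,
    }


summary = {name: summarize(counts) for name, counts in scenarios.items()}

# Sensitivity gain of R1&AI over double reading, in percentage points.
gain_pp = summary["R1&AI"]["sensitivity_pct"] - summary["R1&R2"]["sensitivity_pct"]

for name, s in summary.items():
    print(f"{name}: sensitivity {s['sensitivity_pct']:.1f}%, "
          f"recall rate {s['recall_rate_pct']:.1f}%")
print(f"Gain R1&AI vs R1&R2: {gain_pp:.1f} percentage points")
```

Sensitivity here uses all 987 cancers as the denominator, including interval and future cancers, which is why even double reading stays below 35%.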
CONCLUSION
Transpara provides higher sensitivity than the first reader and is complementary to human reading. Transpara therefore has potential as a second reader, but an effective arbitration process is necessary. This effect seems independent of breast density, although a notable number of the additional cancers detected were in the highest and lowest density categories.