This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in Iproceedings, is properly cited. The complete bibliographic information, a link to the original publication on https://www.iproc.org/, as well as this copyright and license information must be included.
Many dermatologic cases are first evaluated by primary care physicians or nurse practitioners.
This study aimed to evaluate an artificial intelligence (AI)-based tool that assists with interpreting dermatologic conditions.
We developed an AI-based tool and conducted a randomized multi-reader, multi-case study (20 primary care physicians, 20 nurse practitioners, and 1047 retrospective teledermatology cases) to evaluate its utility. The case set was enriched and comprised 120 skin conditions. Readers were recruited to optimize for geographical diversity; the primary care physicians practiced across 12 states (2-32 years of experience, mean 11.3 years), and the nurse practitioners practiced across 9 states (2-34 years of experience, mean 13.1 years). To avoid memory effects from incomplete washout, each case was read once by each clinician, either with or without AI assistance, with the assignment randomized. The primary analyses evaluated top-1 agreement, defined as the rate at which the clinicians’ primary diagnosis agreed with the reference diagnoses provided by a panel of dermatologists (per case: 3 dermatologists drawn from a pool of 12, practicing across 8 states, with 5-13 years of experience, mean 7.2 years). We additionally conducted subgroup analyses stratified by cases’ self-reported race and ethnicity and measured the performance spread: the difference between the maximum and minimum performance across subgroups.
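The two metrics above are straightforward to state in code. The following is a minimal illustrative sketch (not the study's actual analysis code; the function names and example data are hypothetical) of how top-1 agreement and the performance spread could be computed:

```python
# Hypothetical sketch of the metrics described above; names and data
# are illustrative only, not from the study.

def top1_agreement(primary_dx, reference_dx):
    """Fraction of cases where the clinician's primary diagnosis
    matches the dermatologist panel's reference diagnosis."""
    matches = sum(p == r for p, r in zip(primary_dx, reference_dx))
    return matches / len(primary_dx)

def performance_spread(agreement_by_subgroup):
    """Maximum minus minimum agreement across subgroups."""
    values = agreement_by_subgroup.values()
    return max(values) - min(values)

# Illustrative (made-up) data
clinician = ["eczema", "psoriasis", "acne", "acne"]
reference = ["eczema", "tinea", "acne", "rosacea"]
print(top1_agreement(clinician, reference))  # 0.5

by_group = {"A": 0.63, "B": 0.58, "C": 0.60}
print(round(performance_spread(by_group), 2))  # 0.05
```

In the study itself, agreement was computed per reader and per assistance arm before comparing arms; the sketch shows only the metric definitions.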
The AI’s standalone top-1 agreement was 63%, and AI assistance was significantly associated with higher agreement with the reference diagnoses. For primary care physicians, the increase in diagnostic agreement was 10%.
AI assistance was associated with significantly improved diagnostic agreement with dermatologists. Across race and ethnicity subgroups, for both primary care physicians and nurse practitioners, the effect of AI assistance remained high at 8%-15%, and the performance spread remained similar at 5%-7%.
This work was funded by Google LLC.
AJ, DW, VG, YG, GOM, JH, RS, CE, KN, KBD, GSC, LP, DRW, RCD, DC, Yun Liu, PB, and Yuan Liu are/were employees at Google and own Alphabet stocks.
Results of the randomized reader study comparing clinicians assisted by artificial intelligence (AI, in orange) with those without assistance (“unassisted”, in blue). Performance was measured using the top-1 agreement metric, which indicates the rate at which the clinicians’ primary diagnosis matched the reference diagnosis from a panel of dermatologists. The leftmost column summarizes the overall results for all readers and cases, whereas the other columns represent subgroups based on race/ethnicity. The results for primary care physicians (PCPs, top) and nurse practitioners (NPs, bottom) were similar.