This is an open-access article distributed under the terms of the Creative Commons Attribution License (https://creativecommons.org/licenses/by/4.0/), which permits unrestricted use, distribution, and reproduction in any medium, provided the original work, first published in Iproceedings, is properly cited. The complete bibliographic information, a link to the original publication on https://www.iproc.org/, as well as this copyright and license information must be included.
Many dermatologic cases are first evaluated by primary care physicians or nurse practitioners.
This study aimed to evaluate an artificial intelligence (AI)-based tool that assists with interpreting dermatologic conditions.
We developed an AI-based tool and conducted a randomized multi-reader, multi-case study (20 primary care physicians, 20 nurse practitioners, and 1047 retrospective teledermatology cases) to evaluate its utility. The case set was enriched and comprised 120 skin conditions. Readers were recruited to optimize for geographical diversity; the primary care physicians practiced across 12 states (2-32 years of experience, mean 11.3 years), and the nurse practitioners practiced across 9 states (2-34 years of experience, mean 13.1 years). To avoid memory effects from incomplete washout, each case was read once by each clinician, either with or without AI assistance, with the assignment randomized. The primary analyses evaluated top-1 agreement, defined as the rate at which the clinicians’ primary diagnosis agreed with the reference diagnoses provided by a panel of dermatologists (per case: 3 dermatologists drawn from a pool of 12, practicing across 8 states, with 5-13 years of experience, mean 7.2 years). We additionally conducted subgroup analyses stratified by cases’ self-reported race and ethnicity and measured the performance spread: the difference between the maximum and minimum performance across subgroups.
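The two metrics above are straightforward to state in code. The following is a minimal illustrative sketch (not the study's actual analysis code; the function names and example data are hypothetical) of how top-1 agreement and the performance spread could be computed:

```python
# Hypothetical sketch of the metrics described above; names and data
# are illustrative only, not from the study.

def top1_agreement(primary_dx, reference_dx):
    """Fraction of cases where the clinician's primary diagnosis
    matches the dermatologist panel's reference diagnosis."""
    matches = sum(p == r for p, r in zip(primary_dx, reference_dx))
    return matches / len(primary_dx)

def performance_spread(agreement_by_subgroup):
    """Maximum minus minimum agreement across subgroups."""
    values = agreement_by_subgroup.values()
    return max(values) - min(values)

# Illustrative (made-up) data
clinician = ["eczema", "psoriasis", "acne", "acne"]
reference = ["eczema", "tinea", "acne", "rosacea"]
print(top1_agreement(clinician, reference))  # 0.5

by_group = {"A": 0.63, "B": 0.58, "C": 0.60}
print(round(performance_spread(by_group), 2))  # 0.05
```

In the study itself, agreement was computed per reader and per assistance arm before comparing arms; the sketch shows only the metric definitions.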
The AI’s standalone top-1 agreement was 63%, and AI assistance was significantly associated with higher agreement with the reference diagnoses. For primary care physicians, the increase in diagnostic agreement was 10%.
AI assistance was associated with significantly improved diagnostic agreement with dermatologists. Across race and ethnicity subgroups, for both primary care physicians and nurse practitioners, the effect of AI assistance remained high at 8%-15%, and the performance spread remained similar at 5%-7%.
This work was funded by Google LLC.
AJ, DW, VG, YG, GOM, JH, RS, CE, KN, KBD, GSC, LP, DRW, RCD, DC, Yun Liu, PB, and Yuan Liu are/were employees at Google and own Alphabet stocks.
Results of the randomized reader study comparing clinicians assisted by artificial intelligence (AI, in orange) with those without assistance (“unassisted”, in blue). Performance was measured using the top-1 agreement metric, which indicates the rate at which the clinicians’ primary diagnosis matched the reference diagnosis from a panel of dermatologists. The leftmost column summarizes the overall results for all readers and cases, whereas the other columns represent subgroups based on race/ethnicity. The results for primary care physicians (PCPs, top) and nurse practitioners (NPs, bottom) were similar.