Page 62 - JSOM Spring 2026

gains. Sensitivity improved from 0.63 (SD 0.143) to 0.90 (SD 0.092), specificity from 0.70 (SD 0.145) to 0.86 (SD 0.096), and accuracy from 0.67 (SD 0.097) to 0.88 (SD 0.060). These improvements were statistically significant based on McNemar’s test (P<.001) (Figure 2).

Reader confidence in their clip-level interpretations improved markedly with the assistance of AI. Across conditions, there was a clear change in the distribution of confidence ratings when AI support was available. The proportion of low-confidence ratings (“not at all confident” and “slightly confident”) decreased from 38% without AI to 33% with AI. In contrast, high-confidence ratings (“confident” and “very confident”) nearly doubled, increasing from 20% to 37%. The proportion of responses indicating no confidence at all was cut in half, from 8.2% to 4.1%. The most frequently selected confidence level shifted from “moderately confident” in the unassisted condition (42%) to “confident” in the assisted condition (29%). These changes in confidence distribution were statistically significant according to the Stuart-Maxwell chi-squared test (P<.001) (Table 1).

Standalone AI Interpretation
The standalone AI system demonstrated excellent diagnostic performance relative to the expert consensus standard, achieving a sensitivity of 1.00, a specificity of 0.96, and an accuracy














FIGURE 1  AUROC curves of diagnostic performance with and without AI assistance.
Each colored line represents an individual corpsman’s performance across conditions. The black dashed line shows the group mean with 95% CIs.













FIGURE 2  Improvement in diagnostic performance with AI assistance.
Each colored dot-dashed line pair traces an individual corpsman’s score in the AI-unassisted (left) and AI-assisted (right) sessions; the black dashed line marks the group mean and its 95% CI. At the group level, mean sensitivity, mean specificity, and mean accuracy increased with AI assistance, and these gains reached statistical significance. The pink dotted line indicates an individual reader whose specificity and accuracy did not improve.
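
For readers who want to reproduce this kind of paired comparison on their own data, the following is a minimal sketch of how sensitivity, specificity, and accuracy can be computed from binary reads, and how a paired change in correctness can be tested with McNemar’s test. The article does not state which form of the test was used or how reads were pooled; this sketch assumes per-clip paired correct/incorrect flags and the exact two-sided binomial form. The function names and the example reads are hypothetical and are not study data.

```python
from math import comb


def diagnostic_metrics(truth, calls):
    """Return (sensitivity, specificity, accuracy) for binary reads.

    Assumes truth contains at least one positive and one negative case.
    """
    pairs = list(zip(truth, calls))
    tp = sum(1 for t, c in pairs if t and c)
    tn = sum(1 for t, c in pairs if not t and not c)
    fp = sum(1 for t, c in pairs if not t and c)
    fn = sum(1 for t, c in pairs if t and not c)
    return tp / (tp + fn), tn / (tn + fp), (tp + tn) / len(pairs)


def mcnemar_exact(correct_a, correct_b):
    """Two-sided exact McNemar p-value from paired correct/incorrect flags.

    Only discordant pairs (correct in one condition, wrong in the other)
    carry information; under the null they split 50/50.
    """
    b = sum(1 for x, y in zip(correct_a, correct_b) if x and not y)
    c = sum(1 for x, y in zip(correct_a, correct_b) if not x and y)
    n = b + c
    if n == 0:
        return 1.0  # no discordant pairs, no evidence of a difference
    k = min(b, c)
    tail = sum(comb(n, i) for i in range(k + 1)) / 2 ** n
    return min(1.0, 2 * tail)


if __name__ == "__main__":
    # Hypothetical reads for eight clips (1 = positive, 0 = negative).
    truth = [1, 1, 1, 1, 0, 0, 0, 0]
    unassisted = [1, 0, 0, 1, 0, 1, 0, 1]
    assisted = [1, 1, 1, 1, 0, 0, 0, 1]

    print(diagnostic_metrics(truth, unassisted))
    print(diagnostic_metrics(truth, assisted))

    ok_unassisted = [int(t == c) for t, c in zip(truth, unassisted)]
    ok_assisted = [int(t == c) for t, c in zip(truth, assisted)]
    print(mcnemar_exact(ok_unassisted, ok_assisted))
```

With only a handful of discordant pairs, as in this toy example, the exact form is generally preferred over the chi-squared approximation; with many readers and clips the two forms give similar results.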

          60  |  JSOM   Volume 26, Edition 1 / Spring 2026