Results

Performance Metrics of Individual Users and the Group of Users
Our first analysis focused on differing definitions of proficiency applied to the individual users and to the group of users, as if the group were a class of students or a squad of Soldiers. Using a time to determination of bleeding control of no more than 90 seconds (≤90s), all 10 users became proficient at their first use and remained so thereafter. At ≤60 seconds (≤60s), all became proficient at the seventh use and thereafter. At ≤30 seconds (≤30s), nine of 10 users became proficient at the 16th use and thereafter. At ≤30s for 10 uses in a row, two of 10 users became proficient at the 10th use and thereafter, and another six did so at the 11th. At ≤30s for all uses, the group failed. These results indicated that (1) the definition of proficiency was important because it affected the outcome of assessment; (2) for the sake of clarity, it was necessary to make the innate definition of proficiency explicitly operational; (3) the outcome often differed depending on whether the definition applied to a user or to the group; and (4) the degree of challenge posed to user performance differed by the criteria defining proficiency.
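To make these operational definitions concrete, the sketch below shows one way they could be encoded against a single user's sequence of times. The times and the function are illustrative assumptions for exposition only; they are not the study's data or software.

    # Illustrative sketch only; the times below are hypothetical, not study data.
    times = [45, 28, 38, 25, 44, 33, 31, 29, 35, 27,
             24, 36, 26, 22, 25, 23, 21, 24, 20, 19]  # seconds, uses 1-20

    def first_proficient_use(times, cutoff, streak=None):
        """First use (1-indexed) at which proficiency is achieved.
        streak=None: first passing use with every later use also passing.
        streak=k: the use that completes the first run of k consecutive passes."""
        if streak is None:
            for i in range(len(times)):
                if all(t <= cutoff for t in times[i:]):
                    return i + 1
            return None
        run = 0
        for i, t in enumerate(times):
            run = run + 1 if t <= cutoff else 0
            if run == streak:
                return i + 1
        return None

    print(first_proficient_use(times, 90))             # -> 1 (every use passes)
    print(first_proficient_use(times, 30))             # -> 13 for these data
    print(first_proficient_use(times, 30, streak=10))  # -> None (no 10-use run)

The point of the sketch is that each definition is a different function of the same 20 numbers, so the declared use of proficiency (1, 13, or never, here) shifts with the definition alone.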
Trends in the Power Law of Practice
Our initial chart used a statistical graph of the power law, a relationship in which one quantity varies as a power of another. Each user had data from their 20 uses plotted with one metric: time to determination of bleeding control. When all 10 datasets were plotted, the plot of 200 data points was crowded and confusing, so we narrowed our scope to three users. User 10 learned the most, user 8 learned the least, and user 7 was in between (Figure 1). Such performances were displayed clearly because the data were discernibly spread out. The three trend lines of performances were classic power-law curves (users 7, 8, and 10 in Figure 1). All three showed individual degrees of learning in downslope trends representing improvements in speed, as users tended to get faster with accrued experience. Such degrees were reflected in the mathematical values (i.e., exponents and constants) of the equations representing the lines. Although the users in this study were, on average, fairly experienced tourniquet users, learning curves were still observed instead of the flat lines we had generally expected. Only one of 10 curves (user 8) had a nearly flat trend. The individual data points for each user varied above or below their trend, a best-fit line derived from their 20 data points. The distance of a data point above or below the line was generally larger earlier and smaller later; this narrowing reflects the typical learning effect, in which users tend to gain speed as they gain experience while variance lessens.
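The power law of practice behind these trend lines is commonly written T(n) = T1 × n^(−b), where T(n) is the time on the nth use, T1 the time on the first use, and b the learning rate; fitting by least squares in log-log space is the conventional approach. A minimal sketch with hypothetical times (not the study's data or fitting code):

    import numpy as np

    # Hypothetical times (seconds) over 20 uses; not the study's measurements.
    uses = np.arange(1, 21)
    times = np.array([45.0, 38, 36, 33, 31, 30, 29, 27, 27, 26,
                      25, 25, 24, 23, 23, 22, 22, 21, 21, 20])

    # Power law of practice: T(n) = T1 * n**(-b).
    # Taking logs gives log T = log T1 - b*log n, a straight line,
    # so an ordinary least-squares fit in log-log space recovers b and T1.
    slope, intercept = np.polyfit(np.log(uses), np.log(times), 1)
    b, T1 = -slope, float(np.exp(intercept))
    print(f"T(n) = {T1:.1f} * n^(-{b:.2f})")  # downslope: faster with practice

A larger fitted b corresponds to a steeper downslope, which is the sense in which the exponents and constants of the fitted equations summarize each user's degree of learning.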
A power law curve permits an instructor to overlay a threshold on the graph. A threshold such as ≤30s or ≤60s can act as a cutoff for deciding between pass and fail for an individual use. A pass-fail decision, therefore, is easily displayed, but interpreting proficiency remains unclear unless it is defined in operational terms of the pass-fail performances. The interuse variability of each user's data made any unclear definition subjective. For example, user 10 achieved a pass (≤30s) on use numbers 2, 4, 7, 8, 10, 11, and 13–20, but the instructor may choose any among these 14 values to define proficiency. User 10, therefore, might become proficient at use 2. On the other hand, if the instructor chose to define proficiency jointly, with concurrent components of (1) ≤30s on a use and (2) ≤30s on all uses thereafter, proficiency for user 10 could vary more than sixfold (use 13 versus use 2) simply by a change in the choice of definition. Power law curves did not offer, by themselves, objective criteria of proficiency, but the instructor could derive such criteria, for example, by choosing clear definitions.
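The sixfold spread for user 10 follows directly from the passing uses listed above; this small check is illustrative only:

    passes = {2, 4, 7, 8, 10, 11, *range(13, 21)}  # user 10's passing uses (<=30s)
    # Lenient reading: proficiency declared at the earliest passing use.
    earliest = min(passes)                                                    # -> 2
    # Joint definition: earliest use that passes with all later uses passing.
    joint = min(u for u in passes if all(v in passes for v in range(u, 21)))  # -> 13
    print(joint / earliest)  # -> 6.5, i.e., more than sixfold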
Another shortcoming in the usefulness of power law curves was the display of only one metric. Furthermore, no change in trend is objectively detectable with the power law tool. The power law curve allows instructors to monitor trends, but it does not objectively judge proficiency. Therefore, we next looked at failure trends.

Failure Trends
We looked at performance in each use as a passed or a failed result (Figure 2). To better understand this section, the reader should closely follow Figure 2. The count of failures accrued by use number was cumulatively summed over the 200 uses and plotted for five metrics, each with a different criterion defining failure. Results differed for each metric by both plot and endpoint. When time ≤60s was designated as a pass, three failures resulted: the least count, the lowest endpoint, and the lowest plot. Just above the plot of ≤60s was effectiveness, which resulted in seven failures. For the composite metric of effectiveness and ≤60s, the result was 10 failures, whereas for time ≤30s alone it was 31. The top plot was effectiveness and ≤30s, resulting in 37 failures. Because endpoints differed, each metric indicated its degree of challenge posed to user performance: time ≤60s was least challenging (3), whereas effectiveness and ≤30s jointly were most challenging (37). If one metric was chosen to judge proficiency over another, the outcome could differ by as much as 12-fold (37/3). Such a large difference (i.e., 37 − 3 = 34) in the assessed outcome is why the choice of metric matters. Such substantial intermetric differences indicated that the choice among possible individual metrics considerably changed how performances were portrayed and, thus, interpreted.
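The cumulative plots in Figure 2 amount to running sums of per-use failure indicators under each metric. The sketch below shows that bookkeeping with hypothetical data; the array names, thresholds as failure tests, and simulated values are assumptions for illustration, not the study's dataset.

    import numpy as np

    rng = np.random.default_rng(0)
    # Hypothetical data for 10 users x 20 uses; not the study's measurements.
    times = rng.uniform(15, 75, size=(10, 20))       # seconds to bleeding control
    effective = rng.random(size=(10, 20)) < 0.9      # was bleeding controlled?

    # Five failure criteria, one per plotted metric in Figure 2's style.
    metrics = {
        "time > 60s":              times > 60,
        "not effective":           ~effective,
        "not effective or > 60s":  ~effective | (times > 60),
        "time > 30s":              times > 30,
        "not effective or > 30s":  ~effective | (times > 30),
    }

    # Order by use number across users, then cumulatively sum the failures.
    for name, failed in metrics.items():
        cum = np.cumsum(failed.flatten(order="F"))   # use 1 for all users, then use 2, ...
        print(f"{name}: endpoint = {cum[-1]} failures over 200 uses")

Stricter criteria produce uniformly higher running sums, so the five plots stack by degree of challenge, which is the pattern the text describes for Figure 2.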

FIGURE 1  User performance by time shown in data points and trend lines.

FIGURE 2  Failure counts using five different metrics of performance.