Page 25 - JSOM Fall 2018
Results

Performance Metrics of Individual Users and the Group of Users
Our first analysis focused on differing definitions of proficiency applied to
the individual users and to the group of users as if the group was a class of
students or a squad of Soldiers. Using a time to determination of bleeding
control of no more than 90 seconds (≤90s), all 10 users became proficient at
their first use and remained so thereafter. At ≤60 seconds (≤60s), all became
proficient at seventh use and thereafter. At ≤30 seconds (≤30s), nine of 10
users became proficient at 16th use and thereafter. At ≤30s for 10 uses in a
row, two of 10 users became proficient at the 10th use and thereafter, and
another six did so at the 11th. At ≤30s for all uses, the group failed. These
results indicated that (1) the definition of proficiency was important because
it affected the outcome of assessment; (2) for the sake of clarity, it was
necessary to make the innate definition of proficiency explicitly operational;
(3) the outcome often differed whether the definition applied to a user or the
group; and (4) the degree of challenge posed to user performance differed by
the criteria defining proficiency.

Trends in the Power Law of Practice
Our initial chart used a statistical graph of the power law, a relationship in
which one quantity varies as a power of another. Each user had data from their
20 uses plotted with one metric: time to determination of bleeding control.
When all 10 datasets were plotted, the plot of 200 data points was crowded and
confusing, so we narrowed our scope to three users. User 10 learned the most,
user 8 learned the least, and user 7 was in between (Figure 1). Such
performances were displayed clearly because data were discernably spread out.
The three trend lines of performances were classic power-law curves (users 7,
8, and 10 in Figure 1). All three showed individual degrees of learning in
downslope trends representing improvements in speed as users tended to get
faster with accrued experience. Such degrees were similar to the mathematical
values (i.e., exponents and constants) of the equations representing the lines.
Although the users in this study were, on average, fairly experienced
tourniquet users, learning curves were still observed instead of the flat lines
we had generally expected. Only one of 10 curves (user 8) had a nearly flat
trend. The individual data points for the individual users varied above or
below their trend, a best-fit line derived from their 20 data points. The
distance that the data point was above or below the line was generally larger
earlier and lesser later, and this lesser extent is the result of the typical
learning effect as users tend to gain speed when gaining experience, while
variance tends to lessen.

A power law curve permits an instructor to overlay a threshold on the graph. A
threshold such as either ≤30s or ≤60s can act as a cutoff for deciding between
pass and fail for an individual use. Deciding pass-fail, therefore, is easily
displayed, but interpreting proficiency remains unclear unless it is defined in
operational terms of the pass-fail performances. The interuse variability of
each user’s data made any unclear definitions subjective. For example, user 10
achieved a pass (≤30s) on use numbers 2, 4, 7, 8, 10, 11, and 13–20, but the
instructor may choose any among these 14 values to define proficiency. User 10,
therefore, might become proficient at use 2. On the other hand, if the
instructor chose to define proficiency jointly with concurrent components as
(1) ≤30s on a use and (2) ≤30s on all uses thereafter, proficiency for user 10
could vary more than sixfold (use 13 versus use 2) simply by a change in choice
of definition. Power law curves did not offer, by themselves, objective
criteria of proficiency, but the instructor could derive such criteria such as
by choosing clear definitions.

Another shortcoming in the usefulness of power law curves was the display of
only one metric. Furthermore, no changes in trend are objectively detectable
with the power law tool. The power law curve allows instructors to monitor
trends, but it does not objectively judge proficiency. Therefore, we next
looked at failure trends.

Failure Trends
We looked at performance in each use as a passed or a failed result (Figure 2).
To better understand this section, the reader should closely follow Figure 2.
The count of failures accrued by use number was cumulatively summed over the
200 uses and plotted for five metrics. Each metric had a different criterion to
define failure. In fact, results differed for each metric by both plot and
endpoint. When time ≤60s was designated as a pass, three failures resulted: the
least count, the lowest endpoint, and the lowest plot. Just above the plot of
≤60s was effectiveness: That resulted in seven failures. In regard to the
composite metric of effectiveness and ≤60s, the result was 10, whereas for
≤30s, the result was 31. The top plot was effectiveness and ≤30s, resulting in
37. Because endpoints differed, each metric indicated its degree of challenge
posed to user performance. Meanwhile, time ≤60s was least challenging (3),
whereas effectiveness and ≤30s jointly were most challenging (37). If one
metric was chosen to judge proficiency over another, then the outcome may
differ as much as 12-fold (37/3). Such a large difference (i.e., 37 − 3 = 34)
in the assessed outcome is why the choice of “metric matters.” Such substantial
intermetric differences indicated that the choice among possible individual
metrics considerably changed how performances were portrayed and, thus,
interpreted.
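
How strongly the outcome depends on the chosen definition can be checked with a short sketch. It uses only user 10's reported passes at ≤30s (uses 2, 4, 7, 8, 10, 11, and 13–20). The function names are hypothetical. The reading of "10 uses in a row" as the use at which the run is completed is an assumption, not a statement of the study's procedure. The cumulative failure count is derived here from the 14 reported passes out of 20 uses and mirrors the running sum described for Figure 2.

```python
from itertools import accumulate

# User 10's passing uses at the <=30s threshold, as listed in the text.
PASSED_USES = {2, 4, 7, 8, 10, 11, *range(13, 21)}
N_USES = 20
passes = [use in PASSED_USES for use in range(1, N_USES + 1)]


def first_pass(passes):
    """Definition A (hypothetical name): proficient at the first passing use."""
    for use, ok in enumerate(passes, start=1):
        if ok:
            return use
    return None


def pass_and_all_thereafter(passes):
    """Definition B: first use from which this and every later use passes."""
    for use in range(1, len(passes) + 1):
        if all(passes[use - 1:]):
            return use
    return None


def passes_in_a_row(passes, run_length):
    """Definition C (assumed reading): use at which a run of
    `run_length` consecutive passes is completed."""
    streak = 0
    for use, ok in enumerate(passes, start=1):
        streak = streak + 1 if ok else 0
        if streak >= run_length:
            return use
    return None


def cumulative_failures(passes):
    """Running count of failed uses, one value per use number."""
    return list(accumulate(0 if ok else 1 for ok in passes))


print(first_pass(passes))               # 2
print(pass_and_all_thereafter(passes))  # 13
print(passes_in_a_row(passes, 10))      # None (longest run is 8 uses)
print(cumulative_failures(passes)[-1])  # 6 failures over 20 uses
```

The same pass-fail record yields use 2, use 13, or no proficiency at all, depending only on which definition is applied.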
FIGURE 1 User performance by time shown in data points and trend lines.
FIGURE 2 Failure counts using five different metrics of performance.
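
A power-law trend line of the kind shown in Figure 1 has the form t = a · n^b, where n is the use number, t is the time, a is the constant, and b is the exponent. The following is a minimal sketch of one common way to obtain such a line: ordinary least squares on the logarithms of both quantities. The article does not state how its trend lines were fitted, so this method is an assumption. The data are hypothetical, noise-free values generated for illustration and are not the study's measurements. The function names are likewise illustrative.

```python
import math


def fit_power_law(uses, times):
    """Fit t = a * n**b by least squares on log(t) = log(a) + b*log(n).

    Returns (a, b). Assumes all use numbers and times are positive.
    """
    xs = [math.log(n) for n in uses]
    ys = [math.log(t) for t in times]
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    b = sxy / sxx                       # exponent (negative when learning)
    a = math.exp(mean_y - b * mean_x)   # constant (fitted time at use 1)
    return a, b


def first_use_under(a, b, threshold_s, max_use=20):
    """First use number at which the fitted curve is at or below a threshold."""
    for n in range(1, max_use + 1):
        if a * n ** b <= threshold_s:
            return n
    return None


# Hypothetical times in seconds, generated from t = 60 * n**-0.3.
uses = list(range(1, 21))
times = [60.0 * n ** -0.3 for n in uses]

a, b = fit_power_law(uses, times)
print(round(a, 3), round(b, 3))
print(first_use_under(a, b, threshold_s=30.0))
```

Overlaying a threshold on the fitted curve, as in `first_use_under`, shows where the trend crosses the cutoff. It says nothing about the scatter of individual uses around the line, which is why the curve alone does not settle proficiency.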
Your Metric Matters! | 23

