Page 28 - JSOM Fall 2018
be seen simultaneously as both best and worst is a paradox. To understand the lens metaphor is to resolve the paradox and vice versa. By focusing on different criteria, each lens allows the assessor to see a specific outlook to thereby perceive a potentially different outcome.

Rankings of User Performance by Use of Two Metrics
With the LC-CUSUM results, we ranked user performances to compare and contrast our choice of metric between (1) effectiveness and ≤60s, and (2) effectiveness and ≤30s (Figures 4 and 5, respectively). Ranks were 1–10, best to worst (Figure 6). We used the determinant of rank as the data point after the lower decision limit was crossed, with the best rank going to the user who became proficient first. For tie-breaking, lower endpoints were better.

FIGURE 6 Changes in ranked performance by changing from one metric to another.

The change in metric chosen led to a change in rank. Nine of 10 users changed rank. Five rose. Four fell. Only user 1, who was an expert with the most experience, was unchanged in the first rank. The average magnitude of the change in rank was 3.6. Because the change in the metric chosen had changed the rank order, the perception changed of which users performed better.

The less challenging metric (i.e., effectiveness and ≤60s) had three users tied at the first rank and four tied at the fifth rank, whereas the more challenging metric (i.e., effectiveness and ≤30s) had only two users tied at one rank, fourth. The less challenging metric had two gaps among ranks in an average size of 2.5 ranks. The more challenging metric had one gap in a size of one rank. The effect size of metric choice on gap among ranks was approximately twofold for gap number and gap size. The more challenging metric stratified performances more.

With the more challenging metric, we had expected more experienced users to be more reliably ranked. That was what occurred albeit with a mixed result. To scale levels of skill (novice to expert) in the same direction as ranks, we allotted one through five points for expert through novice, respectively. We plotted rank order (1–10 on the x-axis) for each user for both performance metrics. We plotted against their rank order their skill level in points, on the y-axis. The results differed by metric. The more challenging metric showed a more-sloped line, indicating a stronger association between ranked performance and skill level (y = 0.3128x + 1.2109; R² = 0.4369) than the less challenging metric (y = 0.086x + 2.5045; R² = 0.0346). On average, the more challenging metric appeared to discern better the assessed skill level of the user.

Discussion
The key finding from this study revealed that the choice of metric indeed mattered for the meaning of performance. The assessed outcome often was changed by the choice of metric. This choice finding was consistent and coherent across tool type (i.e., power law curve, failure count, or LC-CUSUM), metric type (i.e., speed, effectiveness, or pass-fail), threshold (i.e., 90s, 60s, or 30s), and metric component count (i.e., single or multiple). This choice finding is consistent with prior research such as when we found that learning curves required to optimize performance varied more than 30-fold depending on the metric chosen (e.g., effectiveness versus blood loss).⁹ Other studies found similar choice effects in education and caregiving, and some resuscitation metrics can discern high-quality performance in first aid as preliminarily associated with clinical outcomes.¹⁵⁻¹⁷ Your metric matters, so choose wisely which one your users need most, but that decision may be challenging, because it requires refined understanding.

The minor finding was that choosing the least challenging metric often led to an easy pass. Such a pass is fast, needs minimal remediation, requires minimal extra time of everyone, and gives pleasant feedback. However, such ease may suit only novices, and adding a challenging metric like mechanical effectiveness may suit novices later in their training to ensure reliable control of bleeding.

Over the years, we routinely have asked instructors of military medical courses questions such as the following. (1) Who are your clients? This generally aided us in understanding backgrounds, levels of skill, experience, and settings for a spectrum including medic students, nurses, or trauma surgeons. (2) What are you trying to do for them? The instructor’s intent varied such as from tourniquet familiarization, to actual handling, or to mechanical competence (e.g., tighten until the distal pulse is not palpable). (3) What is a success for them? This helped us understand if there was a metric or threshold chosen or if they were looking for another goal, like confidence. (4) What defines success? This gives the instructor another chance to answer question 2 or 3 because answers often did not focus on learners. For example, one answer was: “I got 46 people to get through this station in an hour to get them on the bus to lunch.” This is a practical metric for the instructor to ease the needs of the moment, but it cannot reliably assess proficiency. It sets learners up for overconfidence in their own assessment of their performance. The instructor’s way of assessing had conveniently devolved to the path of least effort. Elsewhere, teaching cardiopulmonary resuscitation in first aid as “hard and fast” was easy to remember, but in reality, caregivers inadvertently may provide markedly suboptimal chest compressions and rates.¹⁷ By focusing on such metrics, quality cardiopulmonary resuscitation performance may improve aspects of caregiving.¹⁵⁻¹⁷

We are aware that the present study is limited by design to generating hypotheses (Table 2) for future scholarly works. In preparing people to stop the bleed in an emergency, assessed outcomes often changed by the choice of performance metric. In the end, the metric matters.

Funding
This project was funded by the US Army Medical Research and Materiel Command.
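
The rank comparison reported under Rankings of User Performance by Use of Two Metrics (how many users rose or fell, the average magnitude of rank change, and the fitted line with R² relating rank to skill points) can be illustrated with a minimal Python sketch. The function names, user labels, ranks, and skill points below are hypothetical and chosen only for illustration; they are not the study's data, and the study's actual analysis software is not stated in the text. The fit is assumed to be an ordinary least-squares line.

```python
from statistics import mean


def rank_change_summary(ranks_a, ranks_b):
    """Summarize rank movement from metric A to metric B (rank 1 = best).

    Both arguments map a user label to that user's rank under one metric.
    A negative change means the user rose (moved toward rank 1).
    """
    deltas = [ranks_b[user] - ranks_a[user] for user in ranks_a]
    return {
        "rose": sum(1 for d in deltas if d < 0),
        "fell": sum(1 for d in deltas if d > 0),
        "unchanged": sum(1 for d in deltas if d == 0),
        "mean_abs_change": mean(abs(d) for d in deltas),
    }


def fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept, plus R squared.

    Assumes xs and ys each contain at least two distinct values.
    """
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    slope = sxy / sxx
    intercept = y_bar - slope * x_bar
    ss_res = sum((y - (slope * x + intercept)) ** 2 for x, y in zip(xs, ys))
    ss_tot = sum((y - y_bar) ** 2 for y in ys)
    return slope, intercept, 1 - ss_res / ss_tot


# Hypothetical ranks for five users under two metrics (not study data).
ranks_60s = {"u1": 1, "u2": 2, "u3": 3, "u4": 4, "u5": 5}
ranks_30s = {"u1": 1, "u2": 4, "u3": 2, "u4": 5, "u5": 3}
print(rank_change_summary(ranks_60s, ranks_30s))

# Hypothetical skill points (1 = expert ... 5 = novice) plotted against rank.
rank_order = [1, 2, 3, 4, 5]
skill_points = [1, 2, 2, 4, 5]
slope, intercept, r_squared = fit_line(rank_order, skill_points)
print(f"y = {slope:.4f}x + {intercept:.4f}; R^2 = {r_squared:.4f}")
```

Under this sketch, a steeper slope and a larger R² for one metric than for the other would correspond to the stronger association between ranked performance and skill level that the text describes for the more challenging metric.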
26 | JSOM Volume 18, Edition 3 / Fall 2018

