The following article is reprinted with permission from The Ergonomics Report™ Archives, where it originally appeared on October 29, 2012.
An old joke goes something like this.
Late at night, a police officer notices a man searching the ground under a streetlight. When the officer asks, the man explains that he’s looking for his car keys. After a few minutes of fruitless searching, the officer asks the man if he’s sure that he dropped the keys there.
“No,” says the man, “I lost them in the parking lot; it’s just that the light’s so much better over here.”
If we as ergonomics practitioners aren’t careful to measure the reliability and validity of the MSD checklists that we use, then, just like the man in the joke, we may be looking for MSD risks where it’s convenient, but not necessarily in the right place.
Checklists come in all shapes and forms: they may be self-developed, public, or proprietary. We may call them surveys, risk identification forms, screening tools, assessment tools, or something else, but their purpose is the same: to identify risk factors that put jobs “at risk.” Some checklists may have been validated for various contexts of use; others not at all. In this article, I will argue that we need to manage the use of checklists.
As practitioners, we must manage the use of checklists by collecting information on their reliability and validity just as one would gather information to manage any other manufacturing process. This can be done simply and dynamically by any practitioner in a way that is specific to the worksite for which they are responsible.
Reliability
At its simplest, reliability is a measure of consistency: a good MSD checklist should give consistent results. If Mary evaluates a job with a checklist on Monday and the results flag it as a potential problem job, repeating the evaluation on Friday should flag the same job again, whether it is Mary or John who repeats it. These are the two basic types of reliability a practitioner needs to be concerned with: intra-rater reliability (consistency of the same rater over time) and inter-rater reliability (consistency between different raters).
We commonly say that what gets measured gets managed. How, then, does a practitioner manage reliability? One way to measure it is to conduct systematic calibration checks: for example, periodically asking all checklist users to evaluate a reference job, one for which the values of the risk factors are known. Comparing the users’ results with the known parameters of the reference job makes both inter-rater and intra-rater reliability readily apparent. Performance on these calibration samples becomes a measure of the reliability of both individual raters and the group as a whole. An ergonomics manager can then judge whether that reliability is acceptable or whether some action is needed to improve it.
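To make the calibration idea concrete, here is a minimal sketch of how such a check might be scored. The rater names and ratings are hypothetical, and simple percent agreement is just one convenient choice of statistic.

```python
# A minimal sketch of scoring a calibration check, assuming each rater
# evaluates the same reference job on several occasions and the job's
# true status is known. All names and ratings here are hypothetical.

reference = "at-risk"  # known status of the reference job

ratings = {
    "Mary": ["at-risk", "at-risk", "at-risk", "not-at-risk"],
    "John": ["at-risk", "not-at-risk", "at-risk", "at-risk"],
}

# Intra-rater reliability: how consistently each rater reproduces the
# known status of the reference job across occasions.
for rater, scores in ratings.items():
    agreement = sum(score == reference for score in scores) / len(scores)
    print(f"{rater}: {agreement:.0%} agreement with the reference job")

# Inter-rater reliability: how often the two raters agree with each
# other on the same occasion.
pairs = list(zip(ratings["Mary"], ratings["John"]))
inter = sum(m == j for m, j in pairs) / len(pairs)
print(f"Mary vs. John: {inter:.0%} agreement across occasions")
```

Percent agreement is easy to track over time; a practitioner who wants to correct for chance agreement could substitute a statistic such as Cohen’s kappa.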
Validity
At its core, validity is the confidence that an MSD checklist is doing what we want it to do: effectively and efficiently identifying problem jobs. How can we go about measuring validity?
Problem jobs, true positives, false positives, and prevalence
Because there may be some level of risk present in nearly all jobs, checklists commonly establish some minimal risk-rating criterion to prioritize jobs. For example, a job with a RULA score greater than or equal to 7 is deemed to require “immediate investigation and change.”[1]
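Applying such a cutoff amounts to a simple filter over scored jobs. A minimal sketch, with hypothetical job names and scores and the RULA cutoff of 7 from the example above:

```python
# A minimal sketch of prioritizing jobs with a risk-rating cutoff.
# Job names and scores are hypothetical; the cutoff of 7 follows the
# RULA example cited above.

RULA_CUTOFF = 7

job_scores = {"Job A": 7, "Job B": 3, "Job C": 5, "Job D": 7}

flagged = [job for job, score in job_scores.items() if score >= RULA_CUTOFF]
print(f"Jobs requiring immediate investigation and change: {flagged}")
```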
We typically employ checklists to identify jobs at risk for musculoskeletal disorders; that is, we want to assess how valid a checklist is for the purpose of identifying problem jobs. We might define a problem job as any job that has had a diagnosed musculoskeletal injury or illness within the past six months.
The question we ask is: how good a job does our checklist do at identifying those problem jobs and, by inference, other problem jobs?
Suppose that we randomly select 100 jobs within a manufacturing plant and analyze them using our MSD checklist. Some meet the definition of a problem job; the others do not. The results are summarized in Table 1 below.
                 Problem Job    Non-Problem Job    Total
Checklist Yes         7                2              9
Checklist No          5               86             91
Total                12               88            100
Table 1. Summary of checklist results for 100 randomly sampled jobs
Based on our sample, we know that:
- Seven of the 9 positive checklist indications (7/9, or 0.78) correctly identified a problem job; this is the checklist’s positive predictive value,
- Two of the 88 non-problem jobs (2/88, or about 0.02) were incorrectly flagged as problem jobs; this is the false positive rate,
- Seven of the 12 problem jobs (7/12, or about 0.58) were flagged by the checklist; this is the sensitivity,
- Twelve of the 100 jobs (0.12) were problem jobs; this is the prevalence. (The sketch after this list reproduces the arithmetic.)
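Each of these quantities can be read directly off Table 1. The following minimal sketch simply reproduces that arithmetic; the variable names are illustrative rather than standard checklist terminology.

```python
# A minimal sketch reproducing the arithmetic above directly from the
# Table 1 counts.

true_pos = 7     # checklist "yes" and a genuine problem job
false_pos = 2    # checklist "yes" but not a problem job
false_neg = 5    # checklist "no" but a problem job
true_neg = 86    # checklist "no" and not a problem job
total = true_pos + false_pos + false_neg + true_neg  # 100 jobs

ppv = true_pos / (true_pos + false_pos)              # 7/9  ~ 0.78
false_pos_rate = false_pos / (false_pos + true_neg)  # 2/88 ~ 0.02
sensitivity = true_pos / (true_pos + false_neg)      # 7/12 ~ 0.58
prevalence = (true_pos + false_neg) / total          # 12/100 = 0.12

print(f"Positive predictive value: {ppv:.2f}")
print(f"False positive rate:       {false_pos_rate:.2f}")
print(f"Sensitivity:               {sensitivity:.2f}")
print(f"Prevalence:                {prevalence:.2f}")
```

The sensitivity computed here is what drives the expectations for a new sample below.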
If we were to look at a sample of 100 more jobs drawn from the same factory, we would again expect 12 to be problem jobs and 88 to be non-problem jobs. Based on our previous experience:
- We would expect the checklist to give a positive indication on about 58 percent of the 12 problem jobs (7/12, or about 7.0 jobs),
- We would expect the checklist to incorrectly give a positive indication on about 2 percent of the 88 non-problem jobs (about 2.0 jobs),
- We would therefore expect about 78 percent of the positive indications given by the checklist to correctly identify problem jobs (7.0/9.0), consistent with the positive predictive value observed in the first sample. (The sketch below restates this calculation.)
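The same arithmetic, projected onto the new sample, can be written out as a short sketch; nothing here goes beyond the Table 1 figures, and the only assumption is that prevalence and checklist performance carry over unchanged.

```python
# A minimal sketch projecting the Table 1 rates onto a fresh sample of
# 100 jobs from the same plant, assuming the prevalence and checklist
# performance carry over unchanged.

sample_size = 100
prevalence = 12 / 100    # share of jobs that are problem jobs
sensitivity = 7 / 12     # share of problem jobs the checklist flags
false_pos_rate = 2 / 88  # share of non-problem jobs flagged in error

problem_jobs = sample_size * prevalence        # 12.0 expected problem jobs
non_problem_jobs = sample_size - problem_jobs  # 88.0 expected non-problem jobs

expected_true_pos = problem_jobs * sensitivity          # about 7.0
expected_false_pos = non_problem_jobs * false_pos_rate  # about 2.0
expected_ppv = expected_true_pos / (expected_true_pos + expected_false_pos)

print(f"Expected true positives:  {expected_true_pos:.1f}")
print(f"Expected false positives: {expected_false_pos:.1f}")
print(f"Expected PPV:             {expected_ppv:.2f}")  # about 0.78
```

For comparison, flagging jobs at random would yield a positive predictive value equal to the prevalence, about 0.12, a useful baseline when weighing the checklist against the “guessing” alternative mentioned below.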
We asked how one evaluates the validity of a checklist, specifically, how effective it is at identifying problem jobs. We have identified a method to answer that question: in our example, about 78 percent of positive indications on the checklist correctly identified problem jobs.
It is possible to use this method to compare different forms of a checklist, for example, to determine if a simplified format gives equivalent results. It can also be used to compare different checklists, or to contrast the use of a checklist with alternate means of identifying problem jobs, such as guessing!
In summary, checklists are widely used, but they are truly useful only when we understand their reliability and validity. This article has outlined some practical methods for assessing both.
The editors of the journal Work have kindly made a longer article on the topic of checklist reliability and validity freely available. It can be accessed here: http://iospress.metapress.com/content/v22040146v41l810/fulltext.pdf
[1] The Ergonomics Center of North Carolina (2012) http://www.theergonomicscenter.com/graphics/ErgoAnalysis%20Software/RULA.pdf
Tom Albin is an engineer and ergonomist with more than 25 years’ experience in ergonomics as a principal researcher, corporate ergonomist, product developer, and consultant. In addition to his consulting practice, he is the Executive Director of the Office Ergonomics Research Committee (OERC), a research consortium. He also chairs the committee revising the ANSI/HFES 100 standard.