Researchers Bao, Howard, Spielholz, Silverstein, and Polissar conducted a study designed to investigate interrater reliability — the ability of different observers/assessors to reach the same conclusions when visually estimating posture. In their introduction, the authors review the three postural observation methods that are available to ergonomists:
The researchers tested a small group of professional ergonomists with little experience using the posture rating tool/method, comparing them to a small group of technicians with limited theoretical knowledge but more experience applying the rating tool. The central question was how reliably the raters reached the same conclusions when recording postures. Note that the study did not test whether the raters accurately assessed the actual body postures; it tested how consistent the raters were with one another, whether or not they were accurate.
The authors present a detailed review of their methods and results and the possible reasons for those results, and interested readers are encouraged to review the entire article. However, the following select findings are presented here:
The Bottom Line — How This Applies to Ergonomists
Visual observation and recording of posture is something that occupational ergonomists apply regularly in the assessment/evaluation of ergonomic risk. Observational assessment tools like RULA, for example, rely on such postural estimates. This study indicates that the width of the angle range categories (10° vs. 30° in this study) has a significant effect on interrater reliability, as does the training and experience raters have with an observational postural recording tool. Other factors, such as camera placement, will also affect how consistent raters are when assessing the very same posture.
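To see why category width matters so much, consider a minimal sketch (the angle values and the `bin_angle` helper are invented for illustration, not taken from the study): two raters whose angle estimates differ by only a few degrees can land in the same coarse 30° category yet fall into different fine 10° categories, lowering agreement.

```python
def bin_angle(angle_deg, bin_width):
    """Assign a joint-angle estimate to a category such as 0-10 or 10-20 degrees."""
    lower = int(angle_deg // bin_width) * bin_width
    return (lower, lower + bin_width)

# Two hypothetical raters estimate the same posture as 8 and 14 degrees.
rater_a, rater_b = 8.0, 14.0

# Coarse 30-degree categories: both estimates fall in the 0-30 bin, so the raters agree.
print(bin_angle(rater_a, 30), bin_angle(rater_b, 30))  # (0, 30) (0, 30)

# Fine 10-degree categories: the same estimates now disagree (0-10 vs. 10-20).
print(bin_angle(rater_a, 10), bin_angle(rater_b, 10))  # (0, 10) (10, 20)
```

The same pair of observations can thus look reliable or unreliable depending purely on how finely the posture categories are sliced.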
Although not explicitly discussed by the authors, my own interpretation of these results is that visual observational posture recording tools are subject to substantial variation between observers, bringing their validity into question. Further, this study does not address the question of accuracy in recording the actual postures. Therefore, even if various raters were to reach consistent posture estimates, there is no way to know if those estimates accurately capture the true posture. If you need accuracy, you are much better off carefully using direct measurement techniques (e.g., goniometers) than you are applying an observational tool.
The researchers tested seven observers: three of them professional ergonomists with extensive theoretical background, but limited experience applying the postural recording tool/method; and four technicians with limited theoretical background, but more experience applying the posture tool. Each participant was asked to record postures from 37-38 randomly selected video frames taken from four different video-recorded jobs using a posture recording system developed by Bao. The system included two camera angles, set as close to perpendicular to each other as possible within the constraints of the industrial environments. The raters estimated the approximate joint angles of the various body parts by clicking on a point on a posture diagram displayed on a computer screen instead of entering a numerical angle value in degrees. The system automatically entered the numerical value, in degrees, and the participant was able to modify the value if desired.
In their analysis, the authors categorized the results three different ways:
A great deal of discussion is provided regarding the various statistical tests that can be applied to characterize and understand interrater reliability. The authors reject the kappa statistic used by some other researchers in favor of two measures: raw percentage agreement among participants, which is easy and straightforward to calculate and understand, and the intraclass correlation coefficient (ICC), which provides a more complex analysis but is sensitive to postural variation from frame to frame. Jobs with greater postural variation may therefore yield higher ICCs than jobs with little variation, even when participants demonstrate the same high percentage of agreement.
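The raw percentage-agreement measure the authors favor is simple enough to sketch directly. A minimal version, with invented category labels and data purely for illustration, counts how often each pair of raters assigns the same category to a frame:

```python
from itertools import combinations

def percent_agreement(ratings):
    """Raw percentage agreement: over all frames and all rater pairs,
    the fraction of pairs that assigned the same posture category."""
    agree = total = 0
    for frame in ratings:                    # one list of categories per frame
        for a, b in combinations(frame, 2):  # every pair of raters on that frame
            agree += (a == b)
            total += 1
    return agree / total

# Hypothetical categorized postures: rows are video frames, columns are raters.
frames = [
    ["neutral", "neutral", "flexed"],
    ["flexed",  "flexed",  "flexed"],
    ["neutral", "neutral", "neutral"],
]
print(round(percent_agreement(frames), 2))  # prints 0.78 (7 of 9 pairs agree)
```

Unlike the ICC, this figure does not depend on how much the postures themselves vary from frame to frame, which is part of why the authors find it easier to interpret.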
Bao, Stephen; Howard, Ninica; Spielholz, Peregrin; Silverstein, Barbara; Polissar, Nayak. Interrater Reliability of Posture Observations. Human Factors, Volume 51, Number 3, June 2009, pp. 292-309. Retrieved January 25 from http://www.ingentaconnect.com/content/hfes/hf/2009/00000051/00000003/art00003.