From The Ergoweb® Learning Center

Ergonomics and Voting Systems: Will Your Vote Be Properly Counted?

Voting systems that are reliable, easy to use and accurate in reflecting voter intent are the foundation of free and fair elections, the kind Americans regard as their birthright. Human Factors and Ergonomics Society (HFES) members working on the issue see uniformly applied, research-based design and test standards as the way forward. Bill Killam, CHFP, one of these experts, argues for performance-based test standards, which he sees as well suited to a society built on the principle of free competition.

Americans now know that birthright is threatened. The headlines suggest voting systems are built on unreliable technology. A search for “voting machines” in Google News turns up 2,600 results. Many report successful tests, but at least as many describe serious glitches. A New York Times article on October 8 complains that hardly an election goes by without reports of serious vulnerabilities or malfunctions in the machines. Diebold machines, in one instance, dropped votes as they were transferred from individual machines to the central server in a county’s election headquarters. The company has since notified more than 30 states to be on the lookout for missing votes. Voters also report errors like vote flipping, in which the vote they cast for one candidate is recorded for another, and there are reports of new machines for the disabled that freeze up.

Florida’s Sun-Sentinel newspaper, reporting on tests of Palm Beach County’s new optical-scan voting machines, noted that a manual examination of some 12,000 ballots rejected by the machines found them to be legitimate votes: they had been marked clearly and correctly and should have been read by the machines. The examiners also found ballots that were marked incorrectly, and therefore could not be read by the machines, but that indicated a clear choice by the voter. When the same tests were repeated, the results were different.

Back in 2000, though, the number and scale of the flaws shocked a nation accustomed to thinking US elections were second to none for efficiency and honesty. It was the year of the hanging chad, the ballot problem in Florida that led to a presidential election decided by the Supreme Court. The use of punchcard machines and a confusing ballot layout caused some voters to select an unintended candidate or double-punch the ballot. In essence, the Court decided voter intent in that state.

The close and contested result prompted increased scrutiny of the whole system, and many other problems were found. A commission headed by former Presidents Jimmy Carter and Gerald Ford recommended a major overhaul of the nation’s election system in response to the Florida election turmoil. Legislators and election officials addressed the ailments with a rush to new technology and improved guidelines.

About More than Machines

In an interview with The Ergonomics Report™ in October, human factors engineer Killam expressed doubts about the direction of the rush. The founder and principal of User-Centered Design, which he describes as a small human factors consulting company, Killam regards ailing voting systems as a reflection of more than flawed machines.

From a human factors standpoint, he said, except for the addition of accessible voting technology, the machines have not changed greatly since 2000. He does not believe they have been improved enough to ensure a mainly trouble-free experience for voters on November 4. “We have to separate a couple of things: the voting machine itself from issues such as ballot layout, and [in addition to] those two, you have the actual operation of the machines. The poll workers who have to maintain those machines are an older population generally. They are often retirees. They don’t necessarily have a lot of experience with technology … and so, these three things interact with each other significantly.”

The lack of solutions points, in part, to the difficulty of knowing how much trouble voters actually experience. “The problem, and it’s true in all human performance issues like this, is that people may believe they have a problem but aren’t sure, while others may have problems and never realize it. We have absolutely no way of knowing how many people [in the Florida 2000 election] ended up choosing a candidate who wasn’t intended. We only know that some people think they might have. Some of them probably did cast as intended, while others may have cast an unintended vote and never realized it.”

He noted that the greatest change has been in providing guidance for ballot layout, but that approach introduces its own issues. “Ballot layout can’t be affected [in] certain situations where the machine dictates layout issues,” Killam explained, “so if there are problems that the machine introduces, then they can’t be compensated for.” We continue to see ballot layout problems, he added.

Killam blames the obstacles to establishing reliable and accurate systems on the nature of voting in the United States and the number of different system designs. "The states actually run their elections and report nationally, so the states have some autonomy. Their autonomy includes choosing particular equipment."

The Performance Approach

A lack of standardization across the equipment is a fundamental problem, he explained. "Even if there is variability with the design, the performance of that equipment should be standardized, and that is really the work we have been doing – working on performance standards that every system could be tested against – even if it is using different technology, [and] different interface designs."

"A reasonable analogy [is] the emission standards. We can have multiple vendors make cars, but when we establish an emission standard, it says that no matter which car you buy, your emissions should never be worse than a certain figure." The equivalent has not occurred in voting systems, he said. Killam notes that the performance testing approach leaves makers free to produce cars that suit their market.

Asked why the equivalent hasn’t occurred, he described the question as tough to answer. As far as he knows, he said, human performance testing has never before been included in a standard. “So there’s a very steep learning curve for a lot of the people involved, even those who are stakeholders in the process.” Statistical validity and statistical reliability are being investigated, he explained, “and we are looking at repeatability across the various firms that would be doing the testing.” Killam observed that working with human subjects is not a trivial matter, and establishing the test will take time.

One of the other problems has been priority – that "it’s very difficult to know how exactly significant [voter intent] issues are in any given election." They take on much more significance when the election is close, according to Killam. "It’s not the kind of thing that you can ever say for certain had an effect – or how significant the effect has been in the past. It’s the kind of thing that people can’t wrap their hands around and deal with directly. There’s almost a level of faith needed. … We are talking about making improvements, in some cases, that users will compensate for even if we don’t do it."

He described users as their own worst enemies when it comes to improving systems “because they do figure things out.” As hard evidence is lacking, he said, nobody can say with certainty how bad the problem is. Most people who get involved in the attempt to improve the system are passionate about it “because they know it is the right thing to do.”

He pointed out that the idea of cost-justifying human factors and usability has been debated for years, but it’s obviously not simple. In consequence, he said, “we have established a procedure where we have a sample ballot, which we make sure was moderate in its complexity. … We specify certain things to do and not do on the ballot, but basically, they are completing a full ballot,” even though completing an entire ballot is a very uncommon occurrence in real elections. “And so the question will remain, well if that’s the case, how bad will this machine perform in the real world?” As improvement is the goal, he said, it doesn’t matter exactly how test performance maps to real-world performance, and there is no way to measure that in any case. “We do believe, however, that if we can prove the machine’s adequate performance in this test, it will work better in the real world.”
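How such a test might be scored is easier to picture with a concrete calculation. The sketch below is purely illustrative and not drawn from Killam’s actual protocol: it estimates a machine’s voter-intent error rate from a hypothetical test run and attaches a confidence interval, the kind of figure a performance standard could set a ceiling on. All counts and names here are invented for the example.

```python
import math

def wilson_interval(errors: int, n: int, z: float = 1.96) -> tuple[float, float]:
    """95% Wilson score confidence interval for an observed error proportion."""
    p = errors / n
    denom = 1 + z**2 / n
    center = (p + z**2 / (2 * n)) / denom
    half = z * math.sqrt(p * (1 - p) / n + z**2 / (4 * n**2)) / denom
    return center - half, center + half

# Hypothetical run: 300 participants complete the standard sample ballot,
# and 12 record at least one vote that differs from their stated intent.
low, high = wilson_interval(errors=12, n=300)
print(f"observed error rate: {12/300:.1%}, 95% CI: [{low:.1%}, {high:.1%}]")
```

A pass/fail criterion could then require the interval’s upper bound, not just the point estimate, to fall below the standard’s threshold.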

Paper vs. Machine

As his explanation seemed to suggest that ballot layout and comprehensibility are key issues, Killam was asked if it makes much difference if the voting is done with a paper ballot or a machine.

"We have seen enough data from the testing to know that all of the different systems will have some level of performance issues," he replied. The question is where does it come from and how significant is it. Paper has no immediate feedback towards your actions. The feedback doesn’t occur – in some cases at all – if they have central counting. So if you cast a ballot that is not run through a scanner at the local polling place, then the voter basically has no feedback opportunity. If they marked the ballot incorrectly, if they double-voted a race, if they left part of a race blank, then that is not checked."

The sense is that you get immediate feedback with electronic systems, he said, but we’re finding that feedback is also insufficient. "So it’s a different problem, and it’s not being solved there very well either."

Then there are differences between the populations who have experience with the equipment and those who don’t. To promote the simplicity of a particular machine, he pointed out, the maker might liken it to an ATM. The comparison doesn’t hold water, according to Killam. “Voting has too many differences [from] ATM use to make a casual comparison like that.”

He pointed out that the comparison with ATMs also falls down when it comes to choice. A user who is confounded by one bank’s ATM can change to a bank with a simpler ATM system. A voter struggling with a complicated voting machine at a polling station doesn’t have the option of changing to another machine or station.

Relative experience is also a factor. Killam notes that not everyone uses ATMs, and even those with solid ATM experience might use a voting machine only once in four years. "It’s not the same thing. We are talking about ease of learning, ease of use and ease of recall. And they don’t often correlate with each other."

He sees some effort by election authorities to address the problems. The National Institute of Standards and Technology (NIST) has already produced some better guidelines, including guidelines on ballot layout and guidelines on instructions. He noted that they are voluntary; it is left to individual states to make the guidelines mandatory if they choose. NIST is also supporting his firm’s efforts to develop test standards, he added.

He notes that "an awful lot of attention" has been paid to issues like voting security. "I’m not unbiased," he added, "but it would be nice if there was more focus in this [test standards] area and better, research-based guidelines. I think there is a lot of research that is just frankly missing."

The problem with the current voting system guidelines is that they are untested, he said. “We don’t … have any idea that these are the correct guidelines, and that they are sufficient.” On the other hand, “the test standard is an environment where these things could actually be tested. And once we have a performance baseline, you could interject a design change and note the effect – if it improves the situation, makes it worse, or has no effect – to validate any design guidelines.”
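Comparing a design change against that baseline is a standard two-sample problem. As a minimal sketch, with wholly hypothetical error counts and sample sizes, a two-proportion z-test can show whether a ballot-layout change moved the error rate by more than chance would explain.

```python
import math

def two_proportion_ztest(err_a: int, n_a: int, err_b: int, n_b: int):
    """Two-sided z-test for a difference between two error proportions."""
    p_a, p_b = err_a / n_a, err_b / n_b
    pooled = (err_a + err_b) / (n_a + n_b)
    se = math.sqrt(pooled * (1 - pooled) * (1 / n_a + 1 / n_b))
    z = (p_a - p_b) / se
    p_value = 2 * (1 - 0.5 * (1 + math.erf(abs(z) / math.sqrt(2))))
    return p_a, p_b, z, p_value

# Hypothetical counts: 24/400 intent errors at baseline vs. 10/400 after a
# ballot-layout change, same machine and same test protocol.
base, revised, z, p = two_proportion_ztest(24, 400, 10, 400)
print(f"baseline {base:.1%} vs. revised {revised:.1%} (z={z:.2f}, p={p:.3f})")
```

A guideline that repeatedly survives this kind of comparison is the sort Killam suggests promoting from guideline to standard.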

He sees the approach as designer friendly. [If] "you give them a hurdle and say, ‘you have to figure out how to get over this,’ they will."

He is not convinced the design guidelines are even necessary, but notes they are what everybody relied on in the past. “I think it would be far easier to rely on performance standards,” he explained, and by establishing those, force the vendors to become creative. He is also concerned about giving vendors both untested guidelines and performance standards. Killam says it is like telling them to be creative, “then tying their hands with guidelines rather than allowing them to explore on their own.”

He advocates a two-step process: first testing potential guidelines, then establishing those that prove necessary as standards rather than guidelines. Killam describes writing the standards as formal requirements as complicated. “A requirement has to be worded very differently, and there’s way too many variables to account for most design areas.”

For the system to work properly, he said, it has to be decided whether the vendors will be told what to build, or they will be allowed to build what they want – provided it works correctly. Performance testing fits comfortably with the latter alternative.

He used a recent explanation from a vendor to illustrate his point: "If the government has a system they want us to build and they want to write a full specification," the vendor said, "then we can build it. But if not, allow us to build a system we know will work and you can test to make sure it does. But you can’t straddle and be half way."

As an example of the first approach, he said a South American country asked its universities to design a voting system. The winning design was established as the standard, and multiple vendors then built to that standard.

Complications of Free Competition

Though standardizing the machines across voting systems is feasible in principle, according to Killam, it is unlikely to work in the United States, where free competition prevails. He argues that the free-market nature of the system fits well with the performance-standard approach. “Make this performance a requirement and make people meet it. We do it in miles per gallon. We do it in emissions.”

Asked why many other countries are relatively free of such complications, he pointed out that they don’t have the same situation as the United States. “The complexity of the US ballot has always been stated as far different than any other country. … We have all of the states running independent elections within the states and then, in certain occasions, rolling up some of their information to the federal level.” And the variability in states’ handling of elections is significant. For instance, he explained, no one goes to a polling station to vote in Oregon; it is all done by absentee ballot. Some localities in some states require full-face ballots and some don’t. Some states require a straight-party-ticket option, some do not. The difficulties of mandating anything across such variables would be formidable.

Killam foresees an interesting November 2008 election because “in a sense we know what the problems could be. I think everybody will be looking for them. And any time you look for problems, you are certainly more apt to find them than if you are not looking for them. So if the claim is that it’s gotten worse, I don’t think so. We’ve just gotten much better at detecting the problems.” People should realize that there was noise in the system, he said, but hopefully it was not great enough to affect the outcome. He defined “noise” as anything in the design of the machine or ballot that interferes with voter intent.

The beauty of the performance standard, he said, is that it is ballot neutral, equipment neutral and vendor neutral.

Source: Bill Killam; HFES; Sun-Sentinel; New York Times

This article originally appeared in The Ergonomics Report™ on 2008-10-15.