Items in Performance Testing

Multiple-choice item writers probably spend at least 75 percent of their time and effort crafting plausible lies.

Think about it: a multiple-choice item with choices a, b, c, and d contains three distractors. I figure the stem and correct answer count as a single entity in terms of effort, because item writers probably know both when writing an item. Thus, item writers probably spend at least 75 percent of their time on the fictions, i.e., the distractors. Inconveniently, a distractor is sometimes a second correct answer, unbeknownst to the item writer. Ouch. What if the unintended second correct answer is known and used widely among highly skilled practitioners? Double ouch.

Multiple choice has become so ingrained as a testing method that people seldom consider how odd it is to invest almost all of a subject matter expert's time writing fiction.

In the interest of lighting a candle rather than cursing the darkness, let me share how my team and I approach item writing in our performance tests.

Items consist of the following:

1. A description of an unambiguous end state
2. One or more mechanisms by which the end state may be evaluated
3. A mechanism for creating the end state
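The three-part structure above can be sketched in code. This is purely illustrative; the class and field names are my own, not anything from the original:

```python
from dataclasses import dataclass
from typing import Callable, List

# Illustrative sketch of the three-part item structure described above.
@dataclass
class Item:
    description: str                      # (1) unambiguous end state shown to the candidate
    evaluators: List[Callable[[], bool]]  # (2) mechanisms that check the end state
    create_end_state: Callable[[], None]  # (3) known-good mechanism that produces it

    def evaluate(self) -> bool:
        # The end state counts as achieved only if every evaluator agrees.
        return all(check() for check in self.evaluators)
```

Note that "process" items fit the same shape: a process is just a succession of such end states, each with its own evaluators.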

(1) is what we present to a candidate. Want to test process and not end state? Easy. Process is just a succession of end states.

(2) tells us whether the end state was achieved. BEWARE OF THIS PIECE. It is easy to fall into the trap of evaluating what I call "process artifacts" rather than the end state itself. For example, one might write a mechanism that parses the configuration file for a file-sharing service to determine whether a file share is available. What if there are other ways to provide that share that do not require the file in question? What if there are factors besides a correct configuration file that affect whether the share is available?

It is essential that an evaluation mechanism (2) evaluates the end state described in (1) rather than process artifacts that might (or might not) create the end state. Your candidates might know alternative approaches that your item writers do not. If you wish to ensure that a particular method is used, then that judgment must be embedded in (1) by framing the end state in such a way that the particular method is the only possible solution.
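To make the contrast concrete, here is a minimal sketch of the two kinds of checks, assuming a file share served over the usual SMB port; the function names, host, and config path are placeholders of my own:

```python
import socket

# End-state check (sketch): is the share's service actually reachable?
# 445 is the conventional SMB port; host and port are placeholders.
def share_is_reachable(host: str, port: int = 445, timeout: float = 3.0) -> bool:
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False

# Process-artifact check (the trap to avoid): parsing a config file proves
# only that the file exists and mentions the share, not that the share works.
def config_mentions_share(path: str, share_name: str) -> bool:
    try:
        with open(path) as f:
            return share_name in f.read()
    except OSError:
        return False
```

The first check survives any implementation a skilled candidate might choose; the second silently fails the candidate who provided the share by some other means.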

The purpose of (3) is to create the condition given by (1) in order to provide a sanity check of the testing environment. For example, a test might include a number of networking services items (e.g., web, email). Our (2) mechanisms should evaluate these across a network in order to provide fidelity. Consequently, we must ensure that the networking environment is fully functional and would not prevent us from accurately evaluating a candidate's work. We use a "known-good" mechanism to create the end state and then evaluate it using our mechanisms from (2). If the known-good mechanism did not produce the end state, something in the environment requires correction. Evaluating end state rather than process artifacts minimizes the possibility of false positives and false negatives when we vet the environment.
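That vetting loop can be sketched as follows, assuming the known-good creator from (3) and the evaluators from (2); the function name and message are illustrative:

```python
from typing import Callable, List

# Sketch of environment vetting: run the known-good mechanism (3), then
# apply the very same evaluators (2) that will later judge candidates.
def vet_environment(create_end_state: Callable[[], None],
                    evaluators: List[Callable[[], bool]]) -> bool:
    create_end_state()
    ok = all(check() for check in evaluators)
    if not ok:
        print("environment requires correction before testing candidates")
    return ok
```

Because the evaluators target the end state itself, a failure here points at the environment, not at an accident of how the known-good mechanism happened to do its work.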

I hope this brief overview illustrates how item writing in the performance-based world differs from multiple choice. It requires that you trust and guide your SMEs, just as multiple-choice item writing does. The difference, in my view, is that it is easier for an SME to answer, "How would you check whether that end state exists?" than, "What are three answers to that question that sound correct but really are not?"
April 16, 2009
