The A to D of writing multiple choice tests

By Julia Sandford-Cooke

Multiple choice tests are hard to get right. And I’m not just thinking of the time I scored 19% in a school physics test – statistically less than if I’d just guessed every answer. It’s actually really tricky to write high-quality questions and answer options that genuinely assess knowledge and understanding. As with a lot of the topics discussed on this blog, it’s a type of writing and editing that seems easy until you try it.

What do I mean by multiple choice (or multi-choice) questions and answers? They’re the ones with a standalone question (the stem) where the correct answer (the key) is hidden among three or four wrong answers (distractors). The people responding (let’s call them students) have to choose one or more answers from the options given. For example:

What noise does a cat make?

      1. Woof
      2. Moo
      3. Meow [key]
      4. Baa

And what do I know about multiple choice questions? Well, quite a bit. I have edited hundreds, maybe thousands, of them for one of the UK’s biggest test providers over the past 15 years. I’ve also written and edited them for, well, multiple other contexts, including textbooks, revision guides, workbooks and online learning materials.

A good multi-choice test is an objective measurement of a student’s knowledge, which can be taken and marked online, with instant feedback. However, in my experience, authors usually don’t know what a good – or bad – multi-choice test looks like. They might be experts in their subject but they’ve never been taught how to actually write a test. And there’s a lot they should know, involving some pretty complex pedagogical concepts. I don’t have space to go into Bloom’s Taxonomy here but the goal is to ensure that the test is an unobtrusive channel for assessing the student’s knowledge.

So here’s a quick primer, covering four common problems.

Problem A: The question doesn’t make sense

The question must be pitched appropriately for whoever is taking the test. Unless it’s a Key Stage 2 SATs test, the aim is to find out what students know, not how well they can read or understand long words. Clarity is vital: the wording of the question and answers should be concise and unambiguous. There is usually no need to fill the question with irrelevant and confusing information:

Pet cats may be kept inside or outside, or be able to move freely between the house and garden. Sometimes neighbouring cats can enter the house in this way but owners can allow only their cat to come in by installing a special cat flap. How?

What type of cat flap prevents the wrong cats from entering the house?

Students shouldn’t have to waste time under exam conditions trying to work out what they are being asked. The question should be self-contained so that it makes sense without the answers.

My cat Pixel is:

      1. tortoiseshell.
      2. black and white. [key]
      3. ginger.
      4. tabby.

What colour is my cat Pixel?

      1. Tortoiseshell
      2. Black and white [key]
      3. Ginger
      4. Tabby

Avoid colloquialisms and unnecessarily complex language. Of course, you might want to find out whether students know a particular technical term, but the structure of the question should make that intention clear and direct.

A cat is a digitigrade. What does this mean?

      1. It has a different number of toes on its front and back paws.
      2. It walks on its toes. [key]
      3. It stands with its toes flat on the ground.
      4. It has claws.

Technical terms applied in the wrong context might also make for credible distractors.

Opinions differ on negatively phrased questions. Some people argue that they’re confusing, while others say they make students read the question more carefully. I think they’re fine under the right circumstances, and as long as the negative word (eg ‘not’) is obvious (eg formatted in bold).

Problem B: The distractors are too obvious

I see this issue more than any other. The author knows what they want the students to know but struggles to think of plausible distractors.

What is the common name for the species Felis catus?

      1. Cat
      2. Dog
      3. Elephant
      4. Human

If the correct answer can be easily guessed without any background knowledge, the question has failed in its purpose. And a test isn’t the time to try to be funny.

If it’s too hard to think of wrong answers, perhaps it’s the wrong question. Try asking it in a way that allows the distractors to be worth considering. They could be frequent misconceptions, commonly asked questions, otherwise true statements or other related terms or concepts that the student might know. For example:

What is the Latin term for the domestic cat?

      1. Felidae [Latin term for the family ‘cat’]
      2. Felis catus [key]
      3. Panthera [the genus of cats that roar]
      4. Felis silvestris [European wild cat]

All the answer options should have a similar sentence structure that follows on logically from the question. It’s the same principle as wording bullet lists to follow platform sentences – errors may unintentionally draw attention to the wrong (or right) answers.

Cats are crepuscular because they:

      1. they like to knead your laps with their paws.
      2. of their rough tongues.
      3. like to go out at dawn and dusk. [key]
      4. prefers to go out during the day.

Option lengths should be consistent – often, the correct answer is obvious because it is much longer or shorter than the distractors, and phrased slightly differently.

Where does Pixel most like to be stroked?

      1. On his back
      2. Around his face, ears, chin and at the base of his tail, where his scent glands are [key]
      3. On his tummy
      4. On his paws
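
If your questions are stored electronically, a length check like this is easy to automate. The following is only a rough sketch in Python: the function name `length_outliers` and the 'more than twice or less than half the typical length' threshold are my own assumptions rather than any established standard, so treat anything it flags as a prompt for an editor to take a second look.

```python
from statistics import median

def length_outliers(options, ratio=2.0):
    """Return the options that are much longer or shorter than the rest.

    An option is flagged when its character count is more than `ratio`
    times the median length, or less than the median divided by `ratio`.
    The threshold is an arbitrary assumption, not a recognised rule.
    """
    if not options:
        return []
    typical = median(len(text) for text in options)
    return [
        text for text in options
        if len(text) > typical * ratio or len(text) * ratio < typical
    ]

# The Pixel question above: the key is far longer than the distractors.
options = [
    "On his back",
    "Around his face, ears, chin and at the base of his tail, where his scent glands are",
    "On his tummy",
    "On his paws",
]
flagged = length_outliers(options)
print(flagged)
```

Here only the overlong key is flagged, which is exactly the giveaway a test-wise student would spot.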

[Photo: Pixel deep in thought during a maths test]

Avoid ‘All of the above’ – it’s a copout. Students only need to realise that more than one answer could be right to reasonably guess that ‘All of the above’ is the correct answer.

What is a cat’s favourite pastime?

      1. Sleeping
      2. Being stroked
      3. Sitting on laps
      4. All of the above.

With this example, you could also argue that ‘favourite’ implies a single pastime that the cat enjoys more than any other. ‘All of the above’, therefore, is doubly confusing.

‘None of the above’ is also a meaningless option, as it does not identify whether the student knows the correct answer.

On a related note, avoid acronym questions. Not only could a student successfully argue that a collection of letters stands for anything you want it to, but it’s also hard to write realistic distractors for a specific acronym.

What does RSPCA stand for?

      1. Really Special People’s Cats Association
      2. Royal Society for the Protection of Cats and Animals
      3. Royal Society for the Prevention of Cruelty to Animals
      4. Running Short of Possible Cat Answers

If the test isn’t delivered via software that randomises the position of the answers each time it’s administered, vary the placement of the key throughout the test, to avoid any patterns.
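
If you are building the test yourself, randomising the answer order takes only a few lines. This is a minimal sketch in Python, assuming each question is held as a list of options plus the position of the key. The function name `shuffle_options` is hypothetical and not taken from any particular testing platform.

```python
import random

def shuffle_options(options, key_index, rng=None):
    """Return (shuffled_options, new_key_index) without changing the input.

    `key_index` is the zero-based position of the key in `options`.
    `rng` is an optional random.Random instance, useful for repeatable runs.
    """
    if rng is None:
        rng = random.Random()
    order = list(range(len(options)))
    rng.shuffle(order)  # shuffle the positions, not the text itself
    shuffled = [options[i] for i in order]
    # The key now sits wherever its original position ended up.
    return shuffled, order.index(key_index)

options = ["Woof", "Moo", "Meow", "Baa"]
shuffled, key_position = shuffle_options(options, key_index=2, rng=random.Random(0))
print(shuffled, "- key is option", key_position + 1)
```

Tracking the key by position rather than by text means the marking scheme stays correct however the options land. For a paper test, you could run this once per question and then check that the keys are not bunched under the same number.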

Problem C: The questions and/or answers are ambiguous

This is the opposite problem to the obvious distractors. A student may find that more than one option could be correct, but a multi-choice test doesn’t give the opportunity for students to answer ‘it depends’.

What noise does a cat make?

      1. Woof
      2. Moo
      3. Meow [key?]
      4. Purr [key?]

Authors are sometimes advised to ask students to find the ‘best’ answer rather than the ‘correct’ answer but this rather skates over the need for precise wording. In this case, it would be better to ask a more specific question that tests a higher level of understanding:

What noise do cats make to communicate with humans?

      1. Woof
      2. Moo
      3. Meow [key]
      4. Purr

Don’t ask ‘What would you do?’, as the student could easily defend any answer with ‘Well, I would do that!’. Similarly, avoid anything that could be seen as subjective or absolute:

Why are cats so cute?
Why do cats love fish?
Why does Pixel only come into my office when I’m in a Zoom meeting?

But it’s also important not to be too specific. Avoid closed questions – they limit the distractors:

Are whiskers a type of hair?

      1. Yes
      2. No
      3. Sometimes
      4. Meaningless fourth distractor

Problem D: The test isn’t tested

It’s not always possible to try out the questions before using them, but they should at least be run past a colleague. You might know what you mean but other people might not.

As with any edited text, develop a style guide that encompasses any aspects that could be inconsistent – the use of numbers, units and punctuation, for example.

Remember to provide students with clear instructions on how you expect them to take the test. Ensure they know what learning objectives, topics or concepts are being tested, and whether they can refer to notes or use aids such as a calculator.

Tests that are to be administered live (as opposed to being used as self-revision in a textbook) should be kept on a spreadsheet that states clearly when and how the questions have been used.

If possible, keep anonymised data on how students answered each question. There’s quite a bit of analytical science relating to this but, for general tests, all that’s really important is to ask the following:

  • Were there any distractors that nobody chose?
  • Were there any answers that everyone got right?
  • Can variations in students’ results be explained by their different levels of knowledge alone?
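
The first two of these checks are simple enough to run on a list of recorded answers. Below is a rough sketch in Python, assuming you have the answer each student chose for one question. The function name `analyse_item` and the shape of the data are my own invention for illustration. The third check, on whether knowledge alone explains the variation in results, needs human judgement (or proper statistics) and is not attempted here.

```python
from collections import Counter

def analyse_item(options, key, responses):
    """Summarise how students answered a single question.

    options   -- all answer options shown to students
    key       -- the correct option
    responses -- one chosen option per student (anonymised)
    """
    counts = Counter(responses)
    total = len(responses)
    return {
        # Distractors that nobody chose may be too obviously wrong.
        "unused_distractors": [
            opt for opt in options if opt != key and counts[opt] == 0
        ],
        # Share of students choosing the key (None if there is no data).
        "proportion_correct": counts[key] / total if total else None,
        # A question everyone gets right may not be testing anything.
        "everyone_correct": total > 0 and counts[key] == total,
    }

options = ["Woof", "Moo", "Meow", "Purr"]
responses = ["Meow", "Meow", "Purr", "Meow", "Purr"]
report = analyse_item(options, "Meow", responses)
print(report)
```

In this made-up data, nobody chose 'Woof' or 'Moo', which suggests those distractors are doing no work, while the split between 'Meow' and 'Purr' hints at the ambiguity discussed under Problem C.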

Learn from the data and revisit the test to change elements as necessary. Consider, too, whether a multi-choice test format is suitable for assessing everything that needs to be assessed. A bit like this blog post, some topics lend themselves to longer, more evaluative responses, and can’t be properly examined within the constraints of a few options.

But, done right, are multiple choice tests effective tools for assessing learning, useful revision aids and direct channels for measuring knowledge? Well, yes – all of the above …

Julia Sandford-Cooke of WordFire Communications has more than 20 years’ experience of publishing and marketing. When she’s not hanging out with other editors (virtually or otherwise), she writes and edits textbooks, proofreads anything that’s put in front of her and posts short, often grumpy, book reviews on her blog, Ju’s Reviews.

Photo credits: multiple cats – The Lucky Neko; hand and paw – Humberto Arellano; whiskers – Kevin Knezic, all on Unsplash

Proofread by Alice McBrearty, Entry-Level Member.
Posted by Abi Saffrey, CIEP blog coordinator.

The views expressed here do not necessarily reflect those of the CIEP.

2 thoughts on “The A to D of writing multiple choice tests”

  1. Joanie Eppinga

    This is great. I too have been writing and editing multiple choice tests for years, and you’ve articulated some things I’ve been aware of but hadn’t quite put into words. I’ve grown proud of my ability to write good test questions. People don’t understand how hard it is to come up with answers that seem reasonable but could not plausibly be argued as being correct.
