Item Creation

Creating test items

There are many resources available to guide the creation of test items. Still, it's easy to write a lousy test. The quality of your assessment will depend on the quality of the items you use. It is crucial to use the most appropriate type of test item to capture the expected learning outcomes of your instruction. You should also test and revise the items you create. Best practices suggest writing multiple items to measure the same outcome. This allows you to weed out faulty items. You may also choose to use similar items for equivalent forms of an assessment or a bank of test questions.

This chapter is meant to help guide your creation (or selection) of test items. It includes subsections with guidelines for specific types of test items such as (a) binary test items (also called alternative response items), like yes/no or true/false questions; (b) multiple choice questions; (c) matching items; (d) constructed response test items (also called supply type items); and (e) context-dependent test items (also called interpretive exercises) which use a set of items from any of the previous formats. Although each of these test items has specific guidelines to follow, there are also some general guidelines that all test items should follow. These are presented below.

But first, you should make sure that using traditional test items provides appropriate measurements of the expected learning outcomes for your learners. If you're not sure whether a traditional test is the right test to measure your learning outcomes, you should review our chapter on Test Planning.

General Guidelines to Follow

When creating traditional test items, there are many common mistakes that you should seek to avoid. These pitfalls can affect the creation of any kind of test item. We share a checklist of 11 things you should avoid in your assessments, along with one general guideline on choosing an item type. To make this list easier to use, it is broken up into four distinct categories: (1) Item Type Selection, (2) Using Clear Language, (3) Creating Fairness, and (4) Ensuring Usability.

Item Type Selection

A basic guideline that applies to all item types is to ensure you have selected the best type of item for the learning objective. Ask yourself whether this is the most appropriate item type to use given what you are attempting to measure. If not, use a different item type. Sometimes, we select an item type for the wrong reason. For instance, it is common to use a multiple-choice item simply because it is easy to administer and score. However, there are some learning objectives that cannot be measured appropriately with this type of item. Often, this depends on the level of Bloom's Taxonomy you are attempting to measure, but it can also apply when the skill being measured is a performance.

Using Clear Language

The language we use for our test instructions, questions, and responses can easily invalidate our test results. If the test is too difficult to understand, then it becomes a test of students' reading comprehension as opposed to a test of their content-specific knowledge or understanding. The following five guidelines can help you make sure that the language used in your test aids students in demonstrating their knowledge or ability.

  1. Avoid Ambiguous or Unclear Language: Clear and precise wording is crucial to ensure that learners understand what is being asked of them. Ambiguity can lead to confusion and misinterpretation, resulting in an inaccurate assessment of students' knowledge or skills, which invalidates your test results.

  2. Avoid Lack of Clarity in Answer Choices: Provide clear and distinct answer choices in multiple-choice and matching questions. Ambiguous or overlapping options can confuse students and make it difficult for them to select the correct answer. Each option should be mutually exclusive, representing a single, clear response. This advice also applies to creating binary test items: avoid statements that are sometimes true and sometimes false.

  3. Avoid Lack of Clarity in Answer Selection: Learners should be required to identify their chosen answer clearly for each test item. For binary and multiple choice test items, this means having learners write their answer rather than circle it. Circling invites ambiguous marks, such as one circle that partially covers two responses or two separate circles around two responses, made in the hope of getting the benefit of the doubt. For matching items, have learners write out their answers instead of drawing lines to connect options in one column to options in another column. Ask learners to write the capital letter that corresponds with their choice. For multiple choice and matching items, this means writing "A," "B," "C," or "D." Capital letters are preferable because learners can get very creative in making the lowercase letters "a," "b," and "d" resemble each other. For binary choice items, have learners write out the whole word for their choice, e.g., "true," "false," "yes," or "no." A single letter is risky because "t" ("T") and "f" ("F") can be made to resemble one another. Letter confusion is less of an issue with yes/no binary items, but more clarity is always better. Because learners write the whole word for binary test items, capitalization is not as important.

  4. Avoid Lengthy or Complex Sentence Structures: Keep the wording of the test items concise and straightforward. Long, convoluted sentences can confuse students and distract them from the intended assessment purpose, once again making the test more about reading ability than content understanding. Use clear and concise language to ensure that students can easily comprehend and respond to the questions.

  5. Avoid Overuse of Negative Language: Minimize the use of negatively worded questions or items. Negative phrasing can confuse students and increase their cognitive load, making it harder for them to select the correct response. Some students might also feel pressure to complete tests quickly. This pressure can cause them to skip over words like "not" in the middle of a question. This once again makes the test more about reading comprehension than about content-specific knowledge and understanding. Whenever possible, rephrase items in a positive or neutral manner.
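Some of the wording guidelines above can be partly screened for automatically before a human review. The following Python sketch flags items that contain negative words or that run long. It is only an illustration: the function name, the word list, and the 25-word limit are assumptions made for this example, not values recommended by this chapter, and a simple word match cannot judge whether an item is actually ambiguous.

```python
import re

# Assumed, illustrative settings; adjust them for your learners and subject.
NEGATIVE_WORDS = {"not", "never", "no", "none", "except"}
MAX_WORDS = 25


def review_wording(item_text, max_words=MAX_WORDS):
    """Return a list of wording concerns for the text of one test item."""
    # Split the item into lowercase words, ignoring punctuation.
    words = re.findall(r"[a-z']+", item_text.lower())
    concerns = []

    # Flag any negative words that learners might skip over when rushed.
    negatives = sorted(set(words) & NEGATIVE_WORDS)
    if negatives:
        concerns.append("negative wording: " + ", ".join(negatives))

    # Flag items long enough that they may test reading ability instead.
    if len(words) > max_words:
        concerns.append(f"long item: {len(words)} words")

    return concerns


flagged = review_wording("Which of the following is not a mammal?")
clean = review_wording("Which of the following is a mammal?")
```

Here `flagged` contains one concern about the word "not," while `clean` is an empty list. A check like this can only point to items worth a second look; contractions such as "isn't" are not caught by this word list, and the decision to rephrase an item remains yours.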

Creating Fairness

As previously discussed in our chapter on Quality Assessments, fairness is essential to making sure that your assessments produce valid and reliable results. The following three guidelines focus on ways that you can make your traditional assessments fair for all students.

  1. Avoid Bias or Stereotyping: Ensure that test items are free from bias or stereotypes that may disadvantage certain groups of students. Avoid using language or examples that may favor or discriminate against particular genders, cultures, experiences, or backgrounds. The assessment should be fair and equitable for all students, providing equal opportunities for any learner who had access to your instruction to demonstrate their learning.

  2. Avoid Irrelevant or Outdated Content: Ensure that the test items are aligned with the current curriculum and reflect the essential content and skills that students are expected to learn as a result of your instruction. Avoid including outdated or irrelevant information that does not accurately assess students' understanding of the subject matter. Avoid using tests that are not adequately aligned to the learning outcomes and standards for the course.

  3. Avoid Inconsistent or Inappropriate Scoring: Ensure that the scoring guidelines, procedures, and/or rubrics align with the intended learning outcomes and accurately assess students' demonstration of learning. Inconsistent or inappropriate scoring criteria can lead to unfair grading and misinterpretation of students' abilities. This misrepresentation would invalidate your test results.

Ensuring Usability

Usability was also previously discussed in our chapter on Quality Assessments. Usability is what allows our assessments to have valuable results. Many of the educators we have worked with voice frustration with assessments that don't seem to inform or appropriately reflect student learning, and rightfully so! Creating and implementing assessments is incredibly time intensive. We need to make sure that it is time well spent. The following three guidelines provide ways to maximize your traditional assessment's usability.

  1. Avoid Overemphasis on Memorization: Avoid creating test items that solely focus on rote memorization of facts or details. Instead, design items that assess students' comprehension, application, analysis, and evaluation of concepts to promote higher-order thinking skills and a deeper understanding of the material. To assist with this process, you should create a Table of Specifications as part of your Test Plan.

  2. Avoid Limited Assessment of Topics: Avoid assessing only a narrow range of knowledge. Traditional assessments should be designed to evaluate a variety of cognitive abilities, including comprehension, application, analysis, synthesis, and evaluation across and between appropriate topics. Assessing a broader range of topics provides a more comprehensive picture of students' capabilities. Again, you should create a Table of Specifications as part of your Test Plan to assist with this process.

  3. Avoid Insufficient Context or Real-World Application: This suggestion has applications for both test creation and post-test instruction. During test creation, ensure that the test items provide sufficient context or real-world scenarios that allow students to apply their knowledge and skills. Avoid isolated or abstract questions that do not reflect the practical application of the subject matter. Context-rich items promote deeper understanding and transfer of learning. Additionally, upon test completion, the data gathered from assessments should be used to direct instruction and guide future learning processes. It is essential that learners understand that assessments are FOR learning and not merely OF learning.
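The first two usability guidelines both point to a Table of Specifications. As a rough illustration of how such a table can reveal overemphasis on memorization or narrow coverage, the Python sketch below stores planned item counts by topic and cognitive level and then summarizes them. The topics, level names, counts, and function names are all hypothetical, and the layout is an assumption for this example; follow the format described in your Test Plan.

```python
# Hypothetical cognitive levels, ordered from lowest to highest.
LEVELS = ["recall", "comprehension", "application", "analysis"]

# Hypothetical Table of Specifications: planned item counts per topic and level.
table_of_specifications = {
    "fractions": {"recall": 2, "comprehension": 2, "application": 3, "analysis": 1},
    "decimals": {"recall": 1, "comprehension": 2, "application": 2, "analysis": 1},
    "percentages": {"recall": 4, "comprehension": 0, "application": 0, "analysis": 0},
}


def level_totals(table):
    """Total planned items at each cognitive level across all topics."""
    return {level: sum(row.get(level, 0) for row in table.values()) for level in LEVELS}


def recall_share(table, recall_level="recall"):
    """Fraction of all planned items that sit at the recall level."""
    totals = level_totals(table)
    return totals[recall_level] / sum(totals.values())


def recall_only_topics(table, recall_level="recall"):
    """Topics whose planned items are all at the recall level."""
    return [
        topic
        for topic, row in table.items()
        if sum(row.values()) > 0 and row.get(recall_level, 0) == sum(row.values())
    ]


totals = level_totals(table_of_specifications)
share = recall_share(table_of_specifications)
narrow_topics = recall_only_topics(table_of_specifications)
```

With these invented counts, 7 of the 18 planned items are recall items, and "percentages" is the only topic assessed entirely through recall. A summary like this does not say what the right balance is; it simply makes the balance visible so you can compare it with your learning outcomes.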

Conclusion

We understand that these lists seem like a lot to avoid. We recommend chunking the process of creating your assessments. There are five steps you can follow to ensure the creation of quality test items. First, make sure that you are using the right type of assessment and following the Test Plan for your learning objectives. Second, make sure that your test is following guidelines for test validity. Third, create test items according to your Test Plan, following the guidelines above for test usability and the guidelines in the following subchapters for specific item types. Fourth, have your learners and other professionals provide feedback on test items and directions, checking for clear language and fairness. Lastly, revise test items as needed, altering or removing those items which seem problematic. Upon implementing your test for the first (or even second, third, or tenth!) time, it may be helpful to review your test items for their difficulty or ability to discriminate between those who performed well on the test and those who did not. This will further inform the changes that need to be made to specific test items. We discuss methods for reviewing test items in the next chapter, Item Review.
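To make item difficulty and item discrimination more concrete, the following Python sketch computes both for a single item. It is an illustration under assumptions, not the procedure from the Item Review chapter: the response data are invented, the function names are hypothetical, and the discrimination calculation uses the upper-lower group approach with the top and bottom 27% of test-takers, which is one common convention among several.

```python
def item_difficulty(responses):
    """Proportion of test-takers who answered the item correctly (0.0 to 1.0)."""
    return sum(responses) / len(responses)


def discrimination_index(item_correct, total_scores, fraction=0.27):
    """Item difficulty in the top-scoring group minus that in the bottom group."""
    # Rank test-takers by total test score, highest first.
    order = sorted(range(len(total_scores)), key=lambda i: total_scores[i], reverse=True)
    # Size of each comparison group (assumed here to be 27% of test-takers).
    k = max(1, round(len(order) * fraction))
    upper = [item_correct[i] for i in order[:k]]
    lower = [item_correct[i] for i in order[-k:]]
    return item_difficulty(upper) - item_difficulty(lower)


# Hypothetical data for ten test-takers: 1 = correct, 0 = incorrect on one item,
# paired with each test-taker's total score on the whole test.
item_correct = [1, 1, 0, 1, 1, 0, 1, 0, 1, 0]
total_scores = [95, 90, 88, 85, 80, 70, 65, 60, 55, 40]

p = item_difficulty(item_correct)
d = discrimination_index(item_correct, total_scores)
```

With these invented data, `p` is 0.6, meaning 60% of test-takers answered the item correctly, and `d` is about 0.33, because two of the three highest scorers answered correctly compared with one of the three lowest scorers. Multiply `p` by 100 to express difficulty as a percentage. A value of `d` near zero or below zero would suggest that the item does not separate stronger performers from weaker ones and may need revision.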

The subchapters that follow this chapter provide specific guidelines for creating various types of traditional test items.

Chapter Summary

  • It is important to use the test items that are most appropriate for measuring the expected learning outcomes of the assessment.
  • Educators will need to write and revise multiple test items to ensure that they use the items that best measure expected learning outcomes.
  • Item Difficulty is the percentage of test-takers who answer an item correctly, providing insight into the item's challenge.
  • Discrimination Index is a measure that shows how well an item distinguishes between higher and lower performers on a test.
  • General principles such as using clear language, creating fair tests, and ensuring assessment usability are essential to follow for all test items to make sure each item fosters the creation of a quality assessment.
  • Clear wording in test items is advised to avoid ambiguity and confusion, ensuring that students' responses accurately reflect their knowledge and skills, as opposed to their reading comprehension or ability to guess what their instructor is looking for.
  • Test items need to be free from bias and relevant to the current curriculum to ensure fairness and accuracy in measuring students' knowledge.
  • Using items that measure real-world applications of knowledge and higher-order thinking skills can lead to more reliable and usable assessments.
  • Specific guidelines for creating various types of test items, such as binary, multiple choice, and matching items, are detailed in the subchapters that follow this chapter.

Discussion Questions

  1. How can ambiguous language in test items affect the assessment's validity, and what strategies can be used to avoid such pitfalls?
  2. Discuss the impact of fairness in test item design. How can bias in item creation be identified and mitigated?
  3. Describe the role of usability in assessment design. What are the consequences of poor usability in test items, and how can it be improved?
  4. Considering the general guidelines for item creation, choose one type of test item (e.g., Multiple Choice or Essay) and consider how you have experienced the consequences of not following these guidelines. How did the test item affect your performance? How did your instructor react if they learned that the test item was ineffective?


This content is provided to you freely by EdTech Books.

Access it online or download it at https://edtechbooks.org/Assessment_Basics/item_creation.