Item Review

This chapter explores the critical process of item review in assessment development, covering the key stages from initial item creation through final selection. It examines methods for evaluating item quality, including content validation, psychometric analysis, and accessibility and bias review, and it emphasizes the importance of both empirical data and expert judgment in refining and selecting effective assessment items.

The primary purpose of item review in assessment is to ensure the quality, validity, and fairness of each test item you have included in an assessment. A wise guideline for writing is not to fall in love with your first draft. When you first create a test item, you may think it's perfect. However, never underestimate the propensity of those taking your tests to misunderstand or be confused by the items you write. Conducting an item review will help you find flaws in your test items so you can improve them or remove them. 

Item review is an essential aspect of creating a test. This critical process serves multiple functions: it verifies that each question aligns with the intended learning objectives and content areas, checks for technical accuracy and clarity in wording, and identifies any potential biases that could unfairly impact certain groups of test-takers. Item review also helps maintain the overall integrity of the assessment by ensuring an appropriate level of difficulty, the relevance of the content being tested, and alignment with the constructs targeted by the items in the assessment. This process provides an opportunity to refine and improve questions based on feedback and data, ultimately enhancing the assessment's ability to accurately measure learners' knowledge and skills.

Each step in the item review process is designed to ensure that test items (questions) are valid, reliable, and fair and that the overall assessment aligns with its intended purpose. This process involves evaluating different aspects of the assessment. Here is an overview of the steps in the process:  

Step 1: Item creation is the initial stage of this process, where subject matter and assessment experts develop questions or tasks for the assessment. These items are crafted to align with specific learning objectives or competencies being measured. They may create various types of questions depending on what best assesses the target knowledge or skills. The goal is to generate a pool of potential items that accurately reflect the content domain and assessment purpose.

Step 2: The initial review is a preliminary quality check of the newly created items (or of existing items that have not yet been reviewed). During this step, reviewers examine each item for clarity, accuracy, and alignment with the intended learning objectives. They look for any obvious errors in wording, formatting, or content. They consider the appropriate difficulty level, cognitive demands, and format for each item to ensure it follows the guidelines for that item type. The review also considers whether the question can be answered and whether there is only one correct or best answer. This stage often involves making minor edits to improve item clarity or correct small mistakes. The goal is to refine the items and ensure they meet basic quality standards before moving on to more rigorous evaluation stages.

This step also includes an initial bias and sensitivity review to establish the fairness and inclusivity of individual test items. Carefully examine each item for potential biases related to culture, gender, ethnicity, socioeconomic status, or other demographic factors. Look for language or content that might unfairly advantage or disadvantage certain groups of test-takers. The review also identifies any potentially offensive, insensitive, or stereotypical content that could cause distress or distraction. Accessibility is another key consideration, with reviewers checking that items are comprehensible and answerable by individuals with various disabilities or special needs. Items flagged during this process may be revised to remove biased elements or eliminated entirely if they cannot be improved. This step is essential for creating assessments that are equitable and valid for all intended test-takers, regardless of their background or characteristics.

Step 3: Assessment validation is a crucial step in the assessment review process that applies to the overall test, not just an individual item. A table of specifications can help ensure the assessment items accurately represent the intended content domain and cognitive levels (i.e., Bloom's levels). During this stage, subject matter experts thoroughly examine each item for relevance, appropriateness, and alignment with the learning objectives or competencies outlined in the test plan (or blueprint). The experts assess whether the items collectively cover the full range of knowledge and skills being measured, without gaps or overemphasis in any one area. An item may be revised to improve its quality or removed if it does not meet expectations. This step is vital for establishing evidence of the content and construct validity of the assessment. By this we mean providing evidence that the items are a fair and representative sample of the targeted content domain and that they measure the intended cognitive levels (i.e., not just recall).
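To make the blueprint check concrete, here is a minimal, hypothetical sketch of a table of specifications expressed in code. The content areas, cognitive levels, and item counts below are invented for illustration only; they are not taken from this chapter.

```python
# A hypothetical table of specifications (test blueprint): planned item
# counts for each content area crossed with a targeted Bloom's level.
from collections import Counter

blueprint = {
    ("Fractions", "Recall"): 2,
    ("Fractions", "Application"): 3,
    ("Decimals", "Recall"): 2,
    ("Decimals", "Application"): 3,
}

# Each drafted item is tagged with its content area and cognitive level.
drafted_items = [
    ("Fractions", "Recall"), ("Fractions", "Recall"),
    ("Fractions", "Application"), ("Fractions", "Application"),
    ("Decimals", "Recall"), ("Decimals", "Application"),
]

counts = Counter(drafted_items)
for cell, planned in blueprint.items():
    actual = counts.get(cell, 0)
    if actual != planned:
        # Mismatches reveal gaps or overemphasis relative to the blueprint.
        print(f"{cell}: planned {planned}, drafted {actual}")
```

Comparing drafted items against the blueprint in this way makes gaps and overemphasized areas visible at a glance, which is exactly the judgment the validation step asks experts to make.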

Step 4: Field testing and psychometric analysis are interrelated steps that focus on evaluating the statistical properties and real-world performance of assessment items. In this step, we gather empirical data from a representative sample of the target population. During this phase, test-takers' responses are collected and analyzed, providing concrete data on item performance. Field testing may involve asking those who answered the questions to reflect on what they were thinking as they took the test (i.e., a cognitive think-aloud interview). Conducting a cognitive interview helps identify any unexpected issues with item wording or formatting that may not have been apparent in earlier reviews. After several individuals have taken the test, empirical data about each item are used to flag potential problems. These item statistics include the difficulty level, the discrimination index, and, for multiple-choice formats, distractor analysis. Statistical analysis may also employ item response theory models to predict how items will function across different ability levels. In this chapter, we focus on classical item statistics, as psychometric analysis using advanced item response theory models is beyond the intended scope of this class. You should take an advanced measurement course to learn more about item response theory models.
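To illustrate the classical statistics named above, here is a minimal sketch in Python. It assumes a small 0/1-scored response matrix (rows are test-takers, columns are items); the data and the 27% upper/lower grouping convention are illustrative assumptions, not prescriptions from this chapter.

```python
# A minimal sketch of classical item statistics for a 0/1-scored test,
# where responses[r][i] = 1 if test-taker r answered item i correctly.

def item_difficulty(responses):
    """Proportion of test-takers answering each item correctly (the p-value)."""
    n = len(responses)
    n_items = len(responses[0])
    return [sum(row[i] for row in responses) / n for i in range(n_items)]

def discrimination_index(responses, item, fraction=0.27):
    """Upper-lower discrimination: p(correct) among the top scorers minus
    p(correct) among the bottom scorers (27% groups are a common convention)."""
    ranked = sorted(responses, key=sum, reverse=True)
    k = max(1, round(len(ranked) * fraction))
    upper, lower = ranked[:k], ranked[-k:]
    p_upper = sum(row[item] for row in upper) / k
    p_lower = sum(row[item] for row in lower) / k
    return p_upper - p_lower

# Hypothetical data: 6 test-takers x 4 items, 1 = correct, 0 = incorrect.
data = [
    [1, 1, 1, 0],
    [1, 1, 0, 1],
    [1, 0, 1, 0],
    [1, 1, 0, 0],
    [0, 0, 1, 0],
    [0, 0, 0, 0],
]

print(item_difficulty(data))               # e.g., item 1 has p = 4/6 = 0.67
print(discrimination_index(data, item=3))  # high scorers got item 4 right more often
```

In practice, items with difficulty values near 0 or 1, or with low or negative discrimination values, are flagged for the kind of closer review described above.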

The combination of theoretical psychometric analysis and practical field testing ensures that items are not only statistically sound but also function effectively in actual testing conditions, contributing to the overall reliability and validity of the assessment. In the next chapter, we discuss how to capture and analyze item statistics.

Chapter Summary

  • Item review ensures the quality, validity, and fairness of every item included in an assessment.
  • The process moves from item creation through an initial review (including a bias and sensitivity review), assessment validation against a table of specifications, and field testing with psychometric analysis.
  • Empirical data, such as item difficulty, discrimination, and distractor statistics, combine with expert judgment to guide decisions to revise, keep, or remove items.

Discussion Questions

  1. How can conducting an item review improve the quality, validity, and fairness of an assessment?
