Constructed Response & Essay Items

Introduction

Constructed response assessment items, such as fill-in-the-blank, short answer, and essay questions, serve as a cornerstone of educational assessment, offering a dynamic means for learners to exhibit their comprehension, critical thinking, and creativity. These items task learners with crafting original responses to questions or tasks based on their underlying knowledge, fostering a more authentic assessment experience than traditional items such as binary, multiple-choice, or matching items. Constructed response items also provide an opportunity to measure the upper levels of Bloom's Taxonomy, encouraging learners to synthesize and analyze information, evaluate concepts, and compose or "create" thoughtful and original responses.

Criteria and Definitions

A defining characteristic of constructed response items lies in their compositional requirement. Unlike their selected-response counterparts, which necessitate identifying a predetermined answer from a list of options, constructed response items prompt learners to independently generate responses. Such responses can range from a single word, phrase, or term to a single sentence, paragraph, or multi-paragraph essay. Constructed response items that extend beyond a single term or sentence require learners to articulate their thoughts in a coherent, organized manner and often include some form of reflection or rationale. These items are also characterized by their subjective nature. Many constructed response items have multiple correct interpretations, answers, or approaches, and therefore require evaluation by a content-area expert for accurate assessment. This contrasts with traditional test items, which have agreed-upon correct responses and can easily be graded by anyone (or anything - AI, algorithms, or a learning management system) with an accurate answer key.
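The contrast can be made concrete with a minimal sketch of answer-key grading for objective items: whoever (or whatever) holds the key can score responses with no content expertise required. The question numbers, key, and responses below are hypothetical.

```python
# Hypothetical answer key and one learner's responses for objective items.
answer_key = {1: "C", 2: "H", 3: "A"}
responses = {1: "C", 2: "H", 3: "B"}

# Grading is pure comparison against the key - no expert judgment needed.
score = sum(1 for q, correct in answer_key.items() if responses.get(q) == correct)
print(f"{score}/{len(answer_key)}")  # prints "2/3"
```

No equivalent key exists for an essay item, which is why a content-area expert must evaluate each response.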

Types of Constructed Response Items

Constructed response items come in various formats tailored to diverse learner and assessment needs. Fill-in-the-blank items require learners to produce a single word, term, or phrase to complete an incomplete statement. Short answer items demand concise responses, often restricted to a single sentence. Essay items, on the other hand, necessitate more extensive responses, spanning multiple sentences or paragraphs and delving into nuanced analysis, evaluation, or reflection. Despite the variety in response length among different types of CR items, each item requires clear instructions regarding expected response length and point allocation. Such instructions and point allocations give learners an indication of how much time or effort they are expected to spend on an item relative to other items on the same assessment. For example, fill-in-the-blank items might be worth one point on a test, with the expectation that learners can progress through a set of them fairly quickly, whereas an essay item might be worth 15 points, with the expectation that learners will spend several minutes crafting a response.

Learning to effectively create and employ diverse CR formats allows you to cater to different learning objectives and to gauge learners' proficiency across a spectrum of knowledge and skills.

Affordances and Constraints

While constructed response items offer opportunities for assessing higher-order thinking skills and promoting authentic learning and assessment experiences, they necessitate thoughtful design and evaluation to meet their full potential.

Affordances/Advantages

Constructed response items offer numerous affordances for both you and your learners. From a pedagogical standpoint, they align seamlessly with standards emphasizing critical or creative thinking without overlooking the application of knowledge. These questions also provide a platform for learners to hone their writing abilities, promoting clarity, coherence, and persuasiveness in communication. Additionally, by prompting learners to construct their responses, these items foster deep engagement with course material and facilitate the development of higher-order thinking skills.

Essay questions, in particular, are well-suited for assessing learning outcomes that transcend rote memorization or foundational understanding, such as analytical thinking, synthesis of information, and original expression. Whether analyzing historical events, critiquing literary works, or proposing solutions to complex problems, essay questions empower learners to demonstrate their understanding in multifaceted ways. As such, they epitomize the shift towards more holistic and authentic forms of assessment in education.

Additionally, creating constructed response items can be much less time-intensive than creating selected-response test items. Instead of forming a stem and four plausible content-related options, as multiple-choice items require, or forming a list of premises and responses, as matching items require, constructed response items can stop at the stem or premise. Consider the following premises from Table 3 of the previous chapter on matching items (Example 1). To convert these premises from matching items to constructed response items, you would only need to write simpler directions and forgo creating the response options (Example 2). You also wouldn't have to worry about all of those best practices for formatting matching items!

Example 1. Directions, Premises, and Formatting for a Matching Item

Directions: In the blank for each description in the first column, write the capital letter of the correct option in the second column. Each option in the second column will only be used once, and each description only has one correct answer.

Premises

1. leader of the United Kingdom during World War II

2. leader of the Soviet Union during World War II

3. leader of France during World War II

4. leader of the United States during World War II

5. leader of Italy during World War II

6. leader of Germany during World War II

Responses

A. Winston Churchill

B. Arthur Fadden

C. Charles de Gaulle

D. Adolf Hitler

E. William Lyon Mackenzie King

F. Benito Mussolini

G. Hubert Pierlot

H. Joseph Stalin

I. Harry S. Truman

Example 2. Directions and Formatting for Constructed Response Items

Directions: Please write the answer to the following questions in the space provided.

1. Who was the leader of the United Kingdom during World War II (1940-1945)?

2. Who was the leader of the Soviet Union during World War II?

3. Who was the leader of the Free French forces during World War II?

Many educators may find that the time saved by creating constructed response items instead of other traditional assessment items is reason enough to default to this type of test item over more complex test items. However, constructed response items are not without their challenges.

Disadvantages/Constraints

The open-ended nature of constructed response items can lead to wide variability in learner responses, complicating the grading process and potentially introducing subjectivity or bias into measurements that rely on constructed response items. Also, while nearly all learning contexts can benefit from employing constructed response items, some learners may need alternative assessment methods to provide answers to such items. For example, learners with low levels of literacy or certain physical constraints may need to record audio/video responses as opposed to creating written responses.

Speaking of written responses, short answer and essay items may inadvertently favor learners with strong writing skills, overshadowing their mastery of the underlying content. Learners may also be tempted to "write until right" when completing these items. This strategy relies on writing so much that the sheer volume of the response leads to a higher score, either by (a) confounding you - tricking you into mistaking a response bloated with information and detail for a thorough one - or (b) hoping that the response eventually says something that you are looking for in a "good" or "correct" response. Constructing effective short answer and essay items demands careful consideration of task complexity, clarity of instructions, and alignment with learning objectives.

A final disadvantage of constructed response test items is that they frequently require more of your time and energy for reliable scoring and feedback. While traditional multiple-choice or matching test items can benefit from a learning management system providing automatic scoring and general feedback, learning technologies have not been able to provide such scoring and feedback for constructed response test items - though we are getting closer through the use of generative artificial intelligence and natural language processing models like ChatGPT.

Usability and Guidelines

To benefit from the advantages of CR items, you should adhere to specific guidelines for their usability and implementation.

Usability

Constructed response items provide a platform for learners to demonstrate their understanding and application of knowledge in real-world contexts, fostering a deeper and more meaningful learning experience. You should consider the ways in which your constructed response items can present learners with real-world, authentic applications of their knowledge or abilities. Such authenticity gives learners opportunities to see the value of course content beyond doing well on a test. Examples of authentic assessment include assessments based on real-world scenarios or assessments with public impact. In the case of constructed response test items, you can create authenticity by presenting learners with real-world problems to solve. Requiring learners to articulate their learning in written or audio/video form can also serve as an opportunity to assess and develop learners' communication abilities - abilities that transcend content knowledge by applying to a wide range of real-world scenarios.

In addition to considering how constructed response items can provide authenticity, you should also consider how they can prompt learners to engage in higher-order thinking skills such as analysis, synthesis, evaluation, and creation. While fill-in-the-blank and short answer test items may focus more on recall or understanding, essay items can provide learners with opportunities for communicating content-specific evaluations and rationales based on critical analysis. For example, while various test items can measure whether learners in a chemistry class can recall or explain the steps of the scientific method, an essay item could allow learners to describe and defend an experiment that correctly follows the scientific method. Such a description cannot be elicited with selected-response assessment items.

Guidelines

As with all assessment items there are general guidelines (as seen in the Item Creation chapter) that you should follow in creating constructed response items. In addition to these general guidelines, you need to consider the following ideas related to writing effective constructed response items.

  1. Clarity and Precision: Prompts for constructed response items should be clear and concise to ensure that learners understand what is expected of them. If learners have to guess at what a good response should include or look like, then the results of your item cannot be used to make valid inferences concerning learners' abilities or understanding. We provide more details on how to ensure clarity and precision in the "Structuring and Defining Tasks" section below.
  2. Alignment with Learning Objectives: Constructed response items should align closely with course objectives and target specific learning outcomes. Whether it's analyzing text, solving problems in mathematics, or evaluating arguments, another subject matter expert should be able to discern what your constructed response item aims to measure. Creating constructed response items can seem easy because they require fewer elements than selected-response items; however, we should only use constructed response items if they are the best method for measuring our intended learning objective, and subsequently providing learners with feedback concerning their progress on that objective. Barring the practice of communication or writing skills, there is no reason to have learners write an essay (or for you to evaluate and grade such an essay) if 10 selected-response test items or five short answer test items will provide similarly valid measurements.
  3. Task Complexity: Educators should carefully consider the complexity of the constructed response task, ensuring that it appropriately challenges learners while remaining achievable within the allotted time frame. Most assessments are bound by some kind of time limit - a class period, a training session, etc. You must make sure that learners are given enough time to respond to each constructed response item on a test, remembering that learners need more time to complete constructed response items than to respond to selected-response items.
  4. Scoring Criteria: Clear and transparent scoring criteria should be provided to learners, outlining the expectations for quality responses and facilitating consistent grading practices. Additionally, you should consider providing learners with an example of an acceptable response when appropriate. This doesn't mean you need to show them what a correct response looks like (though more on that idea below), but you should specify how long you expect their response to be, how much supporting information they should include, whether they need to cite or reference specific resources, and whether writing ability or grammar will play into the overall scoring. This information should be provided to learners as part of the way you structure and describe your constructed response items.

Structuring and Defining Tasks

Effective constructed response items require careful structuring and precise task definition to elicit meaningful learner responses. Prompts should be focused on a specific aspect of knowledge or skill, avoiding overly broad or vague questions that may confuse learners or lead to unfocused responses. Clear instructions, guidelines, and/or examples should be provided to ensure that learners understand what is being asked of them and how their responses will be evaluated. Creating clear instructions for fill-in-the-blank and short answer questions is generally pretty simple. Example 3 displays some approaches to writing such instructions.

Example 3. Instructions for Fill-in-the-Blank and Short Answer Items

Providing clear instructions for essay questions can be more complex, and may differ based on the type of essay question you are using. Generally speaking, essay items fall into one of two categories: (1) Restricted Response or (2) Extended Response. Employing both restricted and extended response items in an assessment can provide a more comprehensive picture of learning and skill development.

Restricted Response v. Extended Response

Restricted Response Items elicit direct answers from learners. They are designed to assess specific knowledge and typically restrict both the content and the form of the learner's response to produce an expected correct answer. For instance, asking learners to "List two advantages of using grounded electrical wiring systems in single-family homes" requires them to provide specific information within a limited framework, ensuring that responses are straightforward and measurable within a short time frame, such as four minutes.

Extended Response Items allow for more detailed and expansive answers. These items place fewer limits on the content of responses, giving learners room to produce less predictable responses. For example, an extended response question might ask learners to "Compare the writings of Bret Harte and Mark Twain in terms of setting, depth of characterization, and dialogue styles of their main characters," which requires a comprehensive analysis and can lead to a variety of acceptable responses, thereby assessing a learner's critical thinking and synthesis skills.

When defining the tasks for essay items, it's important to ensure that prompts are neither too wordy nor vague. Clarity and focus are essential to guide learners towards meeting expected outcomes. For instance, shifting from a less focused question like "Evaluate the impact of the Industrial Revolution on England" to a more focused one such as "Evaluate the impact of the Industrial Revolution on the role of fathers in poor communities of England" can significantly enhance the specificity and relevance of learner responses.

Additionally, it can be helpful to provide learners with an explanation of the scoring process for an essay prior to having them complete the test item. Clearly state whether the items will be assessed analytically or holistically. You may even want to help prepare learners for the essay by providing them with an analytic scoring guide or examples of answers that would holistically earn a top score, a high score, and an unacceptable score. Other guidance that you should provide to learners is detailed in the next section. Additional examples of effective essay prompts can be seen at the end of the chapter.

Defining Learner Responses

Criteria for defining learner responses to constructed response items include response length, response complexity, time per response, and scoring expectations. Because short answer questions should extend beyond mere regurgitation of facts to demonstrating critical understanding, we might tell learners that their responses should be no more than two sentences, as we are trying to measure if they can concisely represent such understanding. For essay items, we might tell learners that responses should have three paragraphs, explaining that we expect them to present or introduce an answer, provide support for their answer, and then explain the implications of their answer.

It is also important to provide learners with guidelines as to how much time they should spend answering each constructed response item. This could mean giving learners a flat amount of time to spend on specific items or prescribing what percentage of the testing time they should spend on specific items. For example, we might specify that they only have five minutes per essay question or suggest that they spend half of their testing time answering essay questions. You will only know whether you have given learners an appropriate amount of time if you take the time yourself to answer the questions. If creating a response to an essay question takes you longer than expected, remember that it will likely take even longer for your learners. In such a case, you should adjust the requirements of the question to better align with the time available for testing. Providing learners with accurate pacing guidelines can reinforce how complex or detailed their responses should be.

Similarly, when we begin preparing our learners for their assessment, we should provide clear examples of what constitutes a high-quality response to constructed response items. We should answer questions like: What does an "A" response look like? How does it differ from a "B" or "C" response? What must a response include to earn full credit?

Demonstrating the difference between "A," "B," and "C" answers through sample questions can guide learners in understanding the expectations and standards of the assessment, thereby improving their performance and enhancing the overall effectiveness of your essay items. Extended response essay questions may encourage learners to demonstrate originality and creativity in their responses, allowing for diverse perspectives and interpretations of the prompt, reflecting the divergent nature of such items. Restricted essay prompts, by contrast, expect learners' responses to be more convergent, in that responses should resemble an accepted correct answer. Lastly, we should voice our expectation that essay responses demonstrate systematic and coherent thinking, with clear organization and a logical progression of ideas and supporting details.

Lastly, you should decide on the importance of mechanics, grammar, and spelling early on in creating your constructed response items. While these elements are essential for clear communication, their weight (or importance) in the overall score should vary depending on the subject matter and learning objectives of the assessment.

Assessing Essay Responses

Assessing essay items demands a nuanced approach, requiring evaluators to be competent and unbiased. Competence in this context refers to having a deep understanding of the content area. The assessor must be able to accurately judge the correctness and quality of learners' responses. Assessors must also strive to maintain objectivity, avoiding bias to ensure that each response is evaluated fairly based on its merits and not other responses (unless of course we are using norm-referenced grading).

Choosing a Scoring Method

Two primary methods are employed in assessing essays: (1) Analytic Scoring and (2) Holistic Scoring.

Analytic Scoring involves breaking down the essay into its fundamental components. This method requires the creation of a rubric or scoring table (Example 4) that specifies the elements considered essential for a quality response. Each component of the essay, such as argument clarity, evidence, and structure, is assessed independently and assigned a specific score. This method is beneficial because it provides detailed feedback to students and allows for more consistent and objective grading across different assessors. It can also allow essay items to measure multiple learning objectives by providing separate measurements for knowledge and skills. For example, one criterion could focus on providing relevant support while another focuses on grammar and mechanics.

Example 4. An Example of an Analytic Scoring Table

Looking at the table above, what could be some problems with this approach to scoring? What are the different possible scores a student could earn based on their performance in each category?
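One way to explore that second question is to enumerate every combination of per-criterion scores an analytic rubric allows. The criteria and point levels below are hypothetical placeholders, since they depend on the rubric you build:

```python
from itertools import product

# Hypothetical analytic rubric: each criterion is scored independently
# on its own point scale.
rubric = {
    "argument clarity": [0, 1, 2, 3],
    "supporting evidence": [0, 1, 2, 3],
    "grammar and mechanics": [0, 1, 2],
}

# Every profile of per-criterion scores a learner could receive.
profiles = list(product(*rubric.values()))
totals = sorted({sum(p) for p in profiles})

print(len(profiles))  # 48 distinct score profiles
print(totals)         # every total from 0 to 8 is reachable
```

Note that very different performances (say, strong evidence with weak mechanics versus the reverse) can collapse to the same total, which is one reason to report per-criterion scores alongside the total.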

Holistic Scoring, in contrast, assesses the essay response as a whole. Rather than dissecting the essay into essential components, the scorer evaluates the overall quality of the work. Assessors using this method will often rank essay responses by categorizing them into different performance categories such as excellent, good, average, and poor. This method is particularly useful for open-ended essay questions that can elicit a variety of acceptable responses, focusing on the general impression the essay makes rather than its specific parts. This approach is also useful for norm-referenced grading, in which we want to rank learners based on their performance.

Preventing Bias When Assessing Essays

Holistic scoring, while useful for capturing the overall effectiveness of an essay, may suffer from lower reliability due to its subjective nature. Therefore, analytic scoring often provides a more reliable and valid assessment, as it is based on specific, observable, and measurable criteria. However, both analytic and holistic scoring can suffer from assessment bias. There are several strategies that we can apply to both methods to limit bias.

One such strategy is to grade all essay responses in one sitting. This helps maintain consistency in grading standards throughout the assessment process. Similarly, we recommend grading all of the responses for one item before moving on to the next item, as opposed to grading all of one learner's responses before moving on to the next learner. This method also helps reduce bias by limiting the effect that a learner's response on one item might have on your assessment of a later item. For example, if a learner provides a near-perfect response to the first of three essay items, an assessor might fallaciously assume that they will perform that well on all three essay items. The opposite may also happen, wherein an assessor assumes that a student who does poorly on the first essay item will perform poorly on all three. Grading each response as a separate event helps prevent this kind of bias.

Another strategy to limit bias is to score essays anonymously, ideally by shuffling papers and using a numbering system rather than names. Some learning management systems may also allow for this approach to grading. If your system doesn't, then we recommend grading each essay response in a random order rather than starting alphabetically. This could mean grading the first essay response starting with M- last names, then grading the second essay response starting with B- last names, and grading the final essay response starting with T- last names. This jumping around can help prevent assessment fatigue from affecting learners' scores by altering when an individual learner's response is scored relative to other responses.
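The anonymizing and reshuffling described above can be sketched in a few lines. The learner names and submissions here are hypothetical, and the actual scoring step is left as a comment:

```python
import random

# Hypothetical submissions: one response per essay item, keyed by learner.
submissions = {
    "Adams": ["essay 1 ...", "essay 2 ...", "essay 3 ..."],
    "Baker": ["essay 1 ...", "essay 2 ...", "essay 3 ..."],
    "Tran": ["essay 1 ...", "essay 2 ...", "essay 3 ..."],
}

# Assign anonymous IDs so names never appear during grading.
anon_ids = {name: f"learner-{i + 1:03d}" for i, name in enumerate(submissions)}

num_items = 3
for item in range(num_items):
    # Re-shuffle for every item so no learner's response is always graded
    # first (while you are fresh) or last (when fatigue sets in).
    order = list(submissions)
    random.shuffle(order)
    for name in order:
        response = submissions[name][item]
        # grade(anon_ids[name], item, response)  # scoring happens here
```

Grading item by item in a freshly shuffled order combines two of the bias-limiting strategies from this section in one pass.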

As previously stated, providing learners with a scoring key or examples of responses for various performance levels can help them understand the expectations for each item. Similarly, creating an example response for various performance levels can also help you prevent bias. Tentative scoring keys and example responses will help you understand what your expectations are for each response. While these expectations may need adjustment as actual responses are reviewed, providing yourself with this baseline can help you make sure a performance standard is established prior to scoring the first learner's response.

REVIEW: Examples of Ineffective and Effective Essay Prompts

Ineffective essay items often suffer from overly complex prompts, lack of clarity, or broad scope, which may result in superficial or unfocused responses. Effective essay items, on the other hand, are focused and specific, with clear instructions and alignment with learning objectives, fostering meaningful assessment and promoting learner growth.

In constructing essay questions for assessments, you need to be mindful of crafting prompts that are clear, focused, and aligned with learning objectives. Here are some examples of ineffective and effective essay items, highlighting common pitfalls and best practices.

Ineffective Essay Items

  1. Too Broad and Ambiguous:

    • Ineffective: "Describe the history of the papacy from its origins to the present day, focusing on its impact on various continents."
      • Why it's ineffective: This question is overly broad, covering too much content for a single essay, making it difficult for students to provide a thorough and focused response.
  2. Complex and Confusing:

    • Ineffective: "Evaluate the emotional stability, degree of adjustment, and repressed frustrations of historical figures based on their literary works."
      • Why it's ineffective: This prompt combines multiple complex psychological assessments with historical analysis, which can confuse students about the primary focus and expectations.
  3. Lacks Specificity:

    • Ineffective: "Discuss the effects of World War II."
      • Why it's ineffective: This question is vague and lacks specificity, allowing for an unfocused response that could go in numerous directions without deeply exploring any relevant knowledge or understandings.

Effective Essay Items

  1. Focused and Specific:

    • Effective: "Evaluate the impact of the Industrial Revolution on child labor practices in Victorian England."
      • Why it's effective: This question is specific, directing students to focus on a particular aspect (child labor practices) during a specific period and location (Victorian England), which encourages a detailed and targeted response.
  2. Promotes Critical Thinking:

    • Effective: "Analyze the role of checks and balances in maintaining democratic principles in modern American government, providing real-world examples."
      • Why it's effective: This prompt requires critical thinking and understanding of the government structure, pushing students to apply theoretical knowledge to contemporary situations.
  3. Clear and Directed:

    • Effective: "Compare and contrast the main economic policies of Presidents Ronald Reagan and Barack Obama, discussing their long-term impacts on the U.S. economy."
      • Why it's effective: The question clearly outlines the task (compare and contrast), the subject (economic policies of two presidents), and the expected outcome (discussion of long-term impacts), providing a clear direction for students.

By using these principles, you can design essay questions that not only challenge your learners but also guide them towards constructing meaningful, insightful, and well-organized responses. This approach not only enhances the quality of learning but also improves the accuracy and effectiveness of your assessment.

REVIEW: Effective Essay Instructions

Essay instructions should include an effective prompt and provide information as detailed in the "Structuring and Defining Tasks" and "Defining Learner Responses" sections above. In short, effective essay instructions should include:

  1. A clear prompt.
  2. A description of the expected length of the response and/or the amount of time that should be spent on the response.
  3. A description of what high-scoring responses will contain, e.g., the number of references, details, or examples.
  4. A description of how the essay will be graded.

The example below is for an essay question in a course entitled "Modern Views and Values." It is from a unit entitled "Sport or Religion: Exploring the Cultural Impacts of Professional Sports" and focuses on the learning outcome that "students will be able to use their understanding of dynamic relationships to make a claim or advance a thesis using evidence and argument."

Example 5. Effective Essay Item Instructions

In the space provided, please write three paragraphs that address the following prompt. You should spend approximately 10 minutes outlining and writing this essay. The ideal response will (1) introduce your response to the prompt, (2) provide 3 pieces of evidence to support that response, and (3) provide a conclusion based on the evidence.

Here is the prompt:

Explain which has a greater cultural impact: the National Football League in the United States of America or Premier League Football in the United Kingdom.

Your response will be evaluated on the clarity of your argument (5 points), the evidence used to compare and contrast each football league (7 points), and the connection between your evidence and conclusion (5 points). Please use standard English grammar and mechanics, as they will also make up a small portion of your grade (3 points). In total, this essay is worth 20 points, making up 25% of the total points on this exam.

Chapter Summary

  • Constructed response and essay items are effective for measuring upper levels of Bloom's Taxonomy, including analysis, evaluation, and creation, and are particularly useful for assessing learners' ability to articulate their thoughts in a structured and coherent manner.
  • Constructed response items range from short answers requiring a few words or sentences to complex essays demanding in-depth discussion and argumentation.
  • To create effective constructed response items, educators need to provide clear, detailed instructions and expectations regarding the response format, length, and the criteria for grading.
  • While these items provide a rich assessment of learner understanding, they can be challenging to grade due to their subjective nature and often require more time for evaluation.
  • Providing clear, constructive feedback and being transparent about grading criteria are crucial for fairness in assessing longer constructed response and essay test items.
  • The Analytic Scoring method breaks down essays into components for detailed and objective feedback.
  • The Holistic Scoring method evaluates the overall quality of an essay, which can be faster but more subjective.
  • Strategies like anonymous grading and using detailed rubrics help minimize subjective bias. Additionally, regular calibration or norming sessions ensure consistent grading standards among assessors.

Discussion Questions

  1. What are some strategies educators can use to effectively balance the cognitive load on learners when responding to constructed response items?
  2. How can constructed response items be designed to minimize the subjectivity in grading and ensure more consistent and fair assessments?
  3. Consider the advantages and potential drawbacks of using essay questions to assess complex cognitive skills such as analysis and synthesis in learners. How might these advantages and disadvantages affect educators' use of these items?
  4. How can constructed response and essay items be utilized in a way that not only assesses learners' knowledge but also aids in the development of their writing and argumentation skills?

This content is provided to you freely by EdTech Books.

Access it online or download it at https://edtechbooks.org/Assessment_Basics/constructed_response.