User Testing

Test Session Basics

Before you start testing, you need to make a few decisions:

Moderated vs. Unmoderated - Moderated sessions allow for a back-and-forth discussion between the participant and the facilitator. Facilitators can ask clarifying questions or dive into issues during or after the user completes tasks. In unmoderated usability sessions, the participant completes the tasks with no interaction from a facilitator. Participants are asked to explore the product independently and report back.

As a general rule of thumb, moderated testing is more costly (in terms of facilities, time, and setup) but allows the facilitator to get detailed responses and understand the reasoning behind user behavior. Unmoderated testing is less expensive and tends to be more authentic. However, unmoderated user sessions can provide superficial or incomplete feedback. The facilitator may need to conduct a detailed interview or have the user complete a survey once they have finished testing the product.

Remote vs. In-person – Remote testing is typically unmoderated and, as the name suggests, is done outside a structured laboratory setting, in the participant’s home or workplace. Remote unmoderated testing doesn’t go as deep into a participant’s reasoning, but it allows many people in different locations to be tested using fewer resources. In-person testing is usually done in a lab setting and is typically moderated. However, an unmoderated session can also be conducted in a lab setting. In that case, the evaluator may record or observe the user interacting with the product and analyze body language, facial expressions, and behavior without interacting with the user.

Guerrilla Testing – is testing in the wild. Instead of recruiting a specific targeted audience, participants are approached in public places and asked to perform a quick usability test. The sessions should last no more than 10 to 15 minutes and cover only a few tasks. It is best to do guerrilla testing in the early stages of product development, when you have a tangible design (wireframes or lo-fi prototypes) and want to know whether you’re moving in the right direction. This method is beneficial for gathering quick feedback to validate assumptions, identify core usability issues, and gauge interest in the product.

Lab testing – The term laboratory may be misunderstood when describing a setting in which products are tested. Participants may indeed be invited to a location where specialized apparatus or materials are used (e.g., eye-tracking equipment), but whenever you invite someone to test a product in an environment of your choosing, it can be considered a laboratory test. Lab testing is done in a designated environment under specific conditions and is supervised by a moderator. In contrast, field studies are defined as observations of users in their own environment as they perform their own tasks. Any time you test in a controlled setting, you run the risk of getting skewed results to some extent. Lab testing is essential; however, you will also need to test in a more authentic setting once the product is ready to implement.

Testing in a Lab vs. Field Studies Example

When testing the design of a new asynchronous online course, designers conducted several remote unmoderated evaluations of the product with a diverse group of participants from the target population. Users testing the product were given access to the course and asked to work through the material and give their impressions. One aspect of the design included external links to supplemental information. Under laboratory conditions, those testing this feature of the course indicated they loved the opportunity to search and review these optional materials. Some of the reviewers reported spending hours working through the elective content. However, a summative evaluation conducted once the product was implemented revealed that students enrolled in the course never used this feature, not once. Students working in an uncontrolled, authentic setting determined that accessing this information had no impact on their grades; as a result, they didn’t access it. So while user testing under laboratory conditions confirmed the potential benefits of external links, testing in the classroom exposed this as an unrealized potential (i.e., a theory-to-practice issue). You cannot always control for all the confounding variables that affect actual use. (Davies, 1999)

A few testing methods you might consider include:

Expert Evaluation (usability heuristics analysis) - Expert Evaluation (or heuristic evaluation) is different from a typical usability study in that those providing data are not typical users. Experts evaluate a product’s interface against established criteria and judge its compliance with recognized usability principles (the heuristics). Heuristic analysis is a process where experts use rules of thumb to measure the usability of a product’s design. Expert evaluation helps design teams enhance product usability early in the design and development process. Depending on the instructional product, different design principles will apply. Identifying appropriate heuristic principles can be the focus of a theory-based evaluation. (video, steps, example of website heuristics)

A/B testing - A/B testing (or A/B split testing) refers to an experimental process where people are shown two or more versions of something to determine which works best. A refers to the ‘control’, or the original design, and B refers to the ‘variation’, or a new version of the design. An A/B split test presents version A to half of your participants and version B to the other half. You then collect data to see which version works best. A/B testing is often used to optimize website performance or improve how users experience the product. (see primer, steps)
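The split described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the participant IDs, the task-completion outcomes, and the helper names `split_participants` and `completion_rate` are all hypothetical, and task completion rate is just one of many measures you might collect.

```python
import random

def split_participants(participants, seed=0):
    """Randomly assign half of the participants to version A and half to B."""
    shuffled = list(participants)
    random.Random(seed).shuffle(shuffled)  # seeded so the split is repeatable
    midpoint = len(shuffled) // 2
    return {"A": shuffled[:midpoint], "B": shuffled[midpoint:]}

def completion_rate(outcomes):
    """Share of participants who completed the task (outcomes are True/False)."""
    return sum(outcomes) / len(outcomes) if outcomes else 0.0

# Hypothetical participant IDs, for illustration only.
participants = [f"P{i}" for i in range(1, 11)]
groups = split_participants(participants, seed=42)

# Made-up task outcomes for each group (True = task completed).
results = {
    "A": [True, False, True, False, False],
    "B": [True, True, True, False, True],
}
rates = {version: completion_rate(outcomes) for version, outcomes in results.items()}
better = max(rates, key=rates.get)

print(groups)
print(rates, "-> higher completion rate:", better)
```

With groups this small, a difference between the two rates may be due to chance, so a result like this is better treated as a prompt for further testing than as a final answer.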

Card Sorting - Card sorting is a technique that involves asking users to organize information into logical groups. Users are given a series of labeled cards and asked to sort them into groups that they think are appropriate. It is used to figure out the best way to organize information. Often the designer has a biased view of the organization based on their own experience. Card sorting exercises can help designers figure out an organization scheme that best matches potential users’ mental model rather than what the designer thinks is most logical. This can also be used to organize the scope and sequence of instructional content and is an excellent method for prioritizing content. Card sorting is great for optimizing a product’s information architecture before building a prototype, lo-fi mockup, or wireframe. (see examples)
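One simple way to summarize card sorting results is to count how often participants placed each pair of cards in the same group. The sketch below assumes that approach; the card labels, the three participants' sorts, and the helper name `pair_counts` are hypothetical and used only for illustration.

```python
from collections import Counter
from itertools import combinations

def pair_counts(sorts):
    """Count how many participants placed each pair of cards in the same group."""
    counts = Counter()
    for groups in sorts:          # one participant's sort
        for group in groups:      # one pile of cards
            # Sort labels so each pair is always stored in the same order.
            for pair in combinations(sorted(group), 2):
                counts[pair] += 1
    return counts

# Hypothetical card labels and sorts from three participants.
sorts = [
    [["Syllabus", "Schedule"], ["Quiz 1", "Quiz 2", "Grades"]],
    [["Syllabus", "Schedule", "Grades"], ["Quiz 1", "Quiz 2"]],
    [["Syllabus", "Schedule"], ["Quiz 1", "Quiz 2"], ["Grades"]],
]
counts = pair_counts(sorts)

# Pairs grouped together by most participants come first.
for pair, n in counts.most_common():
    print(pair, f"{n}/{len(sorts)}")
```

Pairs that most participants group together are candidates for sitting in the same section of the product, while cards with no consistent partner (here, "Grades") may need a clearer label or a section of their own.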

Cognitive Think-aloud Interviews – This technique goes by different names (e.g., context inquiries), but the basic technique asks test participants to perform a number of tasks while explaining what they are doing and why. This is an unmoderated testing approach where the evaluator tries to capture what users think as they perform the task without intervention. The evaluator does not interact with the user; they record the user’s actions and explanations and note any problems. Several publicly available resources cover this topic (see Intro).

Cooperative evaluation is a moderated variant of a think-aloud interview. In addition to getting the user to think aloud, the evaluator can ask the user to elaborate or consider “What if?” situations; likewise, the user is encouraged to provide suggestions and actively criticize the product’s design. Think-aloud interviews can provide useful insights into the issues a user might have with a product. However, the value of the information provided depends on the task chosen and how well the evaluator conducts the interview.

Before you begin, you will also need to consider the following:

Creating Scenarios

A scenario is a very short story describing a user’s need for specific information or a desire to complete a specific task. There are various types of scenarios you might create, depending on the purpose of your test. You can also ask users for their own scenarios, then watch and listen as they accomplish the task. A scenario should represent a realistic and typical task the product was designed to accomplish. The facilitator should encourage users to interact with the interface on their own without guidance. Scenarios should not include any information about how to accomplish a task or give away the answer. Several publicly available resources describe this process (see video explanation, resource1, resource2).

Moderator guidelines

An essential aspect of any moderated user test is the person facilitating the evaluation. An inexperienced moderator may inadvertently thwart the interview process. This can be done by failing to establish rapport, asking leading questions, failing to probe sufficiently, and neglecting to observe carefully. Usability testing can yield valuable insights, but user testing requires carefully crafted task scenarios and questions.

A few basic rules for interacting with evaluation participants include:

An excellent resource on this topic is provided by Molich et al. (2020) [alt link]. Several additional free resources that describe this process are available online. (see video explanation, common mistakes)

When User Testing Fails

When deciding which educational psychology textbook to use in a course, the instructor decided to ask several students for their opinion. He provided them with three options and asked which would be best. This was an unmoderated remote evaluation of the textbooks using a simple A/B testing option. The students tended to agree on one textbook. When asked why, students indicated they liked the design and colors on the front of the book. Aesthetics are important, but the unmoderated format and lack of a carefully created guiding scenario resulted in a failed evaluation. The usability of the textbook should have been determined using a set of scenarios devised to evaluate the usefulness and efficiency of the design and not just the appeal. A more thorough evaluation might also have included an expert review of the content (i.e., correctness) and the design principles used.

Session overview

A typical usability test session should not last too long (less than an hour) and might include the following:

Triangulation

One last thing to remember is to trust but verify. Not everything the user says will be accurate or reasonable, and opinions about how to proceed can be diverse. Use multiple sources and look at the problem from multiple points of view. Combine multiple types of data and obtain information using several methods. Recommendations should be reasonable, ethical, plausible, and for the most part, required. Remember, not all changes can or should be done (even if deemed necessary), and not all nonessential changes should be ignored if they improve the product and are reasonable.

Chapter Summary

  • Formative evaluation is typically conducted in the design phase. 
  • User Testing is a fundamental aspect of formative evaluation. 
  • By User Testing, we mean having the intended end-users test the product’s design to determine how users interact with the product.
  • Both UX testing and Usability testing focus on human-centered design principles and the idea that products are designed for people to use.
  • The ISO defines usability in terms of effectiveness, efficiency, and satisfaction. 
  • Formative evaluation should begin early in the design and development process.
  • Typical users and subject matter experts should be used to evaluate the product.
  • Personas can be developed to describe the typical intended users of a product.
  • Formative evaluation test groups need not be large (Rule of 5).
  • Qualitative sampling should be used to identify key informants.
  • User testing can be moderated or unmoderated, remote or in-person, conducted in a laboratory setting or as a field study.
  • Various types of testing can be employed, including expert evaluations (heuristic analysis), A/B testing, card sorting, and cognitive think-aloud interviews (context inquiries).
  • The value of the information obtained from a user test depends on the task scenario used and how well the moderator conducts the interview. 
  • Triangulation is needed to verify data and fully understand issues.
  • Recommendations for modifying a product should be reasonable, ethical, plausible, and for the most part, required.

Discussion Questions

  1. Consider a product you would like to evaluate. Describe the best way to test the product’s usability in terms of conducting a moderated vs. unmoderated, remote vs. in-person, and laboratory vs. field study. What would you recommend and why?
  2. Consider an educational product you are familiar with. Describe a persona (a user group) that typically would use this product.  

References

Davies, R. (1999). Evaluation Comparison of Online and Classroom Instruction. Higher Colleges of Technology Journal, 4(1), 33-46.

Molich, R., Wilson, C., Barnum, C. M., Cooley, D., Krug, S., LaRoche, C., ... & Traynor, B. (2020). How Professionals Moderate Usability Tests. Journal of Usability Studies, 15(4).

This content is provided to you freely by EdTech Books.

Access it online or download it at https://edtechbooks.org/eval_and_design/user_testing.