CUE-9 – The evaluator effect

The purpose of CUE-9 was to investigate the evaluator effect, also called the Rashomon effect: the observation that usability evaluators who analyze the same usability test sessions often identify substantially different sets of usability problems.
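As a rough illustration of what "substantially different sets" can mean in numbers, the sketch below computes any-two agreement, a measure used in evaluator-effect research. For every pair of evaluators it divides the number of problems both reported by the number reported by either, then averages over all pairs. The problem IDs and the function name `any_two_agreement` are hypothetical and are not CUE-9 data or CUE-9 tooling.

```python
from itertools import combinations


def any_two_agreement(problem_sets):
    """Average |A & B| / |A | B| over all pairs of evaluators' problem sets."""
    pairs = list(combinations(problem_sets, 2))
    if not pairs:
        raise ValueError("need at least two evaluators")
    total = 0.0
    for a, b in pairs:
        union = a | b
        # Two empty sets are treated as full agreement.
        total += len(a & b) / len(union) if union else 1.0
    return total / len(pairs)


# Hypothetical problem IDs reported by three evaluators who watched the same sessions.
evaluators = [
    {"P1", "P2", "P3", "P4"},
    {"P2", "P3", "P5"},
    {"P1", "P3", "P6", "P7"},
]

print(f"any-two agreement: {any_two_agreement(evaluators):.2f}")
print("distinct problems found by all evaluators together:", len(set().union(*evaluators)))
```

With these made-up sets, any-two agreement is 0.30, while the three evaluators together report seven distinct problems, more than any one of them found alone.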

CUE-9 brought together experienced usability professionals to discuss the state of the art in usability evaluation, based on their shared experience of evaluating the website of the US moving company U-Haul. Each participant:

Watched five 30-minute videos from usability test sessions,
Wrote a short, anonymous report about their findings,
Submitted their report,
Read similar reports written by other experienced professionals,
Met experienced colleagues at a CUE-9 workshop, where they compared and discussed their findings and learned from the similarities and differences.


Practitioner’s Takeaways

Have more than one evaluator independently analyse test sessions, at least in important evaluations. With more than one evaluator, more problems are detected and evaluators get an opportunity to reflect on their agreements and disagreements.
Consult people with local or domain knowledge to reduce uncertainty in the analysis of user actions. Local and domain knowledge may be needed to judge whether users approach tasks appropriately, miss important information, and reach correct task solutions. Clarify the goal of the test before the test is run.
Consolidate the severity ratings of the reported usability issues in a group process. Such a process is likely to reduce the number of highly rated problems and thereby add focus to redesign work. A group process may also support problem prioritisation by bringing the usability specialists together with developers who know how easy or difficult the problems are to fix.
Consider the use of unmoderated tests. On the basis of this study, unmoderated tests appear to be a cost-effective alternative or supplement to moderated tests, as the evaluator effect and the number of identified usability issues were similar for moderated and unmoderated tests.
Remember that perfect reliability is not required for usability testing to be worthwhile. This is particularly relevant when multiple usability tests are conducted in an iterative process of evaluation and redesign, because each round offers additional opportunities to find usability problems that were initially missed.

Paper and Article About CUE-9

What you get is what you see: revisiting the evaluator effect in usability tests,
by Morten Hertzum, Rolf Molich and Niels Ebbe Jacobsen
Behaviour & Information Technology, April 2013

For a copy of this paper, please contact me.

A little known factor that could have a big effect on your next usability test
by David Travis, retrieved 19 March 2020.

Available Downloads

CUE-9 Overview (2 pages, PDF, 27 KB)
CUE-9 Detailed Description (7 pages, PDF, 46 KB)
CUE-9 Participants (1 page, PDF, 11 KB)
CUE-9 Reports, 19 reports from first workshop (ZIP, 33 MB)
CUE-9 Reports, 16 reports from second workshop (ZIP, 34 MB)

Workshops

CUE-9 workshops were held on 20 June 2011 in Atlanta, GA, USA, and on 11 September 2011 in Chemnitz, Germany. Thirty-five leading usability professionals participated in CUE-9, including Steve Krug, Chauncey Wilson, Carol Barnum, Tom Tullis and Nigel Bevan. The Atlanta workshop attracted 19 participants while the Chemnitz workshop attracted 16.