CUE-1 – Are usability tests reproducible?

The purpose of the first CUE-study was to find out whether usability tests are reproducible. CUE-1 and later CUE-studies showed that they are not.

I also wanted to find out whether the statement “Five users are enough to find 75% of all usability problems in a given product” is true. CUE-1 and later CUE-studies showed that it isn’t.

CUE-1 was a comparative usability test of a Windows calendar program (Task Timer for Windows, version 2) performed by four professional teams in the spring of 1998. The results were published at the UPA98 conference in Washington, DC, in June 1998.

Overview of all CUE-studies

Practitioner’s Take Away

Three results of this comparative test particularly surprised us:

the limited overlap between usability problems in the software reported by the different teams;
the large number of usability problems detected in the software; and
the reproducibility of the results produced by SUMI, the Software Usability Measurement Inventory questionnaire that two of the teams used.

The limited overlap may be a result of the large number of usability problems in Task Timer for Windows. It could also be due to the different approaches to usability testing that the participating teams took – in particular, the selection of different usability test scenarios.

Available Downloads

The UPA98 paper “Comparative Evaluation of Usability Tests” (12 pages, PDF, 140 KB).
The CUE-1 rules and all four usability test reports and addendums in one file (PDF, 1,256 KB).