19 — Usability Testing: Watching Users Struggle

The Cheapest Insurance You Can Buy

Usability testing is one of the highest-return activities in product. For an investment of one or two days, you find out what would have taken weeks or months to discover after launch. Real users sit down with your design or prototype, try to do something, and you watch where they get stuck. The places they get stuck are the places that were going to fail when you shipped.

Despite the obvious value, most teams don't do usability testing, or they do it badly. They believe they understand users well enough that they can skip it. They are usually wrong. The mismatch between what designers and PMs imagine users will do and what users actually do is much bigger than any team estimates from the inside.

This article is about how to run useful usability tests with small budgets and small samples. It is the most practical research method we know, and once you build the habit, you will not ship anything important without testing it first.

The Five-User Rule

One of the most useful findings in usability research came from Jakob Nielsen in the 1990s: testing with five users finds about eighty-five percent of the usability problems in a design. After five, you start hearing the same problems again, and the marginal value of each new test drops fast.

The math is simple. If a problem affects thirty percent of users, the chance that at least one of five randomly selected users will hit it is over eighty percent (1 - 0.7^5 is about 0.83). The chance that at least two of the five hit it is a little under fifty percent. The same problem will surface in test after test once you start watching. You don't need a hundred users to know there's a problem; you need three or four to all hit the same wall.

The implication is that usability testing is dramatically cheaper than most teams think. Five users is one afternoon. Five users is a Slack message asking for volunteers. Five users leaves no excuse for skipping it. Yet teams still skip it because they imagine usability testing requires a research team, a lab, recording equipment, and a month of preparation. None of that is required.
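The arithmetic behind the five-user rule can be checked in a few lines. This is a minimal sketch that assumes a simple binomial model: every user hits a given problem independently with the same probability. The function names are ours, and the 0.31 default is the average per-user detection rate usually quoted alongside Nielsen's figure; treat it as an assumption, because real products vary.

```python
from math import comb


def chance_at_least(k: int, n: int, p: float) -> float:
    """Probability that at least k of n users hit a problem that
    affects a fraction p of users (independent users assumed)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))


def share_of_problems_found(n: int, p: float = 0.31) -> float:
    """Expected share of problems seen at least once in n sessions,
    assuming every problem affects the same fraction p of users."""
    return 1 - (1 - p) ** n


if __name__ == "__main__":
    # A problem that affects thirty percent of users, five sessions.
    for k in (1, 2, 3):
        print(f"at least {k} of 5 users: {chance_at_least(k, 5, 0.30):.0%}")
    print(f"share of problems found by 5 users: {share_of_problems_found(5):.0%}")
```

For a thirty-percent problem this prints 83% for at least one user, 47% for at least two, and 16% for at least three, and about 84% for the share of problems five users surface. Change p to see how quickly rarer problems slip through a single round, which is one reason to run small rounds repeatedly.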

There are exceptions. If your product has very different user types (consumers vs developers, beginners vs experts), you may need five of each type. If the test surfaces issues that make you redesign, run another five after the redesign. But the basic principle holds: small samples, run repeatedly, produce most of the value.

What Usability Testing Tells You

Usability testing reveals one specific kind of insight: whether real users can complete real tasks with your design. It doesn't tell you whether users want the feature, whether they would pay for it, or whether the feature solves a real problem. Those are different questions answered by different research methods.

What usability testing tells you specifically:

Where users get stuck. The exact step in the flow where confusion or hesitation appears.
What users misunderstand. Labels they read wrong, icons they don't recognise, instructions they skip.
What users expect. What they assume will happen when they click something, which is often different from what actually happens.
What users miss. Features that are present but invisible, options users never see because they're in the wrong place.
How long things take. Whether a flow that is "fast" in your head is actually fast in practice.
Where errors happen. What users try that the system doesn't handle gracefully.

These findings are concrete, fixable, and almost always surprising. The PM who has been staring at the design for weeks no longer sees it the way a fresh user does. The user sees what is actually there. That gap is what usability testing closes.

How to Run a Test

A usability test is simpler than most teams imagine. The basic structure is: give the user a task, watch them try to do it, ask follow-up questions when they stop. That's it. The skill is in the details, not the structure.

Step One: Pick the Tasks

Decide three to five things you want to watch users try to do. These should be real tasks, not feature inspections. "Sign up for an account and create your first project" is a task. "Try out the new dashboard" is not. Tasks have a goal the user is trying to reach; feature inspections are aimless wandering.

Step Two: Recruit Five Users

Find users who roughly match your target audience. Internal colleagues are convenient but biased; they know the product. External users are better, even if they're friends-of-friends. If you're testing a B2B product, recruiting can take a week; if you're testing a consumer product, you can often find people in a coffee shop. Don't over-engineer the recruiting; imperfect matches are still useful.

Step Three: Set Up the Session

Thirty to forty-five minutes per session. Either in person or video call. The user uses your design or prototype while you watch. Record the session if you can (with permission), so you can review later or share with the team.

Step Four: Brief the User

Tell the user three things at the start. First, you are testing the design, not them; if they get stuck, that is the design's problem, not theirs. Second, you would like them to think out loud: to say what they're looking at, what they expect to happen, and what is confusing. Third, you cannot help them; if they get stuck, you will note it but not rescue them, because the moment of stuckness is the data.

Step Five: Give Tasks One at a Time

Read the first task aloud or write it on a card. Then watch. Don't intervene. Don't hint. Don't explain when they get confused. Just observe and take notes. When they finish (or give up), ask one or two follow-up questions, then give the next task.

Step Six: Take Notes on Specific Things

What did they hesitate on? What did they say out loud that showed confusion? What did they click that didn't lead where they thought it would? What did they try that doesn't exist? Where did they smile or frown? Quotes are gold; capture exact wording when something interesting comes out.
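If you take notes in a spreadsheet or a script rather than on paper, it helps to give every observation the same shape. The sketch below is one possible structure, not a standard: the class names, fields, and categories are our own, chosen to mirror the questions above.

```python
from dataclasses import dataclass
from enum import Enum
from typing import List, Optional


class Kind(Enum):
    """What sort of moment was observed (categories are illustrative)."""
    HESITATION = "paused or hesitated"
    CONFUSION = "said something that showed confusion"
    WRONG_CLICK = "clicked and landed somewhere unexpected"
    MISSING = "tried something that doesn't exist"
    REACTION = "smiled or frowned"


@dataclass
class Observation:
    participant: str              # an anonymous label such as "P1"
    task: str                     # the task the user was attempting
    kind: Kind
    note: str                     # what you saw, in your own words
    quote: Optional[str] = None   # exact wording, when you caught it


def quotes(observations: List[Observation]) -> List[str]:
    """Pull out the verbatim quotes for the write-up."""
    return [o.quote for o in observations if o.quote]
```

Keeping the quote separate from your own note makes it harder to blur what the user said with what you think they meant.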

What Not to Do During the Test

The hardest part of usability testing is staying quiet. The instinct to help is strong. The user is struggling. You can see what they need to do. You want to point. Don't. Every time you intervene, you contaminate the data. The next user won't have you helping them when they hit the same wall.

Don't Lead

"You probably want to click that button there." Now you have learned nothing about whether the user would have found the button on their own. Bite your tongue.

Don't Explain

"Oh, that part of the design just means X." Now you have trained the user out of their initial misunderstanding. Their first reaction was the data. Once you explain, you have lost it.

Don't Defend

"Well, the reason we did it this way is..." If you find yourself defending the design, stop. The user is telling you something is unclear. Defending it doesn't make it less unclear. Listen and note.

Don't Lead Their Feedback

"What did you think of the new layout, isn't it cleaner?" telegraphs the answer you want. Replace it with "How would you describe the layout in your own words?" The user's words are the data.

Don't Ask About Hypotheticals

"If we added a feature that did X, would you use it?" produces the same unreliable data it would in a survey: people are poor at predicting their own future behaviour. Stick to what they actually did with the actual design in front of them.

Reading What You See

Watching users is easy. Interpreting what you watch is harder. A few rules help.

Watch Behaviour, Not Just Words

Users often say one thing and do another. They might tell you the design is clear while clicking the wrong thing. Their behaviour is more reliable than their words. Note both, but weight the behaviour more heavily.

Pay Attention to Hesitation

When a user pauses, even for a second, that pause means something. They are processing, deciding, doubting. Pauses are data, even when the user eventually does the right thing. A design that requires every user to pause and think at step three is a design with a problem at step three, even if they all get past it.

Note What They Don't See

If you watch carefully, you will see users look right past elements that you are sure they would notice. Buttons in the wrong location. Tooltips. Help text. Notifications. The feature is technically present, but it might as well not be. These invisibility problems are some of the most common findings in usability testing.

Look for Patterns Across Users

One user struggling with X is a data point. Three users struggling with X in similar ways is a problem. The pattern is what matters; individual reactions are noisy. Five sessions is usually enough to see clear patterns.
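Spotting patterns is mostly counting: for each issue, how many different users ran into it? Here is a small sketch of that tally. The function names, the issue labels in the usage note, and the default threshold of three users are illustrative assumptions; the threshold simply follows the rule of thumb above.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Tuple


def users_per_issue(sightings: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Count distinct participants per issue.

    sightings is a sequence of (participant, issue) pairs; the same user
    hitting the same issue twice still counts once.
    """
    seen = defaultdict(set)
    for participant, issue in sightings:
        seen[issue].add(participant)
    return {issue: len(people) for issue, people in seen.items()}


def patterns(sightings: Iterable[Tuple[str, str]], min_users: int = 3) -> List[str]:
    """Issues hit by at least min_users different users, most common first."""
    counts = users_per_issue(sightings)
    hits = [issue for issue, n in counts.items() if n >= min_users]
    return sorted(hits, key=lambda issue: (-counts[issue], issue))
```

Given sightings such as ("P1", "missed the save button"), ("P2", "missed the save button"), ("P4", "missed the save button") and ("P3", "misread the plan label"), patterns returns only the save-button issue. The single sighting stays in your notes as a data point rather than a finding.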

Severity Matters

Not all usability problems are equally serious. Some make users abandon the task entirely (severe). Some slow users down but don't stop them (moderate). Some are minor annoyances (mild). Rate the severity of each issue you find so the team can prioritise the fixes.
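One way to turn severity ratings into a fix list is to sort by severity first and by the number of users affected second. The sketch below assumes the three levels named above; the class names and the tie-breaking rule are our own choices, and your team may weigh things differently.

```python
from enum import IntEnum
from typing import List, NamedTuple


class Severity(IntEnum):
    MILD = 1       # minor annoyance
    MODERATE = 2   # slows users down but doesn't stop them
    SEVERE = 3     # users abandon the task


class Issue(NamedTuple):
    title: str
    severity: Severity
    users_affected: int  # how many of the test users hit it


def prioritise(issues: List[Issue]) -> List[Issue]:
    """Most severe first; within a severity level, most widespread first."""
    return sorted(issues, key=lambda i: (i.severity, i.users_affected), reverse=True)
```

With this ordering, a severe issue seen by two users outranks a moderate issue seen by four. That is a judgment call rather than a rule, so adjust the sort key if frequency matters more to you than severity.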

Different Forms of Usability Testing

There are several variants, each useful in different situations.

Moderated In-Person

You and the user are in the same room. Highest-fidelity observation, easiest to read body language, but hardest to schedule. Best for important decisions early in design.

Moderated Remote

You and the user are on a video call. The user shares their screen. Almost as good as in-person and dramatically easier to schedule. The default for most modern usability testing.

Unmoderated Remote

Tools like UserTesting or Maze let you set up tasks, send them to users, and review recordings later. No scheduling, fast turnaround, but you can't ask follow-up questions in the moment. Best for testing specific design questions where you don't need open-ended exploration.

Hallway Testing

Grab a colleague who hasn't seen the design and ask them to try a task. Five minutes. Crude but useful for catching obvious problems before more formal testing. Great for early iteration on a flow.

Click Testing and Heat Maps

Tools that show where users click on a page or design. Useful for measuring at scale where formal moderated testing would be too expensive. Doesn't tell you why users clicked where they did, but tells you the pattern.

Method                      Best For                               Cost
Moderated in-person         High-stakes early-design decisions     High
Moderated remote            Most product testing                   Medium
Unmoderated remote          Quick design questions, big samples    Low
Hallway testing             Iterating quickly on early designs     Very low
Click testing / heat maps   Measuring patterns at scale            Low

When to Test

Usability testing is most valuable at certain points in the design and build process. Knowing when to test helps you spend the time well.

Before Design Is Final

The earlier you test, the cheaper the changes. Test on wireframes or low-fidelity prototypes before writing any code. Issues found here cost minutes to fix. Issues found after the code is written cost days. Issues found after launch cost weeks plus reputational damage.

After Major Design Changes

Anytime the design changes significantly, test again. The previous testing told you about the previous design, not the new one. New designs introduce new usability issues; small changes can have big effects.

Before Launch

Even when the design seems polished, run one final round of usability tests on the near-final product. You will almost always find something. Better to find it now than from support tickets.

Periodically After Launch

Mature products develop usability debt. Features added over years interact in ways no one designed for. Periodic usability testing on mature products often reveals issues that have been quietly costing conversion or retention for months.

Common Mistakes

Mistake One: Testing With Internal Users Only

Internal users know the product. They know what the labels mean. They know what the buttons do. Beyond a quick hallway check, they make poor usability test subjects because they have no fresh perspective. Always test with external users when the decision matters.

Mistake Two: Treating Tests as Validation

If you go in hoping users will love the design, you will see what you want to see. Tests are diagnostics. They look for problems. Reframe your goal: "I want to find what's broken before users see it."

Mistake Three: Discounting Findings as Edge Cases

"Well, that user is unusual." Sometimes true. Often a rationalisation. If three of five users hit the same problem, it is not an edge case. It is a problem. The team's instinct to dismiss findings should be resisted; the data is the data.

Mistake Four: Skipping Synthesis

After the sessions, you have notes from five users. Without synthesis, those notes stay in your head. Spend the time, right after the last session, writing up the patterns: what broke for multiple users, what surprised you, what to fix. The synthesis is what turns sessions into action.

Mistake Five: Not Reporting Back to Users

Users who help you with testing want to know they made a difference. A short follow-up after the redesign ("thanks for the help; here's what we changed because of your feedback") makes them likely to participate again and feel good about the product. The cost is low; the relational return is high.

A Final Word

Usability testing is the closest thing to free insurance in product. The cost is small. The return is enormous. Yet most teams don't do it consistently because the effort to set up the first test feels bigger than it is. Once the habit is established, it is easy to maintain, and the results pay back many times over in features that work as intended at launch.

If you have never run a usability test, run one this week. Pick a feature you are working on. Recruit three colleagues from a different team who have not seen it. Give them a task. Watch what happens. Take notes. The first session will surprise you, and the surprise is the start of a habit that will improve your product for the rest of your career.

Key Takeaways

Five users find about eighty-five percent of usability problems. Big budgets and large samples are not required to get most of the value.
Give users tasks, then watch and stay quiet. Don't lead, explain, defend, or rescue. The moment of stuckness is the data.
Watch behaviour over words. Users often say one thing and do another. Hesitation, missed elements, and unexpected clicks are signal.
Test early on low-fidelity designs, after major changes, before launch, and periodically after launch. Earlier testing is cheaper.
Synthesise patterns across users, rate severity, and feed findings back to the team and to the users who helped. Without synthesis, sessions don't become action.