Thursday, 14 June 2007

Testing on a shortfuze

There wasn't much activity on the blog, so I'm borrowing it for a little while to add some development details of how we're building 'storm.

Standard software development calls for a large suite of test cases as you're programming. The idea is that you write the tests for whatever you're going to build, then build it and check it passes the tests. The many benefits of this technique include:
  • If someone else plays with your code they have more of an idea as to what the outcome should be.
  • When something just won't work, running all the tests can highlight where the issues lie.
  • You can automate tests so people get annoying emails if the code they checked in doesn't pass all the tests.
  • It forces developers to really understand what they're doing before they start coding.
The thing is, game development (we're not a game, but the tech is the same) doesn't use unit tests. (Is this why games always overrun their budgets and time?)

The problem lies in the fact that it is very difficult to unit test a mostly visual medium. Does the character look right? Is she triangulated nicely? Do the buttons line up? What happens when you move the mouse slightly while clicking a button? Do the characters shatter when you drag the window around the desktop?

Standard agile practice would have us adding a unit test for each of these cases, and there are techniques we can use for this. java.awt.Robot takes you so far: you can automagically move the mouse to a location and click, or check the colour at a location. But it becomes very fragile: if a button moves or the artists change the skin tone, you "red line" on the tests. Keeping in the green takes an intolerable amount of time compared to eye-balling the output as you go.
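For anyone who hasn't seen one, here is roughly what a Robot-style check looks like. This is a minimal sketch, not code from 'storm: the button position, the expected colour, the tolerance and the names RobotClickCheck, clickAndSample and closeEnough are all made up for illustration. Be aware that running it on a desktop really does move and click your mouse.

```java
import java.awt.AWTException;
import java.awt.Color;
import java.awt.GraphicsEnvironment;
import java.awt.Robot;
import java.awt.event.InputEvent;

final class RobotClickCheck {
    // Hypothetical values: where a button sits on screen and the colour we expect there.
    static final int BUTTON_X = 200;
    static final int BUTTON_Y = 150;
    static final Color EXPECTED = new Color(0, 128, 0);

    /** True if each RGB channel of the two colours differs by at most tolerance. */
    static boolean closeEnough(Color a, Color b, int tolerance) {
        return Math.abs(a.getRed() - b.getRed()) <= tolerance
            && Math.abs(a.getGreen() - b.getGreen()) <= tolerance
            && Math.abs(a.getBlue() - b.getBlue()) <= tolerance;
    }

    /** Moves the mouse to (x, y), left-clicks, then returns the screen colour at that pixel. */
    static Color clickAndSample(Robot robot, int x, int y) {
        robot.mouseMove(x, y);
        robot.mousePress(InputEvent.BUTTON1_DOWN_MASK);
        robot.mouseRelease(InputEvent.BUTTON1_DOWN_MASK);
        robot.waitForIdle(); // let the event queue drain before sampling
        return robot.getPixelColor(x, y);
    }

    public static void main(String[] args) throws AWTException {
        if (GraphicsEnvironment.isHeadless()) {
            System.out.println("No display available; skipping robot check");
            return;
        }
        Color actual = clickAndSample(new Robot(), BUTTON_X, BUTTON_Y);
        System.out.println(closeEnough(actual, EXPECTED, 8) ? "green" : "red: got " + actual);
    }
}
```

The hard-coded coordinates and colour are exactly where the fragility comes from: move the button or retint the skin and the check fails even though nothing is broken. A tolerance like the one in closeEnough softens small colour shifts but does nothing for layout changes.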

What are we doing at the moment? We've got a really, really fast turnaround between the programmers (~minutes) - "you forgot to add that jar to the build path". Then there's a quick turnaround between the QA guys and the programmers (~hours) - "I can't save my movie". We've got a slightly longer turnaround (~days) between the QA guys and the in-house machinimators (product users) - "My video file has more artefacts than it did". And we've got ~weeks between QA and our cutting-edge beta testers - "you can't save the movie when you try to open door #23 on a Tuesday". Each layer removes some of the problems, and the most annoying problems are flattened first.

It's not a textbook operation, but with a small team it works surprisingly well.


Anthony Bailey said...

If I can venture an unsolicited opinion... here's an implausibly domain-crossing blanket assertion - but one that's nevertheless a representative distillation of what I've found throughout my experiences in test-driving functionality in highly graphical direct manipulation apps.

Don't let the pixellated robots in. It's always worthwhile investing in abstracting a more directly testable representation of any output you're tempted to handle through regression testing of captured bitmaps, and likewise with representing the essence of input rather than capturing and replaying mouse and other events directly.

Robot-driven tests working at the level of pixel coordinates and content are even more fragile and unworkable than good coder intuition first suggests. To make that approach scale to a project of a decent size, you have to spend a huge amount of time writing tools and helpers to make it halfway plausible to test from the outside. The better use of the same coding time is to invest it inside the domain, making it possible to attach the test probes at the levels at which you really want to express the behaviour.
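To make the idea of a "more directly testable representation" concrete, here is a minimal sketch under stated assumptions: UiModel, place, click and rowIsAligned are hypothetical names invented for this illustration, not part of 'storm or any real toolkit. The test probes attach to a plain model of the buttons, so "do the buttons line up?" and "was Save clicked?" can be asked without pixels or mouse coordinates.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

/** Hypothetical UI model: holds button bounds and records semantic clicks. */
final class UiModel {
    static final class Button {
        final String id;
        final int x, y, width, height;

        Button(String id, int x, int y, int width, int height) {
            this.id = id; this.x = x; this.y = y; this.width = width; this.height = height;
        }
    }

    private final Map<String, Button> buttons = new LinkedHashMap<>();
    private final List<String> clicked = new ArrayList<>();

    /** Records where the layout code decided to put a button. */
    void place(String id, int x, int y, int width, int height) {
        buttons.put(id, new Button(id, x, y, width, height));
    }

    /** The essence of a click: activate a button by id, wherever it happens to be drawn. */
    boolean click(String id) {
        if (!buttons.containsKey(id)) {
            return false;
        }
        clicked.add(id);
        return true;
    }

    List<String> clicked() {
        return new ArrayList<>(clicked);
    }

    /** True if buttons, in placement order, share y and height and do not overlap horizontally. */
    boolean rowIsAligned() {
        Button prev = null;
        for (Button b : buttons.values()) {
            if (prev != null
                    && (b.y != prev.y || b.height != prev.height || b.x < prev.x + prev.width)) {
                return false;
            }
            prev = b;
        }
        return true;
    }

    public static void main(String[] args) {
        UiModel ui = new UiModel();
        ui.place("play", 10, 5, 40, 20);
        ui.place("save", 60, 5, 40, 20);
        ui.click("save");
        System.out.println("aligned=" + ui.rowIsAligned() + " clicked=" + ui.clicked());
    }
}
```

A check like rowIsAligned keeps passing if the whole toolbar moves or the skin changes, which is the property the pixel-level tests lack. The cost is that the real rendering code has to be driven from, or report into, a model like this one.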

Matt K said...

Is there any such thing as an "unsolicited opinion" in the blogosphere? The very existence of a comment facility tends to suggest to me that any and all opinions are solicited! ;)