If you've ever done much GUI test automation, you are likely familiar with the difference between what a computer can do and what a human can do. A human sees that the "submit" button is now labeled with a capital S, clicks it, and goes on their merry way, while a computer, looking for an element with the text "submit," fails to find it, fails to click, and logs an error.
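To make that concrete, here is a minimal sketch in Python with Selenium. The URL and locator are my own illustrative assumptions; the point is that an exact-text lookup has no tolerance for a one-letter label change.

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Hypothetical checkout page, for illustration only.
driver = webdriver.Chrome()
driver.get("https://example.com/checkout")

# A human shrugs off "submit" becoming "Submit"; this exact-text
# lookup raises NoSuchElementException instead, and the run logs an error.
button = driver.find_element(By.XPATH, "//button[text()='submit']")
button.click()

driver.quit()
```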
Traditional GUI automation is linear; it follows a set of steps. The first time you run it, it can't add any value, because the feature isn't done until the automation runs and passes. Thus, the first time it runs, the automation doesn't find any bugs.
There is some value in writing these tests, in that you find bugs along the way, but any human doing exploration would also find those bugs easily. The explorer skips the "try to create the test, run, fail, wait for the fix, try again" loop and instead has the much shorter "try, fail, wait for the fix, try again" loop.
This creates the first reality of GUI automation: once test automation runs a second time, it stops being test automation and becomes change detection.
Bear in mind, the job of the programmer is to create change. As a result, the naive approach leads to a large number of "failures" that are not actually failures, as well as an ongoing, perhaps ever-increasing test maintenance cost.
After having this discussion for more than a decade, I've come to agree with my friend Danny Faught, who proposed the term "false failures." The discussion of whether they are false positives or false negatives is a distraction. The test tool is telling us about failures that aren't actually failures.
Instead of debating what to call them, let's fix the problem.
Dealing with False Failures
By false failures, I do not mean "flaky" tests, which I define as checks that sometimes pass and sometimes fail, generally due to timing or environment elements that change over time for some reason. No, I mean straight-up false failures: the middle name is now a required field, and we get an error, as we should, because the Selenium scripts don't fill in a middle name.
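As a sketch of how that plays out (Python with Selenium again; the URL and field ids are assumptions for illustration), the script below fills the form exactly as the old requirements described, so the newly required middle-name field makes every run fail even though the application is behaving correctly:

```python
from selenium import webdriver
from selenium.webdriver.common.by import By

# Hypothetical registration form; the field ids are illustrative.
driver = webdriver.Chrome()
driver.get("https://example.com/register")

driver.find_element(By.ID, "first_name").send_keys("Pat")
driver.find_element(By.ID, "last_name").send_keys("Example")
# No middle name here -- the requirement changed after this script
# was written, so the form now (correctly) rejects the submission.
driver.find_element(By.ID, "submit").click()

# The check reports a failure, but there is no bug: a false failure.
assert "Welcome" in driver.page_source

driver.quit()
```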
The simplest answer may be to make the programmers responsible for maintaining the tooling. When they finish a feature, they run all the checks for that feature and then "green" the tests. I'll call this "the West Coast approach," because I primarily see it with West Coast companies that do not consider testing a separate role from development. After all, it is common for programmers doing test-driven development, or TDD, to do this with their unit tests, and customer-facing GUI tests are just the next level up, right?
Sadly, I have not had a great deal of success with this approach. I have seen it work in some instances, mostly in groups that have a few testers acting more like coaches, where the programmers are the ones writing the automation. Outside of the West Coast, when I come back to these teams in a year or two or five, the tooling is often abandoned.
Another approach is to use automation that has a model for how the software will work. The Page Object pattern was one attempt to build such a model: when the GUI for search changes, for example, you change a single search method, rerun, and everything passes. This does not eliminate the problem, but it limits the impact. A few of the modern tools either generate the checks from a model or allow you to fix and rerun a test in place, as if you were debugging in an IDE and changing the code. These methods can all save a great deal of time.
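Here is a minimal Page Object sketch (Python with Selenium; the class, locators, and page are my own, for illustration). If the search box's locator changes, this one class is the only thing to update; every check that calls search() keeps working unchanged.

```python
from selenium.webdriver.common.by import By


class SearchPage:
    """Page Object for a hypothetical search page.

    Checks call search(); only this class knows how the GUI exposes
    the search box, so a locator change is a one-place fix.
    """

    SEARCH_BOX = (By.ID, "q")               # the single place to update
    SEARCH_BUTTON = (By.ID, "search-button")

    def __init__(self, driver):
        self.driver = driver

    def search(self, term):
        self.driver.find_element(*self.SEARCH_BOX).send_keys(term)
        self.driver.find_element(*self.SEARCH_BUTTON).click()
        return self


# In a check (driver setup omitted):
#   SearchPage(driver).search("blue widgets")
```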
Finally, there is the option of creating the tooling only once the GUI is "stable." Then the tests run when something else changes. For example, you might rerun the tests every time you upgrade the ERP system or the operating system, or when a new browser version is released. When these runs fail, the failures are more likely to be real, caused by the environment.
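One small supporting piece for that style, sketched in Python with Selenium (the helper and its field names are my own assumptions): record the environment each run executed against, so that when a failure does appear you can see which part of the locked-down environment moved.

```python
import platform


def environment_fingerprint(driver):
    # The W3C session capabilities report which browser actually ran.
    caps = driver.capabilities
    return {
        "browser": caps.get("browserName"),
        "browser_version": caps.get("browserVersion"),
        "os": platform.platform(),
    }

# Log this alongside each run's results; compare fingerprints when a
# previously green suite starts failing after an upgrade.
```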
That approach takes the flaky-test problem and inverts it. Lock down the environment, see the failures, and fix them. These sorts of checks tend to cover the full, end-to-end use cases, which are our next problem to look at.
Constructing the GUI Checks
My friend Patrick Bailey, a professor at Calvin College, once did a research project on how people think about testing. Not surprisingly, he found that it often comes down to role. Programmers think of unit tests, analysts think of system tests, subject matter experts think of user acceptance testing, and so on.
In my experience, the most customer-facing testers tend to focus on the user journey. They want to see the whole thing work. In e-commerce land, this is the path to purchase. In ERP land, it will be end-to-end functions, everything from manufacturing the product to getting it on the shelves, including sales, in an integrated supply chain.
That's a lot of functionality.
If anything changes along that journey (and it will; remember the first reality of GUI automation), then we'll experience a false failure, likely need to rerun to the failure point, pause, "fix" the code, and rerun. If we are lucky, that change only has to happen in one place.
Humans and computers are good at different things.
Instead of seeing one long user journey, we could see it as a series of tiny interactions, each one going from the DOM to the database and back again. Our user journey is likely 20 of these. With a few testability hooks to change the environment, we can likely create 20 checks, which we can split up and run in parallel. Instead of taking 15 minutes to get a result, we get one in less than a minute, know exactly what failed, and can fix and rerun just that one check. Our "test suite" of 10 user journeys becomes 200 small pieces.
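Here is what one of those tiny checks might look like (Python with pytest, requests, and Selenium; the seeding endpoint, URL, and locators are assumptions standing in for whatever testability hooks your application provides). Because each check seeds its own state through a hook rather than replaying the journey so far, twenty of them can run independently.

```python
import requests
from selenium import webdriver
from selenium.webdriver.common.by import By

BASE_URL = "https://example.com"  # illustrative


def test_add_item_to_cart():
    # Testability hook (hypothetical): seed a logged-in user with an
    # empty cart via an API instead of clicking through earlier steps.
    session = requests.post(f"{BASE_URL}/test-hooks/seed-user").json()

    driver = webdriver.Chrome()
    try:
        driver.get(f"{BASE_URL}/products/widget?session={session['token']}")

        # One small DOM-to-database interaction, nothing more.
        driver.find_element(By.ID, "add-to-cart").click()
        assert driver.find_element(By.ID, "cart-count").text == "1"
    finally:
        driver.quit()
```

Run in parallel with something like pytest-xdist (pytest -n auto), the whole set finishes in roughly the time of the slowest check, and a failure names the exact interaction that broke.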
That does create some amount of risk. We'll need to have a small, continuous layer of human exploration over the top to reduce that risk, and we won't be able to completely remove the human from the equation.
Of course, we never could completely remove the human from the equation. Let's stop trying to make promises we can't keep, and instead let the computer do what it is good at—and let the humans do what we are good at as well.