I buy new cars infrequently, typically every 10 to 12 years. So when I bought a new car in 2003 I was surprised at the many advances in technology since I’d purchased my previous car, a 1993 Honda. One advance I was particularly pleased with was a sensor that automatically detects low air pressure in my tires. It is sometimes hard to tell by looking at a tire if its pressure is low, and checking tires manually is a dirty job, so I did it infrequently. A continuous test of tire pressure was, I thought, a tremendous invention.
During the same period in which car manufacturers invented ways to test tire pressure continuously, software development teams learned that testing their products continuously was also a good idea. In the early days, back when we wrote programs by rubbing sticks together, we thought of testing as something we did at the end. It wasn’t quite an afterthought, but testing was intended to verify that no bugs had been introduced during the prior steps in the development process. It was kind of like making sure the oven is off, the windows are closed, and the front door is locked before heading out for a vacation. Of course, after we saw all the things that had gone wrong during the prior steps of the development process (how could they not?), testing came to be viewed not as a verification step but as a way of adding quality to a product.
It wasn’t long before some teams realized that testing quality at the end was both inefficient and insufficient. Such teams typically shifted toward iterative development. In doing so, they split the lengthy, end-of-project test phase into multiple smaller test phases, each of which followed a phase of analysis-design-code. This was an improvement, but it wasn’t enough. And so with Scrum we go even further.
Scrum teams make testing a central practice and part of the development process rather than something that happens after the developers are “done.” Rather than trying to test quality after a product has been built, we build quality into the process and product as it is being developed.
Why Testing at the End Doesn’t Work
There are many reasons why the traditional approach of deferring testing until the end does not work:
It is hard to improve the quality of an existing product. It has always seemed to me that it is easy to lower the quality of a product but that it is difficult and time consuming to improve it. Think about a time in your past when you were working on an application that had already shipped. Let’s say you were asked to add a new set of features while simultaneously improving the existing application’s quality. Despite lots of good work on your part, it is likely that months or even a year or more passed before quality improved enough that users could notice. Yet this is exactly what we try to do when we test quality into a product at the end.
Mistakes continue unnoticed. Only after something is tested do we know that it really works. Until then you may be making the same mistake over and over again without realizing it. Let me give you an example. Geoff led the development of a website that was getting far more traffic than originally planned. He had an idea that he thought would improve the performance of every page on the site, so he implemented the change. This involved him writing some new Java code in one place and then going into the code for each page and adding one line to take advantage of the new, performance-improving code. It was tedious and time consuming. Geoff spent nearly an entire two-week sprint on these changes. After all that, Geoff tested and found that the performance gains were negligible. Geoff’s mistake was in not testing the theoretical performance gains on the first few pages he modified. Testing along the way avoids unpleasant surprises like this at the end.
The state of the project is difficult to gauge. Suppose I ask you to estimate two things for me: first, how long it will take to develop a handful of new features; and second, how long it will take to test and fix the bugs in a product that has been in development for six months and is now ready for its first round of testing. Most people will agree that estimating the new work is far easier and more likely to be accurate. Periodic (or better yet, continuous) testing of a product is a probe into that product that lets us know how far along we are.
Feedback opportunities are lost. An obvious benefit of using Scrum is that the team can get feedback on what it’s built at least at the end of every sprint. The product can be deployed onto restricted-access servers or made available for download to select customers. If the product is at a sufficient quality level for doing this only near the end of a release cycle, the team misses great opportunities to gain valuable feedback earlier.
Testing is more likely to be cut. Because of deadline pressure, work that is planned to happen at the end of a project is more likely to be dropped or reduced.
The Test Automation Pyramid
Even before the ascendancy of agile methodologies like Scrum, we knew we should automate our tests. But we didn’t. Automated tests were considered expensive to write and were often written months, or in some cases years, after a feature had been programmed. One reason teams found it difficult to write tests sooner was because they were automating at the wrong level. An effective test automation strategy calls for automating tests at three different levels, as shown in Figure 1, which depicts the test automation pyramid.
At the base of the test automation pyramid is unit testing. Unit testing should be the foundation of a solid test automation strategy and as such represents the largest part of the pyramid. Automated unit tests are wonderful because they give specific data to a programmer—there is a bug and it’s on line 47. Programmers have learned that the bug may really be on line 51 or 42, but it’s much nicer to have an automated unit test narrow it down than it is to have a tester say, “There’s a bug in how you’re retrieving member records from the database,” which might represent 1,000 or more lines of code. Also, because unit tests are usually written in the same language as the system, programmers are often most comfortable writing them.

Let’s skip for a moment the middle of the test automation pyramid and jump right to the top: the user interface level. Automated user interface testing is placed at the top of the test automation pyramid because we want to do as little of it as possible. We want this because user interface tests often have the following negative attributes:
- Brittle. A small change in the user interface can break many tests. When this is repeated many times over the course of a project, teams simply give up and stop correcting tests every time the user interface changes.
- Expensive to write. A quick capture-and-playback approach to recording user interface tests can work, but tests recorded this way are usually the most brittle. Writing a good user interface test that will remain useful and valid takes time.
- Time consuming. Tests run through the user interface often take a long time to run. I’ve seen numerous teams with impressive suites of automated user interface tests that take so long to run they cannot be run every night, much less multiple times per day.
Suppose we wish to test a very simple calculator that allows a user to enter two integers, click either a multiply or divide button, and then see the result of that operation. To test this through the user interface, we would script a series of tests to drive the user interface, type the appropriate values into the fields, press the multiply or divide button, and then compare expected and actual values. Testing in this manner would certainly work but would be prone to the brittleness and expense problems previously noted. Additionally, testing an application this way is partially redundant—think about how many times a suite of tests like this will test the user interface. Each test case will invoke the code that connects the multiply or divide button to the code in the guts of the application that does the math. Each test case will also test the code that displays results. And so on. Testing through the user interface like this is expensive and should be minimized. Although there are many test cases that need to be invoked, not all need to be run through the user interface.
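To make that cost concrete, here is a rough sketch of what one of those scripted user interface tests might look like, using Selenium WebDriver purely as an example tool; the page URL and the element IDs (multiplier, multiplicand, multiplyButton, result) are assumptions invented for illustration, not details of any particular calculator.

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class CalculatorUiTest {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            // Hypothetical URL and element IDs; any change to them breaks the test.
            driver.get("http://localhost:8080/calculator");

            // Drive the user interface exactly as a user would.
            driver.findElement(By.id("multiplier")).sendKeys("5");
            driver.findElement(By.id("multiplicand")).sendKeys("2");
            driver.findElement(By.id("multiplyButton")).click();

            // Compare the expected and actual values shown in the result field.
            String actual = driver.findElement(By.id("result")).getText();
            if (!"10".equals(actual)) {
                throw new AssertionError("Expected 10 but the page showed " + actual);
            }
        } finally {
            driver.quit();
        }
    }
}
```

Every additional multiplication case repeats the same browser startup, typing, clicking, and reading, and every case depends on the same element IDs, which is exactly where the brittleness and expense come from.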
And this is where the service layer of the test automation pyramid comes in. Although I refer to the middle layer of the test automation pyramid as the service layer, I am not restricting us to using only a service-oriented architecture. All applications are made up of various services. In the way I’m using it, a service is something the application does in response to some input or set of inputs. Our example calculator involves two services: multiply and divide. Service-level testing is about testing the services of an application separately from its user interface. So instead of running a dozen or so multiplication test cases through the calculator’s user interface, we instead perform those tests at the service level. To see how this might work, suppose we create a spreadsheet like Table 1, where each row represents one test case.
| multiplier | multiplicand | product? | Notes                        |
|------------|--------------|----------|------------------------------|
| 5          | 1            | 5        | Multiply by 1                |
| 5          | 2            | 10       |                              |
| 2          | 5            | 10       | Swap the order of prior test |
| 5          | 5            | 25       | Multiply a number by itself  |
| 1          | 1            | 1        |                              |
| 5          | 0            | 0        | Multiply by 0                |

Table 1: A spreadsheet showing a subset of the multiplication service tests.
The first two columns represent the numbers to be multiplied, the third column is the expected result, and the fourth column contains explanatory notes that will not be used by the test but that make the tests more readable.
What’s needed next is a simple program that can read the rows of this spreadsheet, pass the data columns to the right service within your application, and verify that the right results occur. Although in this simplistic example the result is a simple calculation, the result could be anything—data updated in the database, an e-mail sent to a specific recipient, money transferred between bank accounts, and so on.
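As a minimal sketch of such a program, assume the spreadsheet has been exported to a CSV file named multiplication-tests.csv and that the application exposes its multiplication service through a class named Calculator; both names are hypothetical, chosen only for illustration. Table-driven tools such as FitNesse exist to do this kind of work for you, but the underlying idea is no more complicated than this:

```java
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.List;

public class MultiplicationServiceTests {
    public static void main(String[] args) throws Exception {
        // Each line of the exported spreadsheet is one test case (hypothetical file name).
        List<String> rows = Files.readAllLines(Paths.get("multiplication-tests.csv"));

        int failures = 0;
        for (String row : rows.subList(1, rows.size())) { // skip the header row
            String[] cells = row.split(",", -1);
            int multiplier   = Integer.parseInt(cells[0].trim());
            int multiplicand = Integer.parseInt(cells[1].trim());
            int expected     = Integer.parseInt(cells[2].trim());
            // The Notes column (cells[3]) is not used by the test, as described above.

            // Pass the data straight to the service, bypassing the user interface.
            int actual = Calculator.multiply(multiplier, multiplicand);
            if (actual != expected) {
                failures++;
                System.out.printf("FAIL: %d x %d expected %d but was %d%n",
                        multiplier, multiplicand, expected, actual);
            }
        }
        System.out.println(failures == 0 ? "All tests passed" : failures + " test(s) failed");
    }
}

// Stand-in for the application's real multiplication service, included only so the
// sketch compiles; in practice this call would go to the production code.
class Calculator {
    static int multiply(int a, int b) {
        return a * b;
    }
}
```

Adding a new test case now means adding a row to the spreadsheet rather than scripting another pass through the user interface.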
The Remaining Role of User Interface Tests
But don’t we need to do some user interface testing? Absolutely, but far less of it than any other test type. In our calculator example, we no longer need to run all multiplication tests through the user interface. Instead, we run the majority of tests (such as boundary tests) through the service layer, invoking the multiply and divide methods (services) directly to confirm that the math is working properly. At the user interface level what’s left is testing to confirm that the services are hooked up to the right buttons and that the values are displaying properly in the result field. To do this we need a much smaller set of tests to run through the user interface layer.
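A sketch of that much smaller set might be a single wiring check per operation, reusing the hypothetical URL and element IDs from the earlier user interface example:

```java
import org.openqa.selenium.By;
import org.openqa.selenium.WebDriver;
import org.openqa.selenium.chrome.ChromeDriver;

public class CalculatorWiringTest {
    public static void main(String[] args) {
        WebDriver driver = new ChromeDriver();
        try {
            driver.get("http://localhost:8080/calculator"); // hypothetical URL, as before

            // One representative value per operation is enough here; the math itself
            // is already covered by the service-level tests.
            driver.findElement(By.id("multiplier")).sendKeys("3");
            driver.findElement(By.id("multiplicand")).sendKeys("4");
            driver.findElement(By.id("multiplyButton")).click();

            String shown = driver.findElement(By.id("result")).getText();
            if (!"12".equals(shown)) {
                throw new AssertionError(
                        "The multiply button does not appear to be wired to the multiplication service");
            }
        } finally {
            driver.quit();
        }
    }
}
```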
Where many organizations have gone wrong in their test automation efforts over the years has been in ignoring this whole middle layer of service testing. Although automated unit testing is wonderful, it can cover only so much of an application’s testing needs. Without service-level testing to fill the gap between unit and user interface testing, all other testing ends up being performed through the user interface, resulting in tests that are expensive to run, expensive to write, and brittle.
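Rounding out the picture, tests at the base of the pyramid for the same example might be a handful of JUnit tests written against the hypothetical Calculator class from the service-level sketch; this is a sketch of the style, not a prescribed set of cases.

```java
import static org.junit.Assert.assertEquals;
import org.junit.Test;

public class CalculatorTest {

    // Unit tests exercise the smallest piece of code directly, so a failure
    // points very close to the offending line.
    @Test
    public void multipliesTwoPositiveNumbers() {
        assertEquals(10, Calculator.multiply(5, 2));
    }

    @Test
    public void multiplyingByZeroGivesZero() {
        assertEquals(0, Calculator.multiply(5, 0));
    }

    @Test
    public void multiplicationIsCommutative() {
        assertEquals(Calculator.multiply(2, 5), Calculator.multiply(5, 2));
    }
}
```

Because tests like these run in milliseconds and are written in the same language as the system, they can be run on every build, which is what makes this layer the broad base of the pyramid.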
The Role of Manual Testing
It is impossible to fully automate all tests for all environments. Further, some tests are prohibitively expensive to automate. Many tests that we cannot or choose not to automate involve hardware or integration to external systems. A photocopier company I consulted to had a number of tests that needed human intervention before they ran. For example, making sure there were exactly five pieces of paper in the paper tray was easier to do manually than to automate.

In general, manual testing should be viewed primarily as a way of doing exploratory testing. This type of testing involves a rapid cycle through the steps of test planning, test design, and test execution. Exploratory testing should feature short, feedback-generating cycles through these steps in a manner analogous to test-driven development’s short cycle of test-code-refactor. Beyond finding bugs, exploratory testing can also identify missing test cases. These can then be added at the appropriate level of the test automation pyramid. Further, exploratory testing can uncover ideas that are missing from the user story as initially understood. It can also help a team discover things that seemed like a good idea at the time but seem like bad ideas now that the feature has been developed. These situations usually result in new items being added to the product backlog.
What Building In Quality Looks Like
A team that has integrated testing into its day-to-day work will look and behave very differently from a team that attempts to test quality at the end. Some of the observable traits of a team that builds quality in include the following:
The use of good engineering practices. A team focused on building in quality will do whatever it can to write the highest quality code possible. This will include pair programming or thorough code inspections for at least the most complex parts of the system. There will be a strong focus on automated unit testing, if not test-driven development. Refactoring will happen continuously and as needed rather than in large, noticeable spurts. Code will be continuously integrated, and failures in the build will be treated with almost the same urgency as a customer-reported critical bug. You’ll also notice that code will be owned collectively by the team rather than by individuals so that anyone noticing an opportunity to improve quality can take it.
The hand-offs between programmers and testers (if they exist at all) will be so small as not to be noticeable. Doing a little of everything (designing, coding, testing, and so on) all the time helps teams work together. When working that way, a programmer and tester talk about which capability (or partial capability) will be added to the product next. Then the tester creates automated tests and the programmer programs. When both are done the results are integrated. Although it may be correct to still think of there being hand-offs between the programmer and tester, in this case, the cycle should be so short that the hand-offs are of insignificant size.
There should be as much test activity on the first day of a sprint as on the last day. A team that is building quality in avoids working in miniature waterfalls. There are no distinct analysis, design, coding, or testing phases within a sprint. Testers (and programmers and other specialists) are as busy on the first day of a sprint as they are on the last. The type of work may differ between the first and last day of a sprint. For example, testers may be specifying test cases and preparing test data on the first day and then executing automated tests on the last, but they are equally busy throughout.