It’s what we all strive for, right? A set of automated test scripts, often called automated smoke tests, that are run on each deployment to each environment. We often think of this as our first step toward “push to prod on each code commit”. We expect these scripts to report success for a good build and deployment; if they don’t succeed, the build or deployment is marked as failed and our CI/CD pipeline reports the failure. Sound all the klaxons! Flash all the lights! Send all the alert emails! Those alerts mean that the software is egregiously broken, i.e., broken in such a way that there is no value in further testing, regardless of whether that testing would have been done via automation or by humans; further testing would be a waste of time.

Or would it? As in all things, the answer is, “it depends”.

Generally, when we create a suite of smoke tests, we want those tests to be broad, shallow, and fast. By broad, I mean we want that suite to touch as many subsystems of our application as we can within a tolerable duration. Keeping smoke tests broad and shallow lets us get a brief look at the basic health of many of our application’s subsystems. Some of these subsystems will have a higher level of business criticality than others. For example, in an e-commerce application, the ability to create a wish list generally has a lower business impact than the ability to buy a product with a credit card.
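
To make “broad, shallow, and fast” a bit more concrete, here is one way such a suite might look, sketched with pytest and requests. The staging URL and endpoint paths are hypothetical placeholders; the point is simply that each check is a quick, surface-level poke at a different subsystem.

    # smoke_test_storefront.py -- a minimal sketch of a broad, shallow smoke suite.
    # The base URL and endpoints below are hypothetical; substitute your own.
    import requests

    BASE_URL = "https://staging.example-shop.test"  # hypothetical environment URL
    TIMEOUT = 5  # keep each check quick so the whole suite stays fast

    def test_login_page_is_reachable():
        # Shallow check: the login page responds at all; nothing deeper.
        assert requests.get(f"{BASE_URL}/login", timeout=TIMEOUT).status_code == 200

    def test_product_catalog_is_reachable():
        assert requests.get(f"{BASE_URL}/products", timeout=TIMEOUT).status_code == 200

    def test_wishlist_service_is_reachable():
        assert requests.get(f"{BASE_URL}/wishlist/health", timeout=TIMEOUT).status_code == 200

    def test_checkout_service_is_reachable():
        assert requests.get(f"{BASE_URL}/checkout/health", timeout=TIMEOUT).status_code == 200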

As indicated above, in the context of automated deployments, the result of running a smoke test suite is often treated in a binary fashion: either the build is good enough for further work or it is not. Taking this approach in our e-commerce example means that a failing test script for credit card payments is treated with the same criticality as a failing wish list test.
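
In pipeline terms, that binary treatment often boils down to nothing more than propagating the test runner’s exit code. A minimal sketch, assuming the hypothetical pytest suite above, might look like this:

    # deploy_gate.py -- a sketch of the all-or-nothing gate: any smoke test failure
    # fails the build/deployment, no matter which subsystem it touched.
    import subprocess
    import sys

    result = subprocess.run(["pytest", "smoke_test_storefront.py", "-q"])
    sys.exit(result.returncode)  # non-zero exit means the pipeline marks the deployment as failed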

This is a pretty good practice. There typically isn’t value in further testing if core features are broken. For example, if our application’s login mechanism isn’t working in the current build and the application has no appreciable features that can be accessed pre-login, additional testing will probably provide little, if any, additional value. It would probably be more valuable to assist in assessing the reasons for the smoke test failures (i.e., why is login broken?) and to perform other activities in preparation for testing the next “good enough” build.

As always, context is important. There are times when smoke test failures indicate a major issue in one subsystem, but that subsystem is sufficiently independent of the others that further testing of those other subsystems is valuable; we might be able to obtain additional information about the state of the application with a tolerable risk that code fixes will force a retest. Returning to the previous e-commerce example, if the “pay with a credit card” subsystem is broken, certainly all feasible efforts should be expended to resolve that issue as quickly as possible. Chances are, however, that not everyone on the team is needed to actively work on that problem. What should the other team members be doing? Perhaps there is high value, and a low risk of rework, in performing deeper testing of the unaffected subsystems such as “add to wish list”.

You might be asking, “If we can still test when a smoke test failure causes the deployment to fail, why is that specific test in the smoke test suite?” That is a valid question, but again, context matters. Not everyone will have the exact same criteria for identifying smoke tests, and not all organizations will treat smoke test failures as “showstoppers”. Organizations can, and should, make their smoke test suites behave in a way that’s valuable for them; there is no cookie-cutter approach that will be appropriate for everyone.
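
For instance, if your tooling allows it, one option is to tag smoke tests by business criticality and let only the critical failures block the pipeline, while other failures are surfaced as warnings. The sketch below assumes the tests carry a hypothetical custom pytest marker such as @pytest.mark.critical (registered in pytest.ini); it is one of many ways to slice this, not a prescription.

    # tiered_gate.py -- a sketch of a criticality-aware gate, as an alternative to the
    # all-or-nothing approach above. Marker and file names are hypothetical.
    import subprocess
    import sys

    # Business-critical subsystems (e.g., "pay with a credit card"):
    # a failure here still blocks the deployment.
    critical = subprocess.run(["pytest", "-m", "critical", "-q"])

    # Everything else (e.g., "add to wish list"): failures are reported as warnings
    # so the team can decide whether deeper testing of those areas is worthwhile.
    secondary = subprocess.run(["pytest", "-m", "not critical", "-q"])
    if secondary.returncode != 0:
        print("WARNING: non-critical smoke tests failed; review before deeper testing.")

    sys.exit(critical.returncode)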

Like this? Catch me at an upcoming event!