A retry strategy is an approach you can add to your tests to run them again when they fail for any reason: an assertion error or an exception. If a test fails, the retry strategy executes it again up to N times, reporting success if any of the N executions passes, or failure if all N executions fail.
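To make the mechanism concrete, here is a minimal sketch of a generic retry wrapper in plain Java. All names (`runWithRetry`, the simulated flaky check) are illustrative assumptions, not code from any framework; real tools such as TestNG's `IRetryAnalyzer` offer this behaviour as a plugin point.

```java
import java.util.function.Supplier;

public class Retry {
    // Runs the given check up to maxAttempts times.
    // Returns true as soon as one attempt passes; false if all fail.
    static boolean runWithRetry(Supplier<Boolean> test, int maxAttempts) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            if (test.get()) {
                return true; // one success is enough for the run to "pass"
            }
        }
        return false; // all N executions failed
    }

    public static void main(String[] args) {
        int[] calls = {0};
        // A simulated flaky test: fails twice, then passes on the third try
        Supplier<Boolean> flaky = () -> ++calls[0] >= 3;
        System.out.println(runWithRetry(flaky, 3)); // prints true
        System.out.println(runWithRetry(() -> false, 3)); // prints false
    }
}
```

Note how the wrapper hides the two failed attempts entirely: the caller only sees the final `true`, which is exactly the risk discussed below.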
Martin Fowler coined the term non-determinism for tests that sometimes pass and sometimes fail. You can read his article to get tips on how to deal with it and to learn why retrying isn't a real solution.
You might find many articles saying that you can apply a retry strategy to identify flaky tests. If you apply it, your test might fail in the first run and pass in the second or a following one, so what you have is a non-deterministic test hidden behind the retry strategy (which is exactly why you shouldn't use it).
I have seen many software engineers use retry strategies to reduce the time they spend investigating errors, arguing that it is hard to find the root cause of network, waiting-time, or test-script issues.
Imagine that you are continuously running your tests, at whatever levels you use, through a pipeline. You will trust the end result and might not see the failures between the retries. It is extremely risky to rely on that.
We must properly investigate the root cause and fix the failures; otherwise, we will lose confidence in the pipeline, or in whatever approach we use to run the tests.
I would like to hear from you: why do you think the retry strategy should or shouldn't be used?
I totally agree with you. By having non-deterministic tests (AKA flaky tests), one is accepting that their tests are not reliable or trusted, and this can cause a chain reaction: more flaky tests, and therefore even less trust in the test strategy. My recommendation would be to "quarantine" any flaky tests as soon as they are spotted and stop running them as part of CI until the team finds time to fix them. That is possible by tagging those tests with an indicator (in my example, @quarantine) and setting up CI to ignore them. I believe the majority of testing frameworks (TestNG, JUnit, Cucumber) allow that.
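The quarantine tagging described above can be sketched in plain Java with a custom marker annotation and a tiny filter. The `@Quarantine` annotation and the `runnableTests` helper are hypothetical names invented for this example; in practice you would use your framework's built-in mechanism, such as JUnit 5's `@Tag("quarantine")` with an exclusion filter, or TestNG groups.

```java
import java.lang.annotation.ElementType;
import java.lang.annotation.Retention;
import java.lang.annotation.RetentionPolicy;
import java.lang.annotation.Target;
import java.lang.reflect.Method;
import java.util.ArrayList;
import java.util.List;

public class QuarantineDemo {
    // Hypothetical marker for known-flaky tests that CI should skip
    @Retention(RetentionPolicy.RUNTIME)
    @Target(ElementType.METHOD)
    @interface Quarantine {}

    static class Suite {
        @Quarantine
        public void flakyCheckoutTest() { /* known flaky, skipped in CI */ }

        public void stableLoginTest() { /* runs normally */ }
    }

    // Returns the names of the tests CI would actually run,
    // skipping anything tagged @Quarantine
    static List<String> runnableTests(Class<?> suite) {
        List<String> names = new ArrayList<>();
        for (Method m : suite.getDeclaredMethods()) {
            if (!m.isAnnotationPresent(Quarantine.class)) {
                names.add(m.getName());
            }
        }
        return names;
    }

    public static void main(String[] args) {
        System.out.println(runnableTests(Suite.class)); // [stableLoginTest]
    }
}
```

The important property is that quarantined tests are excluded explicitly and visibly, rather than silently retried: the tag itself is the team's to-do list.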
Now the team has tech debt, and it can be really hard to find time to work on every flaky test. But some of those tests may be related to a feature the team is currently working on, or to a bug that was found; those are perfect opportunities to fix some of the flaky tests without having to pay down the whole tech debt.