We will try to solve one of the worst practices used in tests at any level: fixed, hard-coded data.
I want to avoid, as much as possible, any manual step before I can run my tests and, because of that, I also try to avoid static files (CSV, TXT, XLS, JSON).
Here we will look at a class commonly used by Java developers, RandomStringUtils, and why it might not be the best choice for automatic data generation.
By the way, I recommend automatic data generation in the tests using the Test Data Factory approach, and you can find an example here in my blog: Test Data Factory: Why and How to Use.
The examples described here are simple, without the usage of the Test Data Factory, and will show you why RandomStringUtils might not be the best approach.
We will automatically generate data for a Customer object with the following criteria:
Attribute | Type | Constraints |
---|---|---|
id | int | Not null |
name | String | Not null and size between 2 and 50 characters |
profession | String | Not null and size between 2 and 30 characters |
accountNumber | String | Not null and size as 18 characters |
address | String | Not null and size between 2 and 50 characters |
phoneNumber | String | Not null and size between 11 and 14 characters |
birthday | Date | Not null |
To reduce the number of tests, the key point is to generate valid data given the constraints. In a professional environment, we would implement the tests for the edge cases as well.
Think about this Customer data as an object used in any test level (unit, integration, service, UI).
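To make the constraints in the table concrete, here is a minimal sketch that checks a set of values against them using only the Java standard library. The class name CustomerConstraints and the methods sizeBetween and violations are hypothetical names chosen for this illustration; they are not part of the example projects, which may validate the data differently (for instance with Bean Validation annotations).

```java
import java.util.ArrayList;
import java.util.Date;
import java.util.List;

public class CustomerConstraints {

    // Hypothetical helper: true when the value is non-null and its length is within [min, max]
    public static boolean sizeBetween(String value, int min, int max) {
        return value != null && value.length() >= min && value.length() <= max;
    }

    // Returns the names of the attributes that violate the constraints listed in the table
    public static List<String> violations(Integer id, String name, String profession,
                                          String accountNumber, String address,
                                          String phoneNumber, Date birthday) {
        List<String> failed = new ArrayList<>();
        if (id == null) failed.add("id");
        if (!sizeBetween(name, 2, 50)) failed.add("name");
        if (!sizeBetween(profession, 2, 30)) failed.add("profession");
        if (!sizeBetween(accountNumber, 18, 18)) failed.add("accountNumber");
        if (!sizeBetween(address, 2, 50)) failed.add("address");
        if (!sizeBetween(phoneNumber, 11, 14)) failed.add("phoneNumber");
        if (birthday == null) failed.add("birthday");
        return failed;
    }

    public static void main(String[] args) {
        // Values written by hand for illustration only
        List<String> valid = violations(1, "Jo", "QA", "123456789012345678",
                "Main St 1", "12345678901", new Date());
        List<String> invalid = violations(1, "J", "QA", "1234",
                "Main St 1", "12345678901", null);

        System.out.println(valid);   // []
        System.out.println(invalid); // [name, accountNumber, birthday]
    }
}
```

A check like this is useful mostly as a safety net: whatever generator you pick, the generated data should come back with an empty list of violations.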
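RandomStringUtils is a class from the Apache Commons Lang library that generates random Strings based on different conditions, such as the length and the kind of characters (numeric, alphabetic, or alphanumeric).
It’s a utility class with static methods, so you can generate any String directly without creating an instance, which makes it super handy!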
See the example below, where you can generate a different set of random data.

    import org.apache.commons.lang3.RandomStringUtils;

    public class RandomStringUtilsExample {
        public static void main(String[] args) {
            // returns a String with 5 numeric characters, for example 82114
            String numeric = RandomStringUtils.randomNumeric(5);
            System.out.println(numeric);

            // returns an alphanumeric String of length 30 mixing upper and lower case,
            // for example gQ6RB8MiwKOg9O3qnHFo7I3jilHoIy
            String alphanumeric = RandomStringUtils.randomAlphanumeric(30);
            System.out.println(alphanumeric);
        }
    }

Let’s first take a look at a code example that uses RandomStringUtils:
- RandomStringUtils.randomAlphanumeric() is used to generate alphanumeric data
- RandomStringUtils generates only Strings, so the numeric id is generated as a String and converted with Integer.valueOf()

    import java.util.Date;

    import org.apache.commons.lang3.RandomStringUtils;
    import org.junit.jupiter.api.DisplayName;
    import org.junit.jupiter.api.Test;

    class BasicExampleTest {

        @Test
        @DisplayName("Data validations using RandomStringUtils")
        void randomStringUtils() {
            CustomerData customerData = CustomerData.builder()
                    // RandomStringUtils returns a String, so the id must be parsed;
                    // 9 digits always fit in an int, 10 digits can exceed Integer.MAX_VALUE
                    .id(Integer.valueOf(RandomStringUtils.randomNumeric(9)))
                    .name(RandomStringUtils.randomAlphanumeric(50))
                    .profession(RandomStringUtils.randomAlphanumeric(30))
                    .accountNumber(RandomStringUtils.randomAlphanumeric(18))
                    .address(RandomStringUtils.randomAlphanumeric(50))
                    .phoneNumber(RandomStringUtils.randomAlphanumeric(14))
                    // the current date and time is used as the birthday
                    .birthday(new Date())
                    .build();
        }
    }

The output of the test execution, if we print or inspect the customerData object, is:

    {
      "id": 1335130963,
      "name": "GGXS19kN6kSuzHwW6T0YjJCxUaIyKKmAaUdQH51gdUAtt1TwqY",
      "profession": "0kk8HSiFgCUVfLzbD3PyR6cn8j0LH3",
      "accountNumber": "PqvekXb9ekRAJi3ypy",
      "address": "90lqP2LHnQMWtmMP8vasO3BR5dsICIL85u5sJ0yjGKWXxCkFsj",
      "phoneNumber": "OpoJ3tOE53woy9",
      "birthday": "Sep 26, 2021, 10:01:10 PM"
    }

We could successfully generate the necessary data! Yay!
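One caveat follows from RandomStringUtils generating only Strings: a numeric String has to be parsed before it can be used as an int id, and the number of digits matters. The sketch below uses only the standard library to show the boundary; the class NumericIdParsing and the method fitsInInt are hypothetical names used for illustration.

```java
public class NumericIdParsing {

    // Hypothetical helper: true when the numeric String can be parsed as an int
    public static boolean fitsInInt(String digits) {
        try {
            Integer.parseInt(digits);
            return true;
        } catch (NumberFormatException e) {
            // thrown when the value is outside the int range (or is not a number)
            return false;
        }
    }

    public static void main(String[] args) {
        System.out.println(Integer.MAX_VALUE);       // 2147483647
        System.out.println(fitsInInt("999999999"));  // true: every 9-digit value fits
        System.out.println(fitsInInt("2147483647")); // true: exactly Integer.MAX_VALUE
        System.out.println(fitsInInt("9999999999")); // false: a 10-digit value can overflow
    }
}
```

In other words, a random numeric String of up to 9 digits is always safe to convert, while a 10-digit one only works when the value happens to be at most 2147483647.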
DataFaker is an open-source library based on (actually an improvement of) Java Faker to generate fake data.
I invite you to take a look at the GitHub repo and see the different objects to generate data.
The code that generates the data for the CustomerData class with DataFaker uses the following methods:
- number() generates a random number
- name() generates a full name
- company() generates a profession
- finance() generates a valid IBAN for the Netherlands
- address() generates a full street address
- phoneNumber() generates a cell phone number
- date() generates a birthday for an age between 18 and 90

    import net.datafaker.Faker;
    import org.junit.jupiter.api.DisplayName;
    import org.junit.jupiter.api.Test;

    class BasicExampleTest {

        @Test
        @DisplayName("Data validations using faker library")
        void faker() {
            Faker faker = new Faker();
            CustomerData customerData = CustomerData.builder()
                    // randomNumber() returns a long, hence the cast to int
                    .id((int) faker.number().randomNumber())
                    .name(faker.name().name())
                    .profession(faker.company().profession())
                    .accountNumber(faker.finance().iban("NL"))
                    .address(faker.address().streetAddress())
                    .phoneNumber(faker.phoneNumber().cellPhone())
                    .birthday(faker.date().birthday(18, 90))
                    .build();
        }
    }

The output of the test execution, if we print or inspect the customerData object, is:

    {
      "id": 520543,
      "name": "Tena Pagac",
      "profession": "photographer",
      "accountNumber": "NL07HUUN1518167413",
      "address": "12672 Romaguera Tunnel",
      "phoneNumber": "(561) 638-5813",
      "birthday": "Mar 5, 1982, 10:29:18 AM"
    }

We could successfully generate the necessary data! But let’s now focus on the differences.
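There are two aspects I would like to consider when choosing between one approach or the other: how readable the data is when troubleshooting, and how easily the data can be generated for each constraint.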
We can see the main differences by comparing the two outputs above side by side.
Troubleshooting is a regular activity for an engineer who writes code: we constantly read the logs and debug the application to understand current and future problems in the code.
Now imagine yourself looking at a CustomerData object whose data was filled in with the RandomStringUtils approach: it’s hard to correlate that data with a list of objects you might get back, or even to spot it inside a log file.
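The accountNumber values from the two outputs illustrate this. Both are 18 characters long, so both satisfy the length constraint, but only one has the recognizable shape of a Dutch IBAN: the country code NL, two check digits, a four-letter bank code, and a ten-digit account number. The sketch below checks only that shape and does not verify the IBAN check digits; the class DutchIbanShape and the method looksLikeDutchIban are hypothetical names used for illustration.

```java
import java.util.regex.Pattern;

public class DutchIbanShape {

    // Shape only: "NL" + 2 check digits + 4-letter bank code + 10-digit account number
    private static final Pattern NL_IBAN = Pattern.compile("NL\\d{2}[A-Z]{4}\\d{10}");

    // Hypothetical helper: true when the value has the shape of a Dutch IBAN
    public static boolean looksLikeDutchIban(String value) {
        return value != null && NL_IBAN.matcher(value).matches();
    }

    public static void main(String[] args) {
        String fromDataFaker = "NL07HUUN1518167413";         // sample value from the DataFaker output
        String fromRandomStringUtils = "PqvekXb9ekRAJi3ypy"; // sample value from the RandomStringUtils output

        System.out.println(fromDataFaker.length() + " " + looksLikeDutchIban(fromDataFaker));
        // prints: 18 true
        System.out.println(fromRandomStringUtils.length() + " " + looksLikeDutchIban(fromRandomStringUtils));
        // prints: 18 false
    }
}
```

When a value like the first one shows up in a log or a failing assertion, you immediately know which attribute it belongs to; the second one could be anything.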
For most of the attributes present in the CustomerData class, you can use RandomStringUtils to generate data for the different criteria. For example, you can easily assign 51 characters to the name attribute with RandomStringUtils.randomAlphanumeric(51) and expect a constraint validation failure.
For more specialized data, like a phone number or a date, you need a proper library, and DataFaker can generate both.
This way, we can make the process easier by adopting a single library.
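The edge case for the name attribute can be sketched as follows. To keep the example free of dependencies, the randomAlphanumeric method below is a hand-written stand-in for RandomStringUtils.randomAlphanumeric, and isValidName is a hypothetical check of the "between 2 and 50 characters" constraint; neither comes from the example projects.

```java
import java.util.Random;

public class NameEdgeCase {

    private static final String ALPHANUMERIC =
            "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789";
    private static final Random RANDOM = new Random();

    // Stand-in for RandomStringUtils.randomAlphanumeric(length), stdlib only
    public static String randomAlphanumeric(int length) {
        StringBuilder builder = new StringBuilder(length);
        for (int i = 0; i < length; i++) {
            builder.append(ALPHANUMERIC.charAt(RANDOM.nextInt(ALPHANUMERIC.length())));
        }
        return builder.toString();
    }

    // Hypothetical check of the name constraint: not null, 2 to 50 characters
    public static boolean isValidName(String name) {
        return name != null && name.length() >= 2 && name.length() <= 50;
    }

    public static void main(String[] args) {
        String upperBound = randomAlphanumeric(50); // longest valid name
        String tooLong = randomAlphanumeric(51);    // one character over the limit

        System.out.println(isValidName(upperBound)); // true
        System.out.println(isValidName(tooLong));    // false
    }
}
```

Length-based edge cases like this one need nothing more than a random String, which is why RandomStringUtils remains useful even when DataFaker produces the realistic values.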
Of course, I’d put more emphasis on the DataFaker library because we have almost everything we need to generate data, but it does not exclude a possible necessity to use the RandomStringUtils class or any other class placed in the Apache Commons library.
The main consideration here is the ability to generate all the possible data you need using a single source of truth without reinventing the wheel, as well as the indirect benefits it will show during the troubleshooting process.
The avoid-random-string-utils project shows a basic example comparing RandomStringUtils vs DataFaker.
The restassured-complete-basic-example project has a factory data class to generate all the necessary data in different conditions. It’s a good real-world example.