Test Data Factory: Why and How to Use

Published by Elias on

What problem does it solve?

One of the biggest pain points in test automation, whatever layer you are testing, is managing test data. I would say it’s quite easy to manage at the unit and integration levels, where we can control the application state, but for functional and e2e tests it’s quite hard.

It would be awesome if we could stop manually changing the data we keep in a code file or in an external source like a CSV or JSON file, right?

We would like to reduce code maintenance and the test failures caused by data changes, and create an easy and extensible way to generate test data.

What is the Test Data Factory?

There is no agreement on testing terminology (yes, I know that sucks) and no established definition of the Test Data Factory approach, so here’s my definition.

During some research (read this as “I was using Google”) I found interesting posts related to the same concept: the ObjectMother pattern.

The example

Imagine an application where you must fill in a form to get a loan, and the form has these four requirements:

  • a not null name
  • a valid e-mail
  • an amount should be greater than $ 1.000 and less than $ 40.000
  • the installments should be greater than 2 and less than 18

If you already know some testing techniques you can see yourself applying the Boundary Value Analysis (BVA) testing technique for the amount and installments inputs.

We must figure out a way to test all those possible scenarios filling the form with, of course, some data:

  • all valid information
  • name as null
  • an invalid e-mail
  • an amount less than $ 1.000
  • an amount greater than $ 40.000
  • installments less than 2
  • installments greater than 18
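The boundaries behind these scenarios can be made concrete with a small sketch. LoanRules and BoundaryValues are hypothetical names (the application's real validation code is not part of this post), amounts are treated as whole dollars, and the upper installment bound of 18 is an assumption taken from the scenario list.

```java
// Hypothetical validation rules mirroring the form requirements (not the real application code).
class LoanRules {
    static final int MIN_AMOUNT = 1_000;     // amount must be greater than this
    static final int MAX_AMOUNT = 40_000;    // amount must be less than this
    static final int MIN_INSTALLMENTS = 2;   // installments must be greater than this
    static final int MAX_INSTALLMENTS = 18;  // assumed upper bound, installments must be less than this

    static boolean isValidAmount(int amount) {
        return amount > MIN_AMOUNT && amount < MAX_AMOUNT;
    }

    static boolean isValidInstallments(int installments) {
        return installments > MIN_INSTALLMENTS && installments < MAX_INSTALLMENTS;
    }
}

class BoundaryValues {
    // Boundary Value Analysis: pick the value on each boundary and the one just inside it.
    static int[] amountsToTest() {
        return new int[] {
            LoanRules.MIN_AMOUNT, LoanRules.MIN_AMOUNT + 1,
            LoanRules.MAX_AMOUNT - 1, LoanRules.MAX_AMOUNT
        };
    }

    static int[] installmentsToTest() {
        return new int[] {
            LoanRules.MIN_INSTALLMENTS, LoanRules.MIN_INSTALLMENTS + 1,
            LoanRules.MAX_INSTALLMENTS - 1, LoanRules.MAX_INSTALLMENTS
        };
    }
}
```

With these rules, the values on a boundary are rejected and the values just inside it are accepted, which is exactly the data the test scenarios need.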

Test example with hard-coded data

Let’s assume you’re using Selenium WebDriver to fill in this information on a web page. A hard-coded data example would be:

Code snippet to fill in using hard-coded data
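The original snippet is not reproduced here, so what follows is only a minimal sketch of what such a test step might look like. LoanPage is a hypothetical in-memory stand-in for a Selenium page object (a real one would locate each field and call WebElement.sendKeys), and every class and method name is an assumption. Line numbers quoted in the text refer to the author's original snippet, not to this sketch.

```java
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical stand-in for a Selenium page object: it only records what was typed.
// A real page object would locate each field and call WebElement.sendKeys(...).
class LoanPage {
    private final Map<String, String> fields = new LinkedHashMap<>();

    void fillName(String name) { fields.put("name", name); }
    void fillEmail(String email) { fields.put("email", email); }
    void fillAmount(String amount) { fields.put("amount", amount); }
    void fillInstallments(String installments) { fields.put("installments", installments); }

    String valueOf(String field) { return fields.get(field); }
}

class HardCodedLoanTest {
    // Every value below is fixed in the test itself.
    static LoanPage submitValidLoan() {
        LoanPage page = new LoanPage();
        page.fillName("Elias");
        page.fillEmail("elias@eliasnogueira.com");
        page.fillAmount("10000");
        page.fillInstallments("8");
        return page;
    }
}
```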

The example above is using Java and the Page Object pattern to fill in a form on a web page. Notice that from lines 3 to 6 we are using fixed data to fill the form.

Using fixed data in a test is considered bad practice and should be avoided to reduce the maintenance of the automated test code.

How to implement it?

There are a few steps we must follow:

  1. Create an object to model the data you need
  2. Create a builder class
  3. Create a factory class to consume the data
  4. Modify the class to generate dynamic data
  5. Use the factory class in the test

Create an object to model the data you need

In our example, we are filling a loan form based on name, e-mail, amount, and installments. We need to create an object (class) to record this information.

In a plain Java class we can model our example like this:

Example of the Loan model object
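The original listing is not reproduced here. A minimal sketch of such a model class could look like the one below; the attribute names follow the four form fields, while the exact layout is an assumption, so the line numbers quoted in the text will not match it.

```java
import java.math.BigDecimal;

// Model of the data typed into the loan form.
class Loan {
    // The same attributes we fill in on the form
    private String name;
    private String email;
    private BigDecimal amount;
    private int installments;

    // Constructor receiving all the data at once
    public Loan(String name, String email, BigDecimal amount, int installments) {
        this.name = name;
        this.email = email;
        this.amount = amount;
        this.installments = installments;
    }

    // Getters and setters to read or change a single value
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public String getEmail() { return email; }
    public void setEmail(String email) { this.email = email; }
    public BigDecimal getAmount() { return amount; }
    public void setAmount(BigDecimal amount) { this.amount = amount; }
    public int getInstallments() { return installments; }
    public void setInstallments(int installments) { this.installments = installments; }

    // Used to log the data a test ran with
    @Override
    public String toString() {
        return "Loan{name='" + name + "', email='" + email + "', amount=" + amount
                + ", installments=" + installments + "}";
    }
}
```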

When we create a model class using Java we need three things: the private attributes, the constructor, and the getters and setters.

We need to create the same attributes we’re going to use to fill in the data. You can see this in lines 4 to 7.

The constructor is necessary to create the object and easily pass in the data, but we are also going to use it to generate our builder class. You can see this on lines 10 to 14.

The getters and setters can be used to get or set a single value in the object. You can see this on lines 19 to 25.

Lines 28 to 33 show the toString method, which we need in order to log the data used in the tests.

You might not need the setters if you plan to use the builder object, so you can remove them and declare the attributes as final (the getters are still used later to read the data in the test). You still need the constructor and the toString method.

Note from the author

The model class would be used as in the example below, but we will make it more readable and fluent using the builder pattern.

Loan loan = new Loan("elias", "elias@eliasnogueira.com", new BigDecimal("10000"), 8);

Create a builder class

You could use the setters in the Loan class (if you didn’t remove them) to fill the object with data, but we have a better, fluent way to do it: the builder pattern.

I recommend reading more about the pattern but, in general, and for this example, a builder consists of the same attributes we have in the Loan class, methods that each receive an attribute as a parameter and return the builder itself, and a build method that creates the concrete object.

Let’s learn the internal structure.

Lines 3 to 6 have the same attributes as the Loan class. We need them to create the Loan object, with its parameters filled in, through its constructor.

Lines 8 to 26 are the builder methods. Each method must return the builder class, take a parameter, assign the parameter to an attribute, and return itself (this). Usually, the method name is the same as the data it fills in.

Lines 28 to 30 show the build method. This method is responsible for creating the main object, which in our case is Loan. It takes all the attribute values set through the other methods and returns a new Loan object.

With it you can create the object with data in a fluent way, like this:

Usage example of the LoanBuilder class

Note that we can use the LoanBuilder to chain the methods, as it returns itself, filling in the necessary data and, always, constructing it using the build method.
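Since the original listings are not reproduced here, the following is a minimal sketch of a builder and its fluent usage. The builder method names (name, email, amount, installments) and the LoanBuilderUsage class are assumptions, Loan is repeated in compact form so the sketch compiles on its own, and the line numbers quoted in the text refer to the author's original code.

```java
import java.math.BigDecimal;

// Compact, immutable version of the model (no setters, final attributes).
class Loan {
    private final String name;
    private final String email;
    private final BigDecimal amount;
    private final int installments;

    public Loan(String name, String email, BigDecimal amount, int installments) {
        this.name = name;
        this.email = email;
        this.amount = amount;
        this.installments = installments;
    }

    public String getName() { return name; }
    public String getEmail() { return email; }
    public BigDecimal getAmount() { return amount; }
    public int getInstallments() { return installments; }
}

class LoanBuilder {
    // Same attributes as the Loan class
    private String name;
    private String email;
    private BigDecimal amount;
    private int installments;

    // Each method stores one value and returns the builder itself, which allows chaining
    public LoanBuilder name(String name) { this.name = name; return this; }
    public LoanBuilder email(String email) { this.email = email; return this; }
    public LoanBuilder amount(BigDecimal amount) { this.amount = amount; return this; }
    public LoanBuilder installments(int installments) { this.installments = installments; return this; }

    // Creates the Loan from the values collected so far
    public Loan build() {
        return new Loan(name, email, amount, installments);
    }
}

class LoanBuilderUsage {
    static Loan validLoan() {
        return new LoanBuilder()
                .name("Elias")
                .email("elias@eliasnogueira.com")
                .amount(new BigDecimal("10000"))
                .installments(8)
                .build();
    }
}
```

A value that is never set keeps its default, so skipping name(...) in the chain yields a Loan with a null name, which is handy for the "name as null" scenario.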

Create a factory class to consume the data

The data factory class is slightly different from the Factory pattern (actually it’s simpler). It has the following aspects to implement:

  • a private constructor to avoid the class instantiation
  • a static method that will generate the data
    • as it generates the data it must return it
    • the method must return the data model

In the example below, we have different factory methods creating a loan (createLoan), creating a loan with an invalid e-mail (createLoanWithNotValidEmail), and others.

Note that we are using the LoanBuilder to easily add the correct data to the attributes, returning the object Loan with all the necessary data.

Example of Data Factory class with hard-coded data
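The original listing is not reproduced here. A minimal sketch of such a factory, still with hard-coded values, might look like this. createLoan and createLoanWithNotValidEmail are the method names mentioned in the text; the class name LoanDataFactory, the method createLoanWithAmountGreaterThanMaximum and all literal values are assumptions, and Loan and LoanBuilder are repeated in compact form so the sketch compiles on its own.

```java
import java.math.BigDecimal;

class Loan {
    private final String name;
    private final String email;
    private final BigDecimal amount;
    private final int installments;

    public Loan(String name, String email, BigDecimal amount, int installments) {
        this.name = name;
        this.email = email;
        this.amount = amount;
        this.installments = installments;
    }

    public String getName() { return name; }
    public String getEmail() { return email; }
    public BigDecimal getAmount() { return amount; }
    public int getInstallments() { return installments; }
}

class LoanBuilder {
    private String name;
    private String email;
    private BigDecimal amount;
    private int installments;

    public LoanBuilder name(String name) { this.name = name; return this; }
    public LoanBuilder email(String email) { this.email = email; return this; }
    public LoanBuilder amount(BigDecimal amount) { this.amount = amount; return this; }
    public LoanBuilder installments(int installments) { this.installments = installments; return this; }
    public Loan build() { return new Loan(name, email, amount, installments); }
}

// Data factory: only static methods, never instantiated.
final class LoanDataFactory {
    private LoanDataFactory() { } // private constructor prevents instantiation

    // All valid information
    static Loan createLoan() {
        return validLoan().build();
    }

    // Same valid data, except for the e-mail
    static Loan createLoanWithNotValidEmail() {
        return validLoan().email("not-an-email").build();
    }

    // Hypothetical extra scenario: amount above the $ 40.000 limit
    static Loan createLoanWithAmountGreaterThanMaximum() {
        return validLoan().amount(new BigDecimal("40001")).build();
    }

    // Starting point shared by every scenario; each method overrides only what it needs
    private static LoanBuilder validLoan() {
        return new LoanBuilder()
                .name("Elias")
                .email("elias@eliasnogueira.com")
                .amount(new BigDecimal("10000"))
                .installments(8);
    }
}
```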

As you can see, the data is still hard-coded. That’s OK, but we can improve it a little more by generating the data dynamically.

Modify the class to generate dynamic data

We can dynamically generate data in different ways. The three main ones are:

  • using a library
  • using an API endpoint
  • using a database

I will focus on the first one: using a library.

For that I will introduce you to JavaFaker, a library that generates fake random data every time it’s called. For example, if you generate a name, each call will most likely return a different one.

Example of a data factory method using the dynamic data generator with JavaFaker
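The original JavaFaker listing is not reproduced here. To keep the sketch runnable without any dependency, it uses ThreadLocalRandom from the JDK as a plain stand-in for JavaFaker, and the comments name the JavaFaker call that would take its place. The constant names, the NAMES list and the upper installment bound of 18 are assumptions, the builder is left out to keep the sketch short, and the line numbers quoted in the text refer to the author's original code.

```java
import java.math.BigDecimal;
import java.util.Locale;
import java.util.concurrent.ThreadLocalRandom;

class Loan {
    private final String name;
    private final String email;
    private final BigDecimal amount;
    private final int installments;

    public Loan(String name, String email, BigDecimal amount, int installments) {
        this.name = name;
        this.email = email;
        this.amount = amount;
        this.installments = installments;
    }

    public String getName() { return name; }
    public String getEmail() { return email; }
    public BigDecimal getAmount() { return amount; }
    public int getInstallments() { return installments; }
}

final class LoanDataFactory {
    // Named constants keep the business limits in one place
    private static final int MIN_AMOUNT = 1_000;
    private static final int MAX_AMOUNT = 40_000;
    private static final int MIN_INSTALLMENTS = 2;
    private static final int MAX_INSTALLMENTS = 18; // assumed upper bound
    private static final String[] NAMES = {"Ana", "Bruno", "Carla", "Diego"};

    private LoanDataFactory() { }

    static Loan createLoan() {
        ThreadLocalRandom random = ThreadLocalRandom.current();

        // With JavaFaker: faker.name().firstName()
        String name = NAMES[random.nextInt(NAMES.length)];
        // With JavaFaker: faker.internet().emailAddress()
        String email = name.toLowerCase(Locale.ROOT) + random.nextInt(1_000, 10_000) + "@example.com";
        // With JavaFaker: faker.number().numberBetween(min, max)
        // nextInt(origin, bound) excludes the bound, so both limits stay out of the range
        BigDecimal amount = BigDecimal.valueOf(random.nextInt(MIN_AMOUNT + 1, MAX_AMOUNT));
        int installments = random.nextInt(MIN_INSTALLMENTS + 1, MAX_INSTALLMENTS);

        return new Loan(name, email, amount, installments);
    }
}
```

The point of the design is that the values change on every call while always staying inside the valid ranges, so the "all valid information" scenario no longer depends on one fixed record.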

As you can see in the code above, the hard-coded data was replaced by Java Faker calls. You will receive different data every time you call the factory method, but it’s still valid data.

The constants created from line 4 to 8 were added for clarity and to make it easier to change the values in the future.

You can see from line 16 to 21 the usage of Java Faker to generate the correct value for each field. For example, the e-mail is added to the Loan object using faker.internet().emailAddress(), which generates a valid e-mail.

Use the factory class in the test

I showed what this class looks like in “Test example with hard-coded data”. Let’s now use the Test Data Factory approach we learned.

Usage of the Data Factory class in a test
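The original listing is not reproduced here, so this is only a minimal sketch of the idea. validLoanData and createLoan() are the names used in the text; LoanPage is again a hypothetical in-memory stand-in for a Selenium page object, LoanFormTest is an assumed name, and the factory is reduced to a single method that uses ThreadLocalRandom instead of JavaFaker so the sketch runs without dependencies. The line numbers quoted in the text refer to the author's original code.

```java
import java.math.BigDecimal;
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.ThreadLocalRandom;

class Loan {
    private final String name;
    private final String email;
    private final BigDecimal amount;
    private final int installments;

    public Loan(String name, String email, BigDecimal amount, int installments) {
        this.name = name;
        this.email = email;
        this.amount = amount;
        this.installments = installments;
    }

    public String getName() { return name; }
    public String getEmail() { return email; }
    public BigDecimal getAmount() { return amount; }
    public int getInstallments() { return installments; }
}

final class LoanDataFactory {
    private LoanDataFactory() { }

    // Simplified stand-in for the dynamic factory method: random but valid data
    static Loan createLoan() {
        ThreadLocalRandom random = ThreadLocalRandom.current();
        int suffix = random.nextInt(1_000, 10_000);
        return new Loan("User " + suffix, "user" + suffix + "@example.com",
                BigDecimal.valueOf(random.nextInt(1_001, 40_000)), random.nextInt(3, 18));
    }
}

// Hypothetical stand-in for a Selenium page object: it only records what was typed.
class LoanPage {
    private final Map<String, String> fields = new HashMap<>();

    void fillName(String name) { fields.put("name", name); }
    void fillEmail(String email) { fields.put("email", email); }
    void fillAmount(String amount) { fields.put("amount", amount); }
    void fillInstallments(String installments) { fields.put("installments", installments); }

    String valueOf(String field) { return fields.get(field); }
}

class LoanFormTest {
    static LoanPage fillFormWithValidData() {
        // The test asks the factory for data instead of declaring it
        Loan validLoanData = LoanDataFactory.createLoan();

        // The getters feed each form field
        LoanPage page = new LoanPage();
        page.fillName(validLoanData.getName());
        page.fillEmail(validLoanData.getEmail());
        page.fillAmount(validLoanData.getAmount().toPlainString());
        page.fillInstallments(String.valueOf(validLoanData.getInstallments()));
        return page;
    }
}
```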

Line 4 shows the usage of the data factory method createLoan() where the object returned is set to the validLoanData attribute.

Lines 7 to 10 show the usage of the Loan object (validLoanData) to fill in each field using the getters. The data will be different every time we run the test.

Real examples

Would you like to see real examples for two different test targets: web and API?

Web

In the selenium-java-lean-test-achitecture project, you will find the BookingDataFactory class that generates data for booking a room.

Notice that I have a restriction: in the frontend, I have fixed data for the countries and daily budget. We can still randomize it and make the factory class return random data for a limited set of data.
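Randomizing over a limited set can be sketched as below. This is not the project's BookingDataFactory: BookingOptions, its method names and the option values are all made up for illustration.

```java
import java.util.concurrent.ThreadLocalRandom;

// Hypothetical example: the frontend accepts only these fixed options,
// so the factory picks one of them at random instead of generating free text.
final class BookingOptions {
    private static final String[] COUNTRIES = {"Belgium", "Brazil", "Netherlands"};
    private static final String[] DAILY_BUDGETS = {"$100", "$100 - $499", "$499 - $999"};

    private BookingOptions() { }

    static String randomCountry() {
        return pick(COUNTRIES);
    }

    static String randomDailyBudget() {
        return pick(DAILY_BUDGETS);
    }

    // Returns one element of the given options, chosen at random
    private static String pick(String[] options) {
        return options[ThreadLocalRandom.current().nextInt(options.length)];
    }
}
```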

API

In the project restassured-complete-basic-example you will find different data generation approaches in the data package. The SimulationDataFactory class has a complete set of factory methods for all the possible scenarios.

In some cases, you can even call existing endpoints to provide the data you need in your tests. That’s the case for the method getAllSimulationsFromApi(), which fetches the existing simulations: the oneExistingSimulation() factory method uses it to return a random existing simulation, and allExistingSimulations() uses it to return all of them.



3 Comments

Brian Elgaard Bennett · November 21, 2020 at 3:20 PM

Hi, I liked your perspective on the Object Mother Pattern. Did you read mine? I prefer to declaratively state the “data” behind a test double. https://belgaard.medium.com/why-dont-you-take-given-in-bdd-seriously-f168da29f1c

    Elias · November 22, 2020 at 9:42 PM

    Hi Brian,
    I loved your post!
    I’ve seen people using declarative state and I think it’s an excellent approach to easily manage the changes and clearly understand/read the code.
    Thank you for share it 🙂
