Handle test data the right way in Pytest

October 06, 2021

1 min

THE PROBLEM

Handling test data in pytest can get quite challenging for tests with complex test data (imagine a deeply nested JSON as a request body to an API). In addition to that, if there is a need to parameterize such tests, both code readability and test maintainability go for a toss. In real-world tests, we find this happening very frequently.

Let us look at this sample code to understand the problem better. Consider writing tests for validating email server actions like this


@pytest.mark.parameterize("input", [({
                                         "fromEmail": "john@doe.com",
                                         "toEmail": "jane@doe.com",
                                         "description": "All fields are filled in with valid values",
                                         "message": {
                                             "subject": "Sick Leave",
                                             "body": "Not feeling well, taking the day off",
                                             "signature": {
                                                 "firstname": "John",
                                                 "lastName": "Doe",
                                                 "salutations": "Regards"
                                             }
                                         }
                                     },
                                     {
                                         "fromEmail": "jane@doe.com",
                                         "toEmail": "john@doe.com",
                                         "description": "signature is null",
                                         "message": {
                                             "subject": "Nearing estimate due date",
                                             "body": "Hi\n, Please send your work estomation for the sprint in this thread by EOD.",
                                             "signature": None
                                         }
                                     })])
def test_send_email_hapy_path(input):
   from_email = input["fromEmail"]
   to_email = input["toEmail"]
   message = input["message"]
   response  = send_email(from_email, to_email, message)
   assert response.sent is True
   assert response.error is None


@pytest.mark.parameterize("input", [({
                                         "fromEmail": "john.com",
                                         "toEmail": "jane@doe.com",
                                         "description": "All fields are filled in with valid values",
                                         "message": {
                                             "subject": "Sick Leave",
                                             "body": "Not feeling well, taking the day off",
                                             "signature": {
                                                 "firstname": "John",
                                                 "lastName": "Doe",
                                                 "salutations": "Regards"
                                             }
                                         }
                                     },
                                     {
                                         "fromEmail": None,
                                         "toEmail": "john@doe.com",
                                         "description": "signature is null",
                                         "message": {
                                             "subject": "Nearing estimate due date",
                                             "body": "Hi\n, Please send your work estomation for the sprint in this thread by EOD.",
                                             "signature": None
                                         }
                                     })])
def test_send_email_invalid_email_address():
   from_email = input["fromEmail"]
   to_email = input["toEmail"]
   message = input["message"]
    with pytest.raises(InvalidMessageBodyException) as e:
        send_email(from_email, to_email, message)

if we observe the code, we can see,

test_send_email_hapy_path is trying to validate the happy path scenario for sending an email.
test_send_email_invalid_email_address is validating exceptions thrown because of invalid email ids
both the tests have 2 inputs in the parameterized call, so each test will run twice.

This style of test is hard to maintain because of following reasons,

adding and updating tests to the parameterized call is very clumsy and error-prone.
Tests are unnecessarily lengthy.
test is hard to read because of the nested JSON structure in the parameterized call.
Now, imagine adding new tests with their own parameterized calls. The code readability and maintainability further become a daunting task.

THE SOLUTION

To handle the problem outlined above, we can do the following things,

1. Create a JSON file with the test data describing the scenarios.

{
  "happyPath": [
    {
      "from": "john@doe.com",
      "to": "jane@doe.com",
      "description": "All fields are filled in with valid values",
      "message": {
        "subject": "Sick Leave",
        "body": "Not feeling well, taking the day off",
        "signature": {
          "firstname": "John",
          "lastName": "Doe",
          "salutations": "Regards"
        }
      }
    },
    {
      "from": "jane@doe.com",
      "to": "john@doe.com",
      "description": "signature is null",
      "message": {
        "subject": "Nearing estimate due date",
        "body": "Hi\n, Please send your work estomation for the sprint in this thread by EOD.",
        "signature": null
      }
    }
  ],
  "invalidEmail": [
    {
      "from": null,
      "to": "jane@doe.com",
      "description": "email id is null",
      "message": {
        "subject": "Sick Leave",
        "body": "Not feeling well, taking the day off",
        "signature": {
          "firstname": "John",
          "lastName": "Doe",
          "salutations": "Regards"
        }
      }
    },
    {
      "from": "jane.com",
      "to": "jane@doe.com",
      "description": "invalid from email",
      "message": {
        "subject": "Sick Leave",
        "body": "Not feeling well, taking the day off",
        "signature": {
          "firstname": "John",
          "lastName": "Doe",
          "salutations": "Regards"
        }
      }
    },
    {
      "from": "john@doe.com",
      "to": "jane.com",
      "description": "invalid to email",
      "message": {
        "subject": "Sick Leave",
        "body": "Not feeling well, taking the day off",
        "signature": {
          "firstname": "John",
          "lastName": "Doe",
          "salutations": "Regards"
        }
      }
    }
  ],
  "invalidMessage" : [
    {
      "from": "john@doe.com",
      "to": "jane@doe.com",
      "description": "Subject is null",
      "message": {
        "subject": null,
        "body": "Not feeling well, taking the day off",
        "signature": {
          "firstname": "John",
          "lastName": "Doe",
          "salutations": "Regards"
        }
      }
    },
    {
      "from": "john@doe.com",
      "to": "jane@doe.com",
      "description": "Body is null",
      "message": {
        "subject": "Sick Leave",
        "body": null,
        "signature": {
          "firstname": "John",
          "lastName": "Doe",
          "salutations": "Regards"
        }
      }
    }
  ]
}

By creating a separate JSON file for maintaining test data, we can create a good separation of concern between test methods and the test data. Also, adding and updating tests and test scenarios becomes much easier.

2. Write functions to read the JSON file and convert it to named tuples

def create_namedtuple_from_dict(obj):
    """converts given list or dict to named tuples, generic alternative to dataclass"""
    if isinstance(obj, dict):
        fields = sorted(obj.keys())
        namedtuple_type = namedtuple(
            typename='TestData',
            field_names=fields,
            rename=True,
        )
        field_value_pairs = OrderedDict(
            (str(field), create_namedtuple_from_dict(obj[field]))
            for field in fields
        )
        try:
            return namedtuple_type(**field_value_pairs)
        except TypeError:
            # Cannot create namedtuple instance so fallback to dict (invalid attribute names)
            return dict(**field_value_pairs)
    elif isinstance(obj, (list, set, tuple, frozenset)):
        return [create_namedtuple_from_dict(item) for item in obj]
    else:
        return obj


def inject_test_data(file):
    """
        Read the content of the JSON file and convert it to a named tuple,
        can be used for injecting test data set to tests, helps in separating test data from the tests
    """
    file = str(RESOURCES_PATH.joinpath(file))
    with open(file) as f:
        raw_data = json.load(f)
    return create_namedtuple_from_dict(raw_data)

these are one-time utility methods that can be reused for all the JSON files. The main reason for converting JSON to a named tuple is to use a class-like syntax to fetch the fields. In my opinion, these are much easier than accessing field value from dictionaries. Also, because of duck typing, we can pass named tuples as objects wherever required.

3. Inject the test data from the JSON file into the tests

class TestData:
    test_data = inject_test_data(file="email/emailTestSet.json")

    @pytest.mark.parametrize("input", test_data.happyPath)
    def test_email_happy_path(self, input):
        print(input.description)
        response = send_email(input.from_, input.to, input.message)
        assert response.sent
        assert response.error is None

    @pytest.mark.parametrize("input", test_data.invalidEmail)
    def test_email_validation_failed(self, input):
        print(input.description)
        with pytest.raises(InvalidEmailException) as e:
            send_email(input.from_, input.to, input.message)

    @pytest.mark.parametrize("input", test_data.invalidMessage)
    def test_email_body_validation_failed(self, input):
        print(input.description)
        with pytest.raises(InvalidMessageBodyException) as e:
            send_email(input.from_, input.to, input.message)

Now, if we observe, the tests are agnostic to the underlying test data. This separation of concerns helps in the maintainability of the tests. The complete example of the solution can be found here