I have a fixture that returns a pd.DataFrame. I need to pass its individual columns (pd.Series) to a unit test, and I would like to use parametrize for this.

Here's a toy example without parametrize. The test checks every column of the dataframe individually, but because it loops over the columns inside a single test function, only 1 test is executed. I am looking for 3 tests (one per column) without the for-loop. I guess I could then also get rid of the input_series fixture, couldn't I?

import numpy as np
import pandas as pd
import pytest


@pytest.fixture(scope="module")
def input_df():
    return pd.DataFrame(
        data=np.random.randint(1, 10, (5, 3)), columns=["col1", "col2", "col3"]
    )


@pytest.fixture(scope="module")
def input_series(input_df):
    return [input_df[series] for series in input_df.columns]


def test_individual_column(input_series):
    for series in input_series:
        assert len(series) == 5

I am basically looking for something like this:

@pytest.mark.parametrize("series", individual_series_from_input_df)
def test_individual_column(series):
    assert len(series) == 5
Andi

1 Answer

If you try to yield several values from one fixture (for example, by looping over the columns of a DataFrame provided by another fixture), pytest fails with the error message: yield_fixture function has more than one 'yield'.

One solution is fixture parametrization. You want to iterate over the columns, so the DataFrame's column names are the parameters.

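import numpy as np
import pandas as pd
import pytest

# Test data, created once at module level so its columns can be used in `params`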
input_df = pd.DataFrame(
    data=np.random.randint(1, 10, (5, 3)), columns=["col1", "col2", "col3"]
)


@pytest.fixture(
    scope="module",
    params=input_df.columns,
)
def input_series(request):
    series = request.param
    yield input_df[series]


def test_individual_column(input_series):
    assert len(input_series) == 5

This generates one test per column of the test DataFrame.

pytest -v test_pandas.py
# test_pandas.py::test_individual_column[col1] PASSED
# test_pandas.py::test_individual_column[col2] PASSED
# test_pandas.py::test_individual_column[col3] PASSED
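
For comparison, here is a minimal sketch of the `pytest.mark.parametrize` form the question asks for. It parametrizes over the column names rather than the Series themselves, and looks each column up inside the test. It assumes the column names are known at import time; the `COLUMNS` constant is a name made up for this illustration, not part of the original code.

```python
import numpy as np
import pandas as pd
import pytest

# Assumption: the column names are known when the module is imported.
# COLUMNS is a hypothetical helper constant introduced for this sketch.
COLUMNS = ["col1", "col2", "col3"]


@pytest.fixture(scope="module")
def input_df():
    # The full DataFrame stays available as a fixture for other tests.
    return pd.DataFrame(data=np.random.randint(1, 10, (5, 3)), columns=COLUMNS)


@pytest.mark.parametrize("column", COLUMNS)
def test_individual_column(input_df, column):
    # Select the Series inside the test: parametrize arguments are evaluated
    # at collection time, before any fixture has produced a value.
    series = input_df[column]
    assert len(series) == 5
```

This should also collect three tests, one per column name, while `input_df` remains an ordinary fixture.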
Romain
  • This yields the expected result. However, I have some follow-up questions: 1. Why do you use the `yield` keyword in this case? Using `return` gives exactly the same result. 2. I would like to keep `input_df` as a fixture, because I have other tests that need the complete dataframe as input. I could wrap the `input_df` definition from your example in a fixture, something like `def input_dataframe(): return input_df`. However, this feels a bit redundant, doesn't it? – Andi Jan 10 '22 at 10:03
  • @Andi 1. I used `yield` because it's the common pattern in fixtures, and in case you would like to add some teardown code later. 2. You can share the same test data between both fixtures; however, you cannot use a fixture's value to parametrize the yielding fixture, because `params` is evaluated at collection time, before any fixture runs. I used a module-level variable for convenience, to avoid creating the DataFrame each time the fixture is used. You can refactor this code according to your constraints / usage. – Romain Jan 10 '22 at 11:46
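
The two follow-up points can be sketched together. This is one possible refactoring, not necessarily what the answer's author had in mind: `input_df` stays a fixture, and the parametrized `input_series` fixture requests it while taking only the column names as `params`. The `COLUMNS` constant and the `test_whole_frame` test are hypothetical names added for illustration, and the sketch assumes the column names can be listed at import time.

```python
import numpy as np
import pandas as pd
import pytest

# Assumption: column names are fixed up front (hypothetical constant).
COLUMNS = ["col1", "col2", "col3"]


@pytest.fixture(scope="module")
def input_df():
    return pd.DataFrame(data=np.random.randint(1, 10, (5, 3)), columns=COLUMNS)


@pytest.fixture(scope="module", params=COLUMNS)
def input_series(request, input_df):
    # `params` holds only the names; the DataFrame comes from the fixture.
    yield input_df[request.param]
    # Teardown code, if any were needed, would go here. Without teardown,
    # `return` would behave the same as `yield`.


def test_individual_column(input_series):
    assert len(input_series) == 5


def test_whole_frame(input_df):
    # Hypothetical second test that needs the complete DataFrame.
    assert input_df.shape == (5, 3)
```

Because both fixtures are module-scoped, the DataFrame should be built once per module and shared by the per-column tests and the whole-frame test.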