I have a Python script that consolidates reports using Pandas throughout, as a sequence of DataFrame operations (drop, groupby, sum, etc.). Let's say I start with a simple function that drops every column that has no values. It takes a DataFrame as input and returns a DataFrame:
# cei.py
import pandas as pd


def clean_table_cols(source_df: pd.DataFrame) -> pd.DataFrame:
    # IMPLEMENTATION, e.g. drop every column whose values are all missing
    return source_df.dropna(axis="columns", how="all")
I wanted to verify in my tests that this function actually removes every column whose values are all empty. So I arranged a test input and an expected output, and compared them with the assert_frame_equal function from pandas.testing:
# test_cei.py
import pandas as pd

import cei


def test_clean_table_cols() -> None:
    # Arrange: two columns are entirely missing, one is partially missing
    df = pd.DataFrame(
        {
            "full_valued": [1, 2, 3],
            "all_missing1": [None, None, None],
            "some_missing": [None, 2, 3],
            "all_missing2": [None, None, None],
        }
    )
    expected = pd.DataFrame({"full_valued": [1, 2, 3], "some_missing": [None, 2, 3]})
    # Act
    result = cei.clean_table_cols(df)
    # Assert: only the all-missing columns were dropped
    pd.testing.assert_frame_equal(result, expected)
My question is whether this is conceptually a unit test or an e2e/integration test, since I am not mocking the Pandas implementation. But if I mock DataFrame, I won't be testing the functionality of the code. What is the recommended way to test this following TDD best practices?
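To make the mocking concern concrete, here is a minimal sketch of what I imagine a mocked version would look like. The test name and the use of unittest.mock.MagicMock are my own assumptions for illustration, and clean_table_cols is repeated inline (with the example implementation) only so the snippet runs on its own:

```python
from unittest.mock import MagicMock

import pandas as pd


def clean_table_cols(source_df: pd.DataFrame) -> pd.DataFrame:
    # Example implementation, repeated here so the snippet is self-contained
    return source_df.dropna(axis="columns", how="all")


def test_clean_table_cols_with_mock() -> None:
    # Hypothetical stand-in for a DataFrame; no real data is involved
    fake_df = MagicMock()

    result = clean_table_cols(fake_df)

    # This only verifies that dropna was called with these arguments...
    fake_df.dropna.assert_called_once_with(axis="columns", how="all")
    # ...and that whatever the mock returned was passed through unchanged.
    assert result is fake_df.dropna.return_value
```

A test like this passes without ever checking that all-missing columns are removed. It pins the function to one particular Pandas call rather than to the behavior I care about, which is why mocking seems wrong to me here.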
Note: using Pandas in this project is a design decision, so there is deliberately no intention to abstract the Pandas interfaces in order to replace it with another library in the future.