I have the following functions:

    import pandas as pd

    def main():
        (
            pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
            .dropna(subset=['a'])
            .assign(
                b=lambda x: x['b'] * 2
            )
            .apply(do_something_with_each_row, axis='columns')
        )

    def do_something_with_each_row(one_row):
        # do_something_with_row
        print(one_row)

In my test, I want to inspect the dataframe built by all the chained operations and check that it is correct before do_something_with_each_row is called. That function does not return a dataframe (it just iterates over all rows, similarly to iterrows).
I tried to mock the apply function like this:

    # need pytest-mock and pytest
    import pandas as pd

    def test_not_working(mocker):
        mocked_apply = mocker.patch.object(pd.DataFrame, 'apply')
        main()

but in this case I don't get access to the dataframe that apply is called on, so I cannot test its content.
I also tried to mock do_something_with_each_row:

    # need pytest-mock and pytest
    import pandas as pd

    def test_not_working_again(mocker):
        mocked_to_something = mocker.patch('path.to.file.do_something_with_each_row')
        main()

but this time, although I get all the calls with their row arguments, the rows all contain None values.
How can I get the dataframe on which apply is called, and check that it is indeed the same as the following?

    pd.DataFrame({'a': [1, 2], 'b': [2.0, 4]})

I am working with pandas 0.24.2; upgrading to pandas 1.0.5 does not change the behaviour.
I searched the pandas issues but didn't find anything on this subject.
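
For reference, here is a minimal sketch of one direction that might work, not a confirmed solution. It assumes that patching apply with autospec=True is acceptable: an autospecced method records the instance it is called on (self) as its first positional argument, so the dataframe can be read back from call_args. The sketch uses unittest.mock from the standard library; mocker.patch.object from pytest-mock accepts the same autospec keyword. The helper name capture_apply_input is hypothetical, and the expected frame uses floats in column a because the NaN in the input forces that column to float64.

```python
from unittest import mock

import pandas as pd


def do_something_with_each_row(one_row):
    # Stand-in for the real per-row work.
    print(one_row)


def main():
    (
        pd.DataFrame({'a': [1, 2, float('NaN')], 'b': [1.0, 2, 3]})
        .dropna(subset=['a'])
        .assign(b=lambda x: x['b'] * 2)
        .apply(do_something_with_each_row, axis='columns')
    )


def capture_apply_input():
    # Hypothetical helper: run main() with DataFrame.apply replaced by an
    # autospecced mock, then return the dataframe apply was called on.
    # autospec=True makes the mock behave like a bound method, so the
    # instance (self) is recorded as the first positional argument.
    with mock.patch.object(pd.DataFrame, 'apply', autospec=True) as mocked_apply:
        main()
    return mocked_apply.call_args[0][0]


def test_apply_input():
    captured = capture_apply_input()
    # Column 'a' is float64 here because the original column contained NaN.
    expected = pd.DataFrame({'a': [1.0, 2.0], 'b': [2.0, 4.0]})
    # Reset the index so the comparison does not depend on the index type
    # left behind by dropna.
    pd.testing.assert_frame_equal(captured.reset_index(drop=True), expected)
```

Since apply is mocked, do_something_with_each_row is never actually run in this test, so only the chained operations before it are exercised.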