
I have a DataFrame df with thousands of rows.

I want to apply a function, func, to each row.

As a test, I wanted to run func on only the first row of df, so I placed a print statement inside func(). The print statement ran twice, even though I am slicing df down to one row (the printed slice also shows a header row, but that is just the column names).

When I run either of the following:

df[0:1].apply(func, axis=1, args=(x, y, z))

or

df.iloc[0:1, :].apply(func, axis=1, args=(x, y, z))

the print statement runs twice, which means func() is executed twice.
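A minimal, self-contained reproduction of this setup might look like the following. The body of func and the values of x, y and z are hypothetical placeholders; extra positional arguments for func are passed through `args=`, and pandas passes each row first. How many times the line is printed depends on the installed pandas version.

```python
import pandas as pd

def func(row, x, y, z):
    # Hypothetical stand-in for the real func: print so every call is visible.
    print("func called on row", row.name)
    return row.sum() + x + y + z

df = pd.DataFrame({"a": [9, 4], "b": [5, 7], "c": [4, 2]})
x, y, z = 1, 2, 3  # placeholder values for the extra arguments

# Slice to the first row only; extra arguments go through args=.
out = df.iloc[0:1].apply(func, axis=1, args=(x, y, z))
print(out)
```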

Any idea why this is happening?

codingknob
Possible duplicate of [Why does pandas apply calculate twice](http://stackoverflow.com/questions/21635915/why-does-pandas-apply-calculate-twice) – root Apr 14 '16 at 18:18

2 Answers


The documentation clearly says:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path.
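The quoted passage describes older pandas releases; as far as I know, pandas 1.1 changed DataFrame.apply so that the first row/column is evaluated only once. Either way, if func has side effects (printing, writing files, updating a counter) that must happen exactly once per row, one option, not taken from the documentation above, is to loop over the rows explicitly instead of using apply. A minimal sketch, where func, x, y and z are hypothetical placeholders:

```python
import pandas as pd

calls = 0  # counts how many times func actually runs

def func(row, x, y, z):
    # Hypothetical func with a side effect (the counter) and extra arguments.
    global calls
    calls += 1
    return row["a"] * x + row["b"] * y + row["c"] * z

df = pd.DataFrame({"a": [9, 4], "b": [5, 7], "c": [4, 2]})
x, y, z = 1, 2, 3  # placeholder values

# An explicit loop calls func exactly once per row, independent of any
# probing that apply may do internally.
results = pd.Series([func(row, x, y, z) for _, row in df.iterrows()],
                    index=df.index)
print(calls, list(results))
```

The trade-off is speed: iterrows is slower than apply or a vectorized expression, but it makes the number of calls explicit.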

stellasia

Pay attention to the different slicing techniques:

In [134]: df
Out[134]:
   a  b  c
0  9  5  4
1  4  7  2
2  1  3  7
3  6  3  2
4  4  5  2

In [135]: df.iloc[0:1]
Out[135]:
   a  b  c
0  9  5  4

In [136]: df.loc[0:1]
Out[136]:
   a  b  c
0  9  5  4
1  4  7  2
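The difference comes from the slicing rules: iloc (and an integer slice inside [] on a default integer index) uses half-open positional slices, so the stop position is excluded, while loc slices by label and includes the stop label. A quick check with the same data as above:

```python
import pandas as pd

# Same values as the df shown above, with the default RangeIndex 0..4.
df = pd.DataFrame({"a": [9, 4, 1, 6, 4],
                   "b": [5, 7, 3, 3, 5],
                   "c": [4, 2, 7, 2, 2]})

by_position = df.iloc[0:1]  # positions [0, 1): stop excluded
by_getitem = df[0:1]        # integer slice in [] is positional here
by_label = df.loc[0:1]      # labels 0 through 1: stop included

print(len(by_position), len(by_getitem), len(by_label))
```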

With printing:

Printing one row (a single Series):

In [139]: df[0:1].apply(lambda r: print(r), axis=1)
a    9
b    5
c    4
Name: 0, dtype: int32
Out[139]:
0    None
dtype: object

Or using iloc:

In [144]: df.iloc[0:1, :].apply(lambda r: print(r), axis=1)
a    9
b    5
c    4
Name: 0, dtype: int32
Out[144]:
0    None
dtype: object

Printing two rows (two Series):

In [140]: df.loc[0:1].apply(lambda r: print(r), axis=1)
a    9
b    5
c    4
Name: 0, dtype: int32
a    4
b    7
c    2
Name: 1, dtype: int32
Out[140]:
0    None
1    None
dtype: object

The OP wrote:

"the print statement was run 2 times even though I am slicing df to one row"

Actually, you were slicing it into two rows.

MaxU - stand with Ukraine