I'm calling apply() on a pandas DataFrame, and the applied function seems to be invoked twice on the first row when it returns arrays, but only once per row when it returns floats. Consider the following example.
from pandas import DataFrame
from numpy.random import random

def array_or_float(flag, x):
    """Either return a random array or a float, depending on `flag`."""
    if flag:
        value = random((2, 1))
    else:
        value = random()
    print('Got', round(x, 5), 'returns', value)
    return value

df = DataFrame({'A values': random(3)})
df['B values'] = df.apply(lambda x: array_or_float(True, x['A values']), axis=1)
print('\nData frame:')
print(df)
If I call array_or_float with flag=False inside the apply(), i.e. if the function only ever returns floats, then the output is consistent: one call per row.
Got 0.46005 returns 0.6578862349718622
Got 0.64534 returns 0.8690478424766472
Got 0.04175 returns 0.41617107157789923
Data frame:
A values B values
0 0.460050 0.657886
1 0.645342 0.869048
2 0.041752 0.416171
However, if I call it with flag=True, i.e. I want arrays back, then there seems to be an "orphaned" call whose result never even makes it into the data frame, namely the first one.
Got 0.88822 returns [[0.31850227]
[0.66878704]]
Got 0.88822 returns [[0.70890116]
[0.9087984 ]]
Got 0.51507 returns [[0.92748729]
[0.98650649]]
Got 0.91706 returns [[0.82387122]
[0.86967768]]
Data frame:
A values B values
0 0.888216 [[0.7089011570815329], [0.9087983994394716]]
1 0.515068 [[0.92748728847228], [0.9865064881611074]]
2 0.917061 [[0.8238712182074142], [0.8696776790080818]]
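To rule out a printing artifact, I also checked with a minimal side-effect counter (my own sketch; the names `f` and `calls` and the fixed zero array are not from the example above) that records every invocation apply() makes:

```python
import numpy as np
import pandas as pd

calls = []  # side effect: record every invocation

def f(row):
    calls.append(row['A values'])
    return np.zeros((2, 1))  # array return, like array_or_float(True, ...)

df = pd.DataFrame({'A values': [0.1, 0.2, 0.3]})
result = df.apply(f, axis=1)

# On my setup this shows one more call than there are rows;
# other pandas versions may call the function exactly once per row.
print(len(df), len(calls))
```

The extra entry in `calls` matches the duplicated "Got 0.88822" line above.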
My specs are as follows:
- Python 3.6.8
- NumPy 1.15.4
- pandas 0.24.0
What's going on?