I'm calling apply() on a pandas DataFrame, and the applied function seems to be invoked twice on the first row when it returns arrays, but only once per row when it returns floats. Consider the following example.
from pandas import DataFrame
from numpy.random import random

def array_or_float(flag, x):
    """Either return a random array or a float, depending on `flag`."""
    if flag:
        value = random((2, 1))
    else:
        value = random()
    print('Got', round(x, 5), 'returns', value)
    return value

df = DataFrame({'A values': random(3)})
df['B values'] = df.apply(lambda x: array_or_float(True, x['A values']), axis=1)
print('\nData frame:')
print(df)
If I call array_or_float with flag=False inside the apply(), i.e. if the function only ever returns floats, then the output is consistent: one call per row.
Got 0.46005 returns 0.6578862349718622
Got 0.64534 returns 0.8690478424766472
Got 0.04175 returns 0.41617107157789923
Data frame:
A values B values
0 0.460050 0.657886
1 0.645342 0.869048
2 0.041752 0.416171
However, if I call it with flag=True, i.e. I want arrays back, then there seems to be an "orphaned" call whose result never even makes it into the data frame, namely the first one.
Got 0.88822 returns [[0.31850227]
[0.66878704]]
Got 0.88822 returns [[0.70890116]
[0.9087984 ]]
Got 0.51507 returns [[0.92748729]
[0.98650649]]
Got 0.91706 returns [[0.82387122]
[0.86967768]]
Data frame:
A values B values
0 0.888216 [[0.7089011570815329], [0.9087983994394716]]
1 0.515068 [[0.92748728847228], [0.9865064881611074]]
2 0.917061 [[0.8238712182074142], [0.8696776790080818]]
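To rule out a printing artifact, I also checked with a minimal side-effect counter (my own sketch; the names `f` and `calls` and the fixed zero array are not from the example above) that records every invocation apply() makes:

```python
import numpy as np
import pandas as pd

calls = []  # side effect: record every invocation

def f(row):
    calls.append(row['A values'])
    return np.zeros((2, 1))  # array return, like array_or_float(True, ...)

df = pd.DataFrame({'A values': [0.1, 0.2, 0.3]})
result = df.apply(f, axis=1)

# On my setup this shows one more call than there are rows;
# other pandas versions may call the function exactly once per row.
print(len(df), len(calls))
```

The extra entry in `calls` matches the duplicated "Got 0.88822" line above.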
My specs are as follows:
- Python 3.6.8
- NumPy 1.15.4
- pandas 0.24.0
What's going on?