Does the pandas df.apply(x, axis=1) method apply the function x to all the rows simultaneously, or iteratively? I had a look in the docs but didn't find anything.

DataSwede

nguyenistheloneliestnumber
Vectorized functions do not actually apply to all rows simultaneously. Anyway, for some details see the answer [here](http://stackoverflow.com/questions/38938318/why-apply-sometimes-isnt-faster-than-for-loop-in-pandas-dataframe/38938507#38938507). – juanpa.arrivillaga Nov 23 '16 at 05:02
2 Answers
It's applied iteratively, one row at a time:
In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])
In [12]: def f(row):
    ...:     f.count += 1
    ...:     return f.count
In [13]: f.count = 0
In [14]: df.apply(f, axis=1)
Out[14]:
0    1
1    2
dtype: int64
Note: although it doesn't seem to be the case in this example, the documentation warns:
In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
The actual for loop (for Python functions rather than ufuncs) happens in lib.reduce (here).
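To make the row-by-row behaviour visible without relying on a call counter, here is a minimal sketch that records which row each call receives. The names `visited` and `record` are made up for illustration, and the de-duplication step is only there because of the "called twice on the first row" warning quoted above, which may or may not apply to your pandas version.

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["A", "B"])

visited = []  # index labels, in the order apply hands rows to the function


def record(row):
    # With axis=1 each call receives one row as a Series;
    # row.name is that row's index label.
    visited.append(row.name)
    return row["A"] + row["B"]


result = df.apply(record, axis=1)

# Collapse consecutive repeats, in case the first row was passed in twice
# (the side-effect caveat from the documentation quoted above).
order = [label for i, label in enumerate(visited)
         if i == 0 or label != visited[i - 1]]

print(order)            # labels in the order the rows were processed
print(result.tolist())  # one return value per row
```

Each call sees exactly one row, and the calls arrive one after another in index order, which is what you would expect from a plain Python loop.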

Andy Hayden
I believe it is applied iteratively. Consider this timing experiment, where each row records how long after the start it was processed:
import pandas as pd
import numpy as np
import time
# Make a 1000 row long dataframe
df = pd.DataFrame(np.random.random((1000, 4)))
# Apply this time delta function over the length of the dataframe
t0 = time.time()
times = df.apply(lambda _: time.time()-t0, axis=1)
# Print some of the results
print(times[::100])
Out[]:
0 0.000500
100 0.001029
200 0.001532
300 0.002036
400 0.002531
500 0.003033
600 0.003536
700 0.004035
800 0.004537
900 0.005513
dtype: float64
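The steadily increasing timestamps suggest that apply with axis=1 behaves much like an explicit loop over the rows. As a rough sketch of that equivalence (not a description of how pandas implements it internally), the following compares apply with a plain loop over `iterrows`. The function `row_sum_sq` is a made-up example.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(12).reshape(4, 3), columns=["a", "b", "c"])


def row_sum_sq(row):
    # Hypothetical per-row function: sum of squares of the row's values.
    return (row ** 2).sum()


# Row-wise apply: the function is called once per row.
via_apply = df.apply(row_sum_sq, axis=1)

# Explicit Python loop over the same rows, in the same order.
via_loop = pd.Series({idx: row_sum_sq(row) for idx, row in df.iterrows()})

# Both approaches should produce the same values in the same order.
print(via_apply.tolist() == via_loop.tolist())
```

If the two agree, the practical takeaway is that apply(axis=1) should be thought of as a convenience for looping, not as a parallel or vectorized operation.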

Alex