4

Does the pandas df.apply(x, axis=1) method apply the function x to all the rows simultaneously, or iteratively? I had a look in the docs but didn't find anything.

DataSwede
  • Even vectorized functions do not actually apply to all rows simultaneously. For some details, see the answer [here](http://stackoverflow.com/questions/38938318/why-apply-sometimes-isnt-faster-than-for-loop-in-pandas-dataframe/38938507#38938507). – juanpa.arrivillaga Nov 23 '16 at 05:02

2 Answers

4

It applies the function iteratively, one row at a time:

In [11]: df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

In [12]: def f(row):
             f.count += 1
             return f.count

In [13]: f.count = 0

In [14]: df.apply(f, axis=1)
Out[14]:
0    1
1    2
dtype: int64

Note: although it doesn't appear to happen in this example, the documentation warns:

In the current implementation apply calls func twice on the first column/row to decide whether it can take a fast or slow code path. This can lead to unexpected behavior if func has side-effects, as they will take effect twice for the first column/row.
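Whether the first row is really evaluated twice depends on the pandas version and the code path taken, so it is safer to measure than to assume. A minimal sketch, where the helper `record` and the list `seen` are hypothetical names used only for illustration, that logs the index label of every row passed to the function:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4], [5, 6]], columns=["A", "B"])

seen = []  # index labels of the rows, in the order apply passes them in

def record(row):
    # With axis=1, each row arrives as a Series whose .name is its index label.
    seen.append(row.name)
    return row["A"] + row["B"]

result = df.apply(record, axis=1)

print(seen)    # a repeated leading label would mean the first row was evaluated twice
print(result)
```

If the first label shows up twice in `seen`, any side effect in the function ran twice for that row, which is exactly the case the warning describes.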

The actual for loop (for Python functions rather than ufuncs) happens in lib.reduce (here).
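Conceptually, for a plain Python function the row-wise apply behaves like an explicit loop over the rows. The sketch below is an illustration of that equivalence rather than pandas' actual internals; `row_sum` is a hypothetical example function:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]], columns=["A", "B"])

def row_sum(row):
    # Receives one row at a time as a Series indexed by column name.
    return row["A"] + row["B"]

# Explicit Python loop: one call per row, in index order.
looped = pd.Series([row_sum(row) for _, row in df.iterrows()], index=df.index)

# The same computation through apply.
applied = df.apply(row_sum, axis=1)

print(looped)
print(applied)
```

Both Series should hold the same values under the same index, which is why apply with a Python function is usually no faster than a hand-written loop.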

Andy Hayden
0

I believe the function is applied iteratively. Consider this timing experiment, where each row records the time elapsed since the start:

import pandas as pd
import numpy as np
import time

# Make a 1000 row long dataframe
df = pd.DataFrame(np.random.random((1000, 4)))

# Apply this time delta function over the length of the dataframe
t0 = time.time()
times = df.apply(lambda _: time.time()-t0, axis=1)

# Print some of the results
print(times[::100])

Out[]:
0      0.000500
100    0.001029
200    0.001532
300    0.002036
400    0.002531
500    0.003033
600    0.003536
700    0.004035
800    0.004537
900    0.005513
dtype: float64
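The steadily growing timestamps above can also be checked programmatically. This is a small variation on the same experiment, assuming `time.perf_counter()` in place of `time.time()` because it is a monotonic clock and so cannot run backwards between calls:

```python
import time

import numpy as np
import pandas as pd

df = pd.DataFrame(np.random.random((1000, 4)))

t0 = time.perf_counter()
# Each call records how long after t0 its row was processed.
times = df.apply(lambda _: time.perf_counter() - t0, axis=1)

# If rows are processed one after another in index order,
# the recorded timestamps never decrease down the Series.
print(times.is_monotonic_increasing)

# Elapsed time between processing the first and the last row.
print(times.iloc[-1] - times.iloc[0])
```

A non-decreasing sequence is consistent with the rows being handled one after another rather than all at once.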

Alex