I've been playing with the apply function to run a function on each row of my data. It's cool and seems faster than the for loops I was using. Now I'd like to speed it up further, so I'm wondering how I can use apply in parallel.

In [49]: df
Out[49]: 
          0         1
0  1.000000  0.000000
1 -0.494375  0.570994
2  1.000000  0.000000
3  1.876360 -0.229738
4  1.000000  0.000000

In [50]: def f(x):    
   ....:  return x[0] + x[1]  
   ....:  

In [51]: df.apply(f, axis=1) #passes a Series object, row-wise
Out[51]: 
0    1.000000
1    0.076619
2    1.000000
3    1.646622
4    1.000000

With this example, do I need to wrap my function or the apply call with concurrent.futures or something similar?
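One way to read "wrap the apply call" is to split the frame into row chunks and run an ordinary `apply` on each chunk in a worker. The following is a minimal sketch under that assumption, not an established pandas feature: `row_sum`, `_apply_chunk`, `parallel_apply`, and the `executor_cls` parameter are hypothetical names introduced here for illustration. Whether it actually beats a plain `apply` depends on how expensive the function is relative to the cost of pickling chunks between processes.

```python
import concurrent.futures
import math
from functools import partial

import pandas as pd


def row_sum(row):
    # Same logic as f in the question: add the two columns of one row.
    return row[0] + row[1]


def _apply_chunk(chunk, func):
    # Runs inside a worker: a normal row-wise apply on one slice of the frame.
    return chunk.apply(func, axis=1)


def parallel_apply(df, func, n_workers=2,
                   executor_cls=concurrent.futures.ProcessPoolExecutor):
    # Hypothetical helper; assumes df is non-empty and func is picklable
    # (defined at module level, not a lambda) when processes are used.
    chunk_size = max(1, math.ceil(len(df) / n_workers))
    chunks = [df.iloc[i:i + chunk_size] for i in range(0, len(df), chunk_size)]
    with executor_cls(max_workers=n_workers) as pool:
        # map() yields results in submission order, so row order is preserved.
        results = list(pool.map(partial(_apply_chunk, func=func), chunks))
    return pd.concat(results)
```

It would be called as `parallel_apply(df, row_sum, n_workers=4)`, ideally from under an `if __name__ == "__main__":` guard, since process pools may re-import the main module on some platforms. For a function as cheap as this one, the process overhead will likely outweigh any gain.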

Lostsoul
  • `apply` is not really faster than `for` loop, if anything, it's slower due to all the overhead. Your question is almost identical to asking how to vectorize some certain function, which is way too broad, if possible at all, for a SO question. – Quang Hoang Oct 20 '20 at 18:23
  • Quite a few options like `dask`, `pandarallel`, `swifter`. Have a look at [`this`](https://stackoverflow.com/questions/45545110/make-pandas-dataframe-apply-use-all-cores). – Mayank Porwal Oct 20 '20 at 18:24
  • Does this answer your question? [pandas multiprocessing apply](https://stackoverflow.com/questions/26784164/pandas-multiprocessing-apply) – Michael Szczesny Oct 20 '20 at 18:25
  • Also [this question](https://stackoverflow.com/questions/16476924/how-to-iterate-over-rows-in-a-dataframe-in-pandas/55557758#55557758) on iterrows. – Quang Hoang Oct 20 '20 at 18:27
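The vectorization point raised in the comments applies directly to the example function: because `f` only adds two columns, the same result can be computed with column arithmetic and no per-row Python call at all. The sketch below rebuilds the frame from the question (the variable names `row_wise` and `vectorized` are made up here) and only carries over to functions that can be expressed as whole-column operations.

```python
import pandas as pd

# The frame from the question, with integer column labels 0 and 1.
df = pd.DataFrame({
    0: [1.000000, -0.494375, 1.000000, 1.876360, 1.000000],
    1: [0.000000, 0.570994, 0.000000, -0.229738, 0.000000],
})

# Row-wise version from the question: one Python call per row.
row_wise = df.apply(lambda x: x[0] + x[1], axis=1)

# Vectorized version: a single addition over whole columns.
vectorized = df[0] + df[1]
```

Both produce the same Series; the vectorized form is usually the first thing to try before reaching for parallelism.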
