I have a dataframe with many rows and columns a, b, and c. Something like:

   |  a  |  b  |  c 
--------------------
 0 | 10.1| .01 | 3.0
 1 |  9.7| .02 | 2.0
 2 | 11.2| .03 | 1.0
...| ... | ... | ...

and a function foo(x_, a, b, c) that takes a, b, and c as parameters. I want to find the root of the function for each row's values of the parameters.

This is how I currently implement it:

from scipy.optimize import root

df.apply(lambda x: root(foo, 0.0, args=(x["a"], x["b"], x["c"])), axis=1)

The problem is that it is very slow, and I would like to parallelize it somehow to speed things up. (My understanding is that apply with axis=1 simply loops through all of the rows.) What are some ways to achieve faster performance in Python?

  • https://stackoverflow.com/questions/45545110/make-pandas-dataframe-apply-use-all-cores – Chris Aug 31 '22 at 17:46
  • Can you elaborate on `foo` and on why it is not vectorizable? – Ben.T Aug 31 '22 at 18:07
  • The main problem is that `foo` is likely a pure-Python function, and calling pure-Python functions in CPython is very slow (because of the interpreter and also because arguments are handled as slow dictionaries internally). The only way to parallelize code that calls a pure-Python function is to create N interpreters using multiprocessing and transfer parts of the dataframe to them using pickle, which is also very slow (multithreading is not truly possible because of the GIL). Please consider not using pure-Python code. – Jérôme Richard Aug 31 '22 at 19:28
  • Why not just solve one large root problem (for all rows) instead of multiple scalar ones? This approach would work in the same vein as [this answer](https://stackoverflow.com/questions/69786214/is-it-possible-to-vectorize-scipy-optimize-fminbound/69787093#69787093). – joni Sep 01 '22 at 10:35
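
A minimal sketch of the "solve all rows at once" idea from the last comment. It is not the code from the linked answer: it swaps in `scipy.optimize.newton`, which accepts an array `x0` and then solves one independent scalar problem per element, provided the function is vectorized. The body of `foo` below is a hypothetical stand-in (the real `foo` is not shown in the question), and the approach only applies if the real `foo` can be written with NumPy operations that work on whole arrays.

```python
import numpy as np
import pandas as pd
from scipy.optimize import newton


def foo(x_, a, b, c):
    # Hypothetical stand-in for the real foo (an assumption, not from the question).
    # It uses only NumPy-compatible arithmetic, so it works element-wise on arrays.
    return a * x_ + b * x_**3 - c


# Small example frame using the values shown in the question.
df = pd.DataFrame(
    {"a": [10.1, 9.7, 11.2], "b": [0.01, 0.02, 0.03], "c": [3.0, 2.0, 1.0]}
)

# Pull the parameter columns out as NumPy arrays, one entry per row.
a, b, c = (df[col].to_numpy() for col in ("a", "b", "c"))

# One starting point per row, matching the 0.0 used in the original apply call.
x0 = np.zeros(len(df))

# With an array x0 and no fprime, newton runs the secant method on all rows at once.
roots = newton(foo, x0, args=(a, b, c))

df["root"] = roots
```

Because each iteration evaluates `foo` once on the whole array, the per-row Python call overhead mentioned in the comments is largely avoided. Convergence still depends on the actual `foo` and on the starting points, so checking the residuals `foo(roots, a, b, c)` afterwards seems prudent.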

0 Answers