How to parallelize a non-vectorizable function (e.g., root-finding) in pandas?

Question

I have a dataframe with many rows and columns a, b, and c. Something like:

   |  a  |  b  |  c 
--------------------
 0 | 10.1| .01 | 3.0
 1 |  9.7| .02 | 2.0
 2 | 11.2| .03 | 1.0
...| ... | ... | ...

and a function foo(x_, a, b, c) that takes a, b, and c as parameters. I want to find the root of the function in x_ for each row's values of the parameters.

This is how I currently implement it:

from scipy.optimize import root

df.apply(lambda x: root(foo, 0.0, args=(x["a"], x["b"], x["c"])), axis=1)

The problem is that this is very slow, and I would like to parallelize it to speed things up. (My understanding is that apply with axis=1 simply loops through all of the rows.) What are some ways to achieve faster performance in Python?
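One way to spread the row-wise solves across cores is sketched below with `concurrent.futures.ProcessPoolExecutor`. This is a minimal sketch under assumptions, not a tuned solution: the body of `foo` is a hypothetical stand-in (the real function is not shown), and `solve_chunk`, `parallel_roots` and `n_workers` are illustrative names. Rows are sent to the workers in a few large chunks so that each process receives one pickled array rather than one task per row.

```python
from concurrent.futures import ProcessPoolExecutor

import numpy as np
from scipy.optimize import root


def foo(x, a, b, c):
    # Hypothetical stand-in for the question's foo; any scalar residual works.
    return a * x + b * x**3 - c


def solve_chunk(params):
    # params: array of shape (n, 3) with columns a, b, c.
    # Runs the same per-row scalar solve as the apply() call, in one process.
    return np.array([root(foo, 0.0, args=tuple(row)).x[0] for row in params])


def parallel_roots(params, n_workers=4):
    # Split the rows into one chunk per worker so each process receives a
    # single pickled array instead of one task per row.
    chunks = np.array_split(params, n_workers)
    with ProcessPoolExecutor(max_workers=n_workers) as pool:
        return np.concatenate(list(pool.map(solve_chunk, chunks)))
```

With the dataframe from the question, the call would look like `parallel_roots(df[["a", "b", "c"]].to_numpy())`, placed under an `if __name__ == "__main__":` guard so that worker processes can import the module safely. Whether this beats the plain loop depends on how expensive each solve is relative to the cost of starting processes and transferring data.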

https://stackoverflow.com/questions/45545110/make-pandas-dataframe-apply-use-all-cores — Chris, Aug 31 '22 at 17:46
Can you elaborate on `foo` and on why it is non-vectorizable? — Ben.T, Aug 31 '22 at 18:07
The main problem is that the `foo` function is likely a pure-Python one, and calling CPython functions is very slow (because of the interpreter, and also because arguments are handled through slow dictionaries internally). The only way to parallelize code that calls a pure-Python function is to create N interpreters using multiprocessing and transfer part of the dataframe to each one using pickle, which is also very slow (multithreading is not truly possible because of the GIL). Please consider not using pure-Python code. — Jérôme Richard, Aug 31 '22 at 19:28
Why not just solve one large root problem (for all rows) instead of multiple scalar ones? This approach would work in the same vein as [this answer](https://stackoverflow.com/questions/69786214/is-it-possible-to-vectorize-scipy-optimize-fminbound/69787093#69787093). — joni, Sep 01 '22 at 10:35
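The idea of solving all rows at once can be sketched as follows. Note the substitution: instead of stacking the residuals into one `root` call as in the linked answer, this sketch uses `scipy.optimize.newton`, which accepts an array of starting points and iterates on every element simultaneously. It assumes `foo` can be written with NumPy arithmetic that broadcasts over arrays; the bodies of `foo` and `foo_prime` are hypothetical stand-ins, since the real function is not shown.

```python
import numpy as np
from scipy.optimize import newton


def foo(x, a, b, c):
    # Hypothetical stand-in for the question's foo, written with NumPy
    # arithmetic so it broadcasts over arrays of x, a, b and c.
    return a * x + b * x**3 - c


def foo_prime(x, a, b, c):
    # Derivative of the stand-in with respect to x.
    return a + 3.0 * b * x**2


# The three sample rows from the question's table.
a = np.array([10.1, 9.7, 11.2])
b = np.array([0.01, 0.02, 0.03])
c = np.array([3.0, 2.0, 1.0])

# One starting point per row; newton updates all of them in each iteration.
x0 = np.zeros_like(a)
roots = newton(foo, x0, fprime=foo_prime, args=(a, b, c))
```

With a dataframe, the parameter arrays would come from `df["a"].to_numpy()` and so on. This only helps if `foo` really can be expressed in array operations; if it cannot, the per-row loop remains and the process-based route is the fallback.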
