
I'm having some difficulty figuring out how to make the following code multithreaded. I guess it's probably just a matter of syntax.

What I want is to process every column in parallel; every column has its own vfc array in the same data object.

Thanks in advance

    with multiprocessing.Pool() as pool:
        for col in list_column_names:
            # returns an array
            vfc = self.get_vfc(col)
            data[vfc] = data[col].apply(lambda x: self.smth(x, self.model))
Zarathustra

1 Answer


You are actually not using the pool you create. Also, working with lambda expressions in multiprocessing can be problematic; cf. Python Multiprocessing Pool Map: AttributeError: Can't pickle local object. You could try something like this:

with multiprocessing.Pool() as pool:
    pool.map(self.cf, [data[c] for c in data])

where the function cf is defined at class level and contains the logic intended by your use of apply and the lambda. Possibly also define your dataframe at class level.
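A minimal, runnable sketch of that idea (the class name `Processor` and the bodies of `smth` and `cf` are hypothetical stand-ins, and a plain dict of lists stands in for the DataFrame):

```python
import multiprocessing


class Processor:
    def __init__(self, model):
        self.model = model

    def smth(self, x, model):
        # placeholder for the real per-value computation
        return x * model

    def cf(self, column):
        # class-level replacement for `lambda x: self.smth(x, self.model)`;
        # unlike a locally defined lambda, a bound method of a module-level
        # class can be pickled and sent to pool workers
        return [self.smth(x, self.model) for x in column]

    def process(self, data):
        cols = list(data)
        with multiprocessing.Pool() as pool:
            # one task per column; each worker processes a whole column
            results = pool.map(self.cf, [data[c] for c in cols])
        return dict(zip(cols, results))
```

When run as a script (on Windows/macOS, guard the entry point with `if __name__ == "__main__":`), `Processor(model=2).process({"a": [1, 2]})` maps `cf` over each column in a separate worker process.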

ctenar
  • yeah, I had this lambda issue before :/ so the dataframe as a variable is not possible? – Zarathustra Apr 06 '20 at 07:38
  • Not sure what you mean by variable, but you could define/compute the dataframe `data` before the multiprocessing operation. For instance, something like this would work: `with multiprocessing.Pool() as pool: new_data = pool.map(self.cf, [data[c] for c in data]); new_keys = pool.map(self.get_vfc, [data[c] for c in data]); for i, k in enumerate(new_keys): data[k] = new_data[i]` (sorry about the poor formatting, block code doesn't work in comments...) – ctenar Apr 07 '20 at 06:45
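A cleaned-up, runnable version of the pattern from that comment (the class name `Worker` and the bodies of `get_vfc` and `cf` are illustrative guesses; `get_vfc` is applied to column names here, as in the question, and a plain dict stands in for the DataFrame):

```python
import multiprocessing


class Worker:
    def __init__(self, model):
        self.model = model

    def get_vfc(self, col):
        # stand-in for the real mapping from column name to new key
        return col + "_vfc"

    def cf(self, column):
        # class-level function replacing the original lambda
        return [x + self.model for x in column]

    def run(self, data):
        cols = list(data)  # snapshot the keys before adding new ones
        with multiprocessing.Pool() as pool:
            new_data = pool.map(self.cf, [data[c] for c in cols])
            new_keys = pool.map(self.get_vfc, cols)
        # assign the results back after the pool has finished
        for k, vals in zip(new_keys, new_data):
            data[k] = vals
        return data
```

Note that the results are assigned back in the parent process after `pool.map` returns: worker processes get copies of the data, so mutating `data` inside a worker would not be visible to the parent.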