
I am trying to find a way to parallelise certain operations on dataframes, especially those that cannot be vectorised. I have tested the code below, taken from http://www.racketracer.com/2016/07/06/pandas-in-parallel/, but it doesn't work: there is no error message, quite simply nothing happens. Stepping through it in a debugger, the code seems to get stuck at `df = pd.concat(pool.map(func, df_split))`, again without any error message.

What am I doing wrong?

import timeit
import pandas as pd
import numpy as np
import seaborn as sns
import multiprocessing
from multiprocessing import Pool

def parallelize_dataframe(df, func):
    df_split = np.array_split(df, num_partitions)
    pool = Pool(num_cores)
    df = pd.concat(pool.map(func, df_split))
    pool.close()
    pool.join()
    return df

def multiply_columns(data):
    data['length_of_word'] = data['species'].apply(lambda x: len(x))
    return data

num_partitions = 2  # number of partitions to split the dataframe into
num_cores = 2  # multiprocessing.cpu_count()  # number of cores on your machine

iris = pd.DataFrame(sns.load_dataset('iris'))
iris = parallelize_dataframe(iris, multiply_columns)
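One way to narrow down a silent hang like this is to run the same split, apply and concat steps without `Pool`. If the serial version works, the problem lies in how the worker processes are started rather than in `multiply_columns`. The sketch below is a minimal illustration under stated assumptions: `serial_dataframe` is a hypothetical helper name, the tiny DataFrame stands in for the seaborn iris data so it runs offline, and the frame is split with `iloc` slicing rather than `np.array_split`.

```python
import pandas as pd

def multiply_columns(data):
    # Same per-chunk work as in the question (apply(len) == apply(lambda x: len(x))).
    data['length_of_word'] = data['species'].apply(len)
    return data

def serial_dataframe(df, func, num_partitions=2):
    # Hypothetical helper: split -> apply -> concat, all in one process, no Pool.
    size = max(1, -(-len(df) // num_partitions))  # rows per chunk (ceiling division)
    # .copy() so func can add a column to each chunk without touching df
    chunks = [df.iloc[i:i + size].copy() for i in range(0, len(df), size)]
    return pd.concat([func(chunk) for chunk in chunks])

# Stand-in for sns.load_dataset('iris'); only the 'species' column is used.
iris = pd.DataFrame({"species": ["setosa", "versicolor", "virginica"]})
print(serial_dataframe(iris, multiply_columns))
```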
Pythonista anonymous

1 Answer


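I needed to run `parallelize_dataframe` only under an `if __name__ == "__main__":` guard, i.e. wrap the two top-level lines of the script:

if __name__ == "__main__":
    iris = pd.DataFrame(sns.load_dataset('iris'))
    iris = parallelize_dataframe(iris, multiply_columns)

This matters when worker processes are started with the "spawn" method (the default on Windows and macOS): each worker re-imports the main script, so any top-level code that is not behind the guard runs again in every worker.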
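A complete version of the script with the guard in place might look like the sketch below. It makes a few assumptions beyond the original: a small hand-made DataFrame stands in for `sns.load_dataset('iris')` so it runs offline; the frame is split with plain `iloc` slicing via a hypothetical `split_frame` helper instead of `np.array_split`, to avoid depending on how `np.array_split` treats DataFrames across NumPy/pandas versions; and the pool is managed with a `with` block instead of explicit `close()`/`join()`.

```python
import pandas as pd
from multiprocessing import Pool

NUM_PARTITIONS = 2  # number of chunks to split the DataFrame into
NUM_CORES = 2       # number of worker processes

def split_frame(df, n):
    # Hypothetical helper: at most n contiguous row chunks, each a copy.
    size = max(1, -(-len(df) // n))  # ceiling division
    return [df.iloc[i:i + size].copy() for i in range(0, len(df), size)]

def parallelize_dataframe(df, func):
    chunks = split_frame(df, NUM_PARTITIONS)
    # map() blocks until every chunk is processed and keeps chunk order;
    # leaving the with-block then shuts the pool down.
    with Pool(NUM_CORES) as pool:
        result = pd.concat(pool.map(func, chunks))
    return result

def multiply_columns(data):
    # Kept at module top level so spawned workers can find it by name.
    data['length_of_word'] = data['species'].apply(len)
    return data

if __name__ == "__main__":
    # Only the code that starts worker processes goes under the guard.
    iris = pd.DataFrame({"species": ["setosa", "versicolor", "virginica"] * 2})
    iris = parallelize_dataframe(iris, multiply_columns)
    print(iris)
```

With the "spawn" start method, the workers look `multiply_columns` up in a fresh import of the script, so the function definitions stay outside the guard while the calls that create the pool go inside it.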
Pythonista anonymous
  • Please use the edit link on your question to add additional information. The Post Answer button should be used only for complete answers to the question. - [From Review](/review/low-quality-posts/22519096) – Amsakanna Mar 20 '19 at 11:02
  • I am not following. The complete answer to the question is that `parallelize_dataframe` must be run only if `__name__ == "__main__"`, which is what I have written. I could have made it more explicit, but it seemed pretty obvious to me – Pythonista anonymous Mar 20 '19 at 11:04
  • Please include the lines before and after this `if`-statement you want to include. _(Always think of your posts here as entries in a knowledge base, not just a chat)_ – Dirk Horsten Mar 20 '19 at 11:10
  • This looked more like a comment saying that you forgot to add something to your question. Anyways cheers! – Amsakanna Mar 20 '19 at 11:22