I want to apply a function of the following form (the real function has 5 parameters, but let's say it has only 2):
```python
def func(text, model):
    return model[text]
```
to a DataFrame column in the following way:
```python
model = something
df[col2] = df[col1].apply(lambda text: func(text, model))
```
This works, but it is slow. The following multiprocessing version is faster and works fine, unless the function passed to it is a lambda:
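```python
import tqdm
from multiprocessing import Pool, cpu_count

def apply(func, data):
    # Map func over data in a process pool, showing a tqdm progress bar
    with Pool(cpu_count()) as pool:
        return list(tqdm.tqdm(pool.imap(func, data), total=len(data)))
```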
Passing a lambda to it throws the following error:
PicklingError: Can't pickle <function <lambda> at 0x7fe59c869e50>: attribute lookup <lambda> on __main__ failed
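For reference, the error comes from `pickle`: `multiprocessing` pickles the callable to send it to the worker processes, and plain functions are pickled by reference (module plus qualified name), which fails for a lambda because its name is just `<lambda>`. Wrapping a module-level function in `functools.partial` is one commonly used picklable alternative. A minimal check, assuming a hypothetical dict in place of `model`, a hypothetical helper `is_picklable`, and that the code runs as a script:

```python
import pickle
from functools import partial

# Hypothetical stand-in for the real model: any picklable mapping works here.
model = {"a": 1, "b": 2}


def func(text, model):
    return model[text]


def is_picklable(obj):
    """Hypothetical helper: True if obj survives pickle.dumps, as Pool requires."""
    try:
        pickle.dumps(obj)
        return True
    # Depending on the Python version and where the callable was defined,
    # the failure surfaces as PicklingError or AttributeError.
    except (pickle.PicklingError, AttributeError, TypeError):
        return False


lambda_version = lambda text: func(text, model)
partial_version = partial(func, model=model)

print(is_picklable(lambda_version))   # False
print(is_picklable(partial_version))  # True
```

Keep in mind that a `partial` carries `model` inside it, so `model` is pickled together with the callable for every task that `Pool.imap` dispatches.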
My solution: to apply this function faster, I used the following trick: redefine the function so that the second parameter has a default value, with `model` defined before the function itself, so that `func` can be passed to `apply` directly, without a lambda:
```python
model = something

def func(text, model=model):
    return model[text]
```
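The trick relies on Python evaluating default argument values once, when the `def` statement runs, and on module-level functions being pickled by name alone. A minimal check, with a hypothetical dict standing in for `model` and assuming the code runs as a script:

```python
import pickle

# Hypothetical stand-in for the real model.
model = {"a": 1, "b": 2}


def func(text, model=model):
    # `model` here is the dict that existed when `def` ran.
    return model[text]


model = {"a": 99}  # rebinding the global later does not change the default
print(func("a"))  # 1

# Module-level functions are pickled by reference (module and name),
# so the pickle payload does not contain the model itself.
restored = pickle.loads(pickle.dumps(func))
print(restored is func)  # True
```

One caveat: under the `spawn` start method (the default on Windows and macOS), each worker re-imports the main module, so a top-level `model = something` is re-evaluated in every worker.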
This works fine; however, it feels rather ugly, and I would like to know whether there are other ways to accomplish this. I also tried creating a class:
```python
class Applyer:
    def __init__(self, model):
        self.model = model

    def func(self, text):
        return self.model[text]
```
If I create an instance and apply its method like this:
```python
model = something
applyer = Applyer(model)
apply(applyer.func, df[col1])
```
This works, but it is even slower than the plain `apply` without multiprocessing. Those are my two attempts.