I have a dataframe with two columns:
col1            col2
"aaa bbb"       some_regex_str1
"zzz aaa"       some_regex_str2
"sda343das"     some_regex_str3
...
"999 aaa dsd"   some_regex_strN
The length of the dataframe can be anywhere between 10^6 and 10^7 rows.
Currently, I do:

import re

df['output'] = df.apply(lambda row: re.search(row['col2'], row['col1']), axis=1)
It is slow.
What is a more efficient way to do it?
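For context, here is a minimal self-contained sketch of the setup. The strings and patterns below are made-up placeholders; the real ones vary per row.

import re
import pandas as pd

# Hypothetical sample data: col1 holds text, col2 holds one regex pattern per row
df = pd.DataFrame({
    "col1": ["aaa bbb", "zzz aaa", "sda343das", "999 aaa dsd"],
    "col2": [r"aaa", r"z+", r"\d+", r"aaa\s"],
})

# Row-wise regex search; each cell gets a Match object or None
df["output"] = df.apply(lambda row: re.search(row["col2"], row["col1"]), axis=1)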
EDIT:
I have created a yo.py module with:
import re

def run_regex(x):
    return re.search(x['col2'], x['col1'])
In the main module I do:
from yo import run_regex
...
res = df.parallel_apply(run_regex, axis=1)
but I still get
AttributeError: Can't pickle local object 'prepare_worker.<locals>.closure.<locals>.wrapper'
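For reference, my understanding of the standard pandarallel usage (based on its documented initialize / parallel_apply API; the small dataframe below is just hypothetical placeholder data) is roughly:

import pandas as pd
from pandarallel import pandarallel

from yo import run_regex

# Start the worker processes before any parallel_apply call
pandarallel.initialize()

# Placeholder data, same shape as the real dataframe
df = pd.DataFrame({
    "col1": ["aaa bbb", "zzz aaa"],
    "col2": [r"aaa", r"z+"],
})

# run_regex lives at the top level of an importable module (yo.py),
# which is usually required so the workers can pickle it
res = df.parallel_apply(run_regex, axis=1)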