I have a DataFrame with 20 million rows of string text in df['text'] and over 100 compiled regexes to run against each row, replacing every match with an empty string.
This is taking far too long, and unfortunately I cannot use flashtext because my patterns need regex features it doesn't support.
Any advice on how to speed this up?
Below is an example of what I am doing now:
import re

a = re.compile(r'\d{11}')
b = re.compile(r'[a-z]\d{3}')
c = re.compile(r'\d-[a-z]{5}-\d')

# str.replace is vectorized over the whole column, so no row loop is needed
df['text'] = df['text'].str.replace(a, '', regex=True)
df['text'] = df['text'].str.replace(b, '', regex=True)
df['text'] = df['text'].str.replace(c, '', regex=True)
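One idea I've been considering is joining all the patterns into a single alternation so each string is scanned once instead of once per pattern, but I'm not sure this is the right approach. A minimal sketch of what I mean (the sample data here is made up for illustration, not my real frame):

```python
import re
import pandas as pd

# Toy stand-in for my real 20M-row DataFrame
df = pd.DataFrame({'text': ['call 12345678901 now', 'code a123 and 1-abcde-2']})

# Combine all patterns into one alternation so the column
# is processed in a single pass rather than 100+ passes
patterns = [r'\d{11}', r'[a-z]\d{3}', r'\d-[a-z]{5}-\d']
combined = re.compile('|'.join(patterns))

df['text'] = df['text'].str.replace(combined, '', regex=True)
```

Would this actually be faster for 100+ patterns, or is there a better way?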