I have a list of 4,000 strings that I need to remove from a pandas DataFrame column. The code below works fine on the small sample shown, but on my real DataFrame of 20k+ rows it takes forever. Any ideas on how to speed this up?
import pandas as pd
import re
df = pd.DataFrame(
    {
        "ID": [1, 2, 3, 4, 5],
        "name": [
            "Hello Sam how is it going today? oh yeah",
            "Hello Jane how is it going today? oh yeah",
            "It is an Hello example how are you doing today?",
            "how is it going today?n[soldjgf ",
            "how is it going today Hello World",
        ],
    }
)
my_list = ['how is it going today?n[soldjgf', 'how are you doing today?']
# Join all escaped strings into one alternation, compile it once,
# then replace every match in each row with a single space.
p = re.compile('|'.join(map(re.escape, my_list)))
df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]
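
One direction that might help, offered as a sketch rather than a measured fix: with thousands of alternatives, a flat `a|b|c|...` pattern makes the regex engine try the alternatives one after another at each position. Merging the strings into a prefix tree (trie) first lets shared prefixes be matched only once. The helper name `build_trie_pattern` below is hypothetical, it assumes the phrases are short enough not to hit Python's recursion limit, and it prefers the longest phrase when one phrase is a prefix of another (the original pattern prefers whichever comes first in the list). Whether it is actually faster on the real data would need timing.

```python
import re


def build_trie_pattern(phrases):
    """Compile literal phrases into one regex that shares common prefixes.

    Hypothetical helper for illustration; not part of pandas or re.
    """
    trie = {}
    for phrase in phrases:
        if not phrase:
            continue  # an empty phrase would match everywhere, so skip it
        node = trie
        for ch in phrase:
            node = node.setdefault(ch, {})
        node[""] = True  # marks that a phrase ends at this node

    def to_regex(node):
        # Recursion depth equals the longest phrase length (assumed short).
        branches = [
            re.escape(ch) + to_regex(node[ch])
            for ch in sorted(k for k in node if k != "")
        ]
        if not branches:
            return ""
        ends_here = "" in node
        if len(branches) == 1 and not ends_here:
            return branches[0]
        body = "(?:" + "|".join(branches) + ")"
        # If a shorter phrase ends here, the longer continuation is optional.
        return body + "?" if ends_here else body

    # "(?!)" never matches; used when there are no non-empty phrases.
    return re.compile(to_regex(trie) or r"(?!)")


my_list = ["how is it going today?n[soldjgf", "how are you doing today?"]
texts = [
    "Hello Sam how is it going today? oh yeah",
    "It is an Hello example how are you doing today?",
    "how is it going today?n[soldjgf ",
]

p = build_trie_pattern(my_list)
cleaned = [p.sub(" ", text) for text in texts]
```

With the DataFrame from the question, the compiled pattern would drop into the existing line unchanged: `df['cleaned_text'] = [p.sub(' ', text) for text in df['name']]`.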