I need to replace parts of the text in a pandas DataFrame in Python. The substrings to be replaced come from a big list of multi-word strings, and each match is replaced with the same fixed string. I have written the following simple example to demonstrate the problem and my current solution, which uses a for loop. It works well, but when the list of phrases and the DataFrame are both huge, the for loop becomes very expensive to run. Is there any way to avoid the for loop here?
import re
import pandas as pd

text = ['I am north and west', 'you are east and south']
df = pd.DataFrame(text)

def loop_names(s):
    # Replace each listed phrase with a fixed string, one phrase at a time.
    words = ['north and west', 'east and south']
    for word in words:
        s = re.sub(re.escape(word), 'at location', s)
    return s

df[0] = df[0].apply(loop_names)
df
# Alternatively:
text = ['I am north and west','you are east and south']
df = pd.DataFrame(text)
words = ['north and west','east and south']
for word in words:
    # regex=True is required on recent pandas versions, where str.replace is literal by default.
    df[0] = df[0].str.replace(re.escape(word), 'at location', regex=True)
df
# Alternatively:
text = ['I am north and west','you are east and south']
df = pd.DataFrame(text)
words = ['north and west','east and south']
for word in words:
    df[0] = df[0].apply(lambda x: re.sub(re.escape(word), 'at location', x))
df