I want to filter out certain words from a pandas DataFrame column and create a new column containing the filtered text. I attempted the solution from here, but I think the issue is that Python thinks I want to call str.replace() instead of df.replace(). I'm not sure how to specify the latter when I'm calling it inside a function.
df:
id  old_text
0   my favorite color is blue
1   you have a dog
2   we built the house ourselves
3   i will visit you
def removeWords(txt):
    words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
    txt = txt.replace('|'.join(words), '', regex=True)
    return txt

df['new_text'] = df['old_text'].apply(removeWords)
error:
TypeError: replace() takes no keyword arguments
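A likely explanation, sketched here under the assumption that old_text holds plain Python strings: Series.apply passes each cell to the function one at a time, so inside removeWords the argument txt is a str, and txt.replace resolves to str.replace, which has no regex parameter. The small frame and the names seen_types and error_message below are illustrative only.

```python
import pandas as pd

# Small stand-in for the frame in the question (illustrative data only).
df = pd.DataFrame({"old_text": ["my favorite color is blue", "you have a dog"]})

# Record the type of every value that apply() hands to the function.
seen_types = []
df["old_text"].apply(lambda txt: seen_types.append(type(txt)))
print(seen_types)  # each entry is the built-in str type

# Calling str.replace with regex=True raises TypeError, because that
# keyword belongs to the pandas replace methods, not to str.replace.
error_message = None
try:
    "you have a dog".replace("you", "", regex=True)
except TypeError as exc:
    error_message = str(exc)
    print(error_message)
```

The exact wording of the TypeError can differ between Python versions, but the cause is the same: the function receives a string, not a Series or DataFrame.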
desired output:
id  old_text                      new_text
0   my favorite color is blue     favorite color is blue
1   you have a dog                have a dog
2   we built the house ourselves  built the house
3   i will visit you              will visit you
other things tried:
def removeWords(txt):
    words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves', 'you', 'your', 'yours', 'yourself']
    txt = [word for word in txt.split() if word not in words]
    return txt

df['new_text'] = df['old_text'].apply(removeWords)
this returns:
id  old_text                      new_text
0   my favorite color is blue     favorite, color, is, blue
1   you have a dog                have, a, dog
2   we built the house ourselves  built, the, house
3   i will visit you              will, visit, you
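For reference, here is a minimal sketch of two ways this could be made to return strings instead of lists. It is not necessarily the approach from the linked solution. The first joins the kept words back together inside the per-row function; the second uses the vectorized Series.str.replace with word boundaries so that short words such as 'i' are not stripped out of longer words. The names remove_words, pattern, and new_text_regex are my own, and the data is copied from the example above.

```python
import re
import pandas as pd

df = pd.DataFrame({
    "old_text": [
        "my favorite color is blue",
        "you have a dog",
        "we built the house ourselves",
        "i will visit you",
    ]
})

words = ['i', 'my', 'myself', 'we', 'our', 'ours', 'ourselves',
         'you', 'your', 'yours', 'yourself']

# Option 1: per-row function; join the kept words back into one string.
def remove_words(txt):
    stop = set(words)  # set membership checks are faster than list checks
    return " ".join(word for word in txt.split() if word not in stop)

df["new_text"] = df["old_text"].apply(remove_words)

# Option 2: vectorized regex on the whole column.
# \b anchors each alternative to whole words, so 'i' does not match inside 'visit'.
pattern = r"\b(?:" + "|".join(map(re.escape, words)) + r")\b"
df["new_text_regex"] = (
    df["old_text"]
    .str.replace(pattern, "", regex=True)   # drop the listed words
    .str.replace(r"\s+", " ", regex=True)   # collapse leftover whitespace
    .str.strip()                            # trim leading/trailing spaces
)

print(df)
```

Note that with the word list as written, 'you' is also removed from row 3, so both options give "will visit" there rather than "will visit you".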