I am trying to remove three sentences from paragraphs of text. I have a pandas DataFrame in which each row holds a paragraph, and I want to remove the same three sentences from every row. For example:
import pandas as pd
df_1 = pd.DataFrame({"text": ["the dog is red. He goes outside and runs.",
                              "i like dogs because they are fun. i don't like that dogs bark at mailmen",
                              "dogs bark at mailmen and i think its funny."]})
custom_stopwords = ["the dog is red", "i like dogs", "dogs bark at mailmen"]
# Remove each phrase from every row of the "text" column.
for i in custom_stopwords:
    df_1['text'] = df_1['text'].str.replace(i, '')
This method works on the example above, but it does not work on my actual data. My data is quite large, but I don't see why that would matter here. What happens is that some of my sentences are removed and others are not. For example, I cannot remove the word "installation(s)" unless I escape the parentheses with "\".
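One likely cause, assuming the phrases are being interpreted as regular expressions (the default for `str.replace` in pandas versions before 2.0): characters such as `(`, `)`, `.` and `?` then have special meaning, so a phrase like "installation(s)" no longer matches its own literal text. The sketch below uses made-up sample rows and hypothetical variable names (`phrases`, `as_regex`, `literal`, `escaped`) to reproduce the mismatch and show two possible workarounds: passing `regex=False`, or escaping each phrase with `re.escape`.

```python
import re
import pandas as pd

# Hypothetical sample data, not the asker's real dataset.
df = pd.DataFrame({"text": ["the installation(s) went well.",
                            "i like dogs because they are fun."]})
phrases = ["installation(s)", "i like dogs"]

# As a regex, "(s)" is a capture group, so the pattern looks for
# "installations" and never matches the literal text "installation(s)".
as_regex = df["text"].str.replace("installation(s)", "", regex=True)

# Option 1: treat every phrase as a plain literal string.
literal = df["text"]
for phrase in phrases:
    literal = literal.str.replace(phrase, "", regex=False)

# Option 2: escape each phrase and remove them all in one regex pass.
pattern = "|".join(re.escape(p) for p in phrases)
escaped = df["text"].str.replace(pattern, "", regex=True)

print(as_regex.tolist())
print(literal.tolist())
print(escaped.tolist())
```

Setting `regex` explicitly keeps the behavior the same across pandas versions. The single-pass variant may be preferable on a large DataFrame because it scans each row once instead of once per phrase, though that is worth measuring on the real data.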