I am cleaning RSS feed data that I pulled using feedparser. I managed to remove all special characters, but I am unable to remove the "p" left over from the <p> tag. How can I remove it?
I tried this code:
import re

def clean_text(text):
    return [re.sub('[^a-z0-9]', '', w.lower()) for w in text.strip().split()]

news_df['clean_body'] = news_df['summary'].apply(clean_text)
The code runs without errors, but the <p> tag is not fully removed: the angle brackets are stripped, while the letter p remains attached to the neighbouring word.
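To illustrate, here is a minimal reproduction that does not need the DataFrame. The sample string is made up for this example and is not taken from my actual feed; I am assuming the real summaries look roughly like it.

```python
import re

def clean_text(text):
    # Same logic as above: lowercase each whitespace-separated word,
    # then drop every character that is not a-z or 0-9.
    return [re.sub('[^a-z0-9]', '', w.lower()) for w in text.strip().split()]

# Hypothetical sample standing in for one feed summary.
sample = "<p>Hello world!</p>"

print(clean_text(sample))  # ['phello', 'worldp']
```

The "<", ">" and "/" characters are removed, but the "p" inside each tag survives because it matches a-z, so it ends up glued to the first and last words.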