I have a CSV file named '100-contacts' on my computer that contains information about emails, such as first name, address, city, etc. My goal is to detect spam emails. To do that, I need to clean the text of stopwords and punctuation. The function below should have done this, but applying it to my DataFrame raises a KeyError even though the key exists.
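A KeyError from `df['text']` means pandas found no column whose name is exactly `'text'`. The post does not show the CSV header, so the cause is an assumption here. Two common ones are that the file has no such column at all, or that the header differs slightly, for example a trailing space or different capitalization. This minimal sketch uses hypothetical inline CSV data to reproduce the whitespace case and normalizes the headers with `str.strip()` and `str.lower()`.

```python
import io

import pandas as pd

# Hypothetical CSV content: note the trailing space after "text" in the header row
csv_data = "first_name,text \nAnna,Win a FREE prize now!\n"
df = pd.read_csv(io.StringIO(csv_data))

# Shows the exact header names pandas read, including the 'text ' with a space
print(df.columns.tolist())

try:
    df['text']
except KeyError as err:
    print('KeyError:', err)

# Normalize headers: strip surrounding whitespace and lowercase them
df.columns = df.columns.str.strip().str.lower()
print(df['text'].tolist())
```

Printing `df.columns.tolist()` on the real file shows the exact names to use. If `'text'` is not among them, `df['text']` fails regardless of what `process_text` does.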
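```python
import string

from nltk.corpus import stopwords  # requires nltk.download('stopwords') once

# Build the stopword set once instead of reloading it for every word
STOP_WORDS = set(stopwords.words('english'))

def process_text(text):
    # 1. Remove punctuation
    nopunc = [char for char in text if char not in string.punctuation]
    # Join with '' (not ' '), otherwise every word is split into single letters
    nopunc = ''.join(nopunc)
    # 2. Remove stopwords (case-insensitive)
    clean_words = [word for word in nopunc.split() if word.lower() not in STOP_WORDS]
    # 3. Return a list of clean text words
    return clean_words

# df is the DataFrame loaded from the '100-contacts' CSV file
df['text'].head().apply(process_text)
```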
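To check the cleaning steps in isolation, without NLTK or the CSV file, here is a minimal sketch of the same logic on plain strings. The small `STOP_WORDS` set is a hypothetical stand-in for `stopwords.words('english')`. The last line shows what joining the characters with `' '` instead of `''` does to the text.

```python
import string

# Hypothetical stand-in for stopwords.words('english'), so the sketch runs without NLTK
STOP_WORDS = {'a', 'an', 'the', 'is', 'to', 'you', 'now'}

def process_text(text):
    # 1. Remove punctuation; join with '' so the characters form words again
    nopunc = ''.join(char for char in text if char not in string.punctuation)
    # 2. Drop stopwords, comparing in lowercase
    return [word for word in nopunc.split() if word.lower() not in STOP_WORDS]

print(process_text("Congratulations! You won a FREE prize, claim it now."))
# → ['Congratulations', 'won', 'FREE', 'prize', 'claim', 'it']

# Joining with ' ' instead breaks every word into single letters
print(' '.join(c for c in "Hi, you" if c not in string.punctuation).split())
# → ['H', 'i', 'y', 'o', 'u']
```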