Context:
I have a df like this:
title | text |
---|---|
Donald Trump Sends Out $15B | Donald Trump just couldn't wish all Americans |
Drunk Bragging Trump Staffer Started | House Intelligence Committee Chairman Devin |
... | ... |
Both title and text are of object datatype.
I am trying to run the following code:
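import re
from nltk.corpus import stopwords
from nltk.stem.porter import PorterStemmer  # assumed: ps is a Porter stemmer

ps = PorterStemmer()
corpus = []
for i in range(0, len(msg)):
    # Replace every non-letter with a space, then lowercase and tokenize
    review = re.sub('[^a-zA-Z]', ' ', df['title'][i])
    review = review.lower()
    review = review.split()
    # Drop English stopwords and stem the remaining words
    review = [ps.stem(word) for word in review if word not in stopwords.words('english')]
    review = ' '.join(review)
    corpus.append(review)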
Error:
However, I am getting the following error on the re.sub line:
TypeError: expected string or bytes-like object
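For reference, the same TypeError appears whenever re.sub receives something that is not a str. The sketch below assumes, without having confirmed it on the real data, that at least one title cell is a missing value rather than text. The toy Series and the helper name find_non_strings are made up for illustration.

```python
import re
import pandas as pd

def find_non_strings(series):
    """Return the entries of `series` that are not Python str objects."""
    mask = series.map(lambda value: not isinstance(value, str))
    return series[mask]

# Toy stand-in for df['title']; the None is an assumed missing value.
titles = pd.Series(
    ["Donald Trump Sends Out $15B", None, "Drunk Bragging Trump Staffer Started"],
    dtype=object,
)

# Positions whose value would make re.sub fail
bad = find_non_strings(titles)
print(bad.index.tolist())

# Calling re.sub on the non-string entry reproduces the error type
raised = False
try:
    re.sub('[^a-zA-Z]', ' ', titles.iloc[1])
except TypeError:
    raised = True
print(raised)
```

If the real column returns any rows from a check like this, those rows are the likely source of the error, even though the column dtype is object.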
I referred to this question, but made no progress. I am still getting the same error.
Desired output:
>code: corpus[0:2]
>Result: [['donald trump send b'], ['drunk brag trump staffer start']]
What I tried:
I tried all the suggestions from the SO link above. I also tried changing the datatype of the column with df['title'] = df['title'].astype('string'), but I am getting the same error :(
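A possible reason the astype('string') attempt changes nothing, again assuming the column contains missing values: that conversion keeps them as missing (pd.NA), and pd.NA is still not a str. The sketch below uses toy data and shows filling the gaps with an empty string as one hedged workaround, not necessarily the right choice for the real dataset.

```python
import re
import pandas as pd

# Toy stand-in for df['title'] with one assumed missing value
titles = pd.Series(["Donald Trump Sends Out $15B", None], dtype=object)

# astype('string') keeps the missing entry missing
as_string = titles.astype('string')
still_missing = as_string.isna().tolist()

# Filling missing values first guarantees every element is a real str
filled = titles.fillna('')
all_str = all(isinstance(value, str) for value in filled)

# The per-row cleaning step now runs without a TypeError
cleaned = [re.sub('[^a-zA-Z]', ' ', title).lower().split() for title in filled]
print(still_missing, all_str, cleaned)
```

Dropping the affected rows with dropna(subset=['title']) would be an alternative if empty titles should not end up in the corpus.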
Additional info:
- If I use different code to replace non-alphabetic characters and try to run it, I get an AttributeError: 'Series' object has no attribute 'lower' error on the lower() line.
- I have a different df in a different notebook, and this code works perfectly there (with object as the datatype).
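On the AttributeError: lower() is a method of individual strings, not of a whole Series, so it fails when the replacement step returns a Series instead of a single string. A minimal sketch of the column-wide equivalent using the pandas .str accessor, on toy data and without the stopword and stemming steps:

```python
import pandas as pd

# Toy stand-in for df['title']
titles = pd.Series([
    "Donald Trump Sends Out $15B",
    "Drunk Bragging Trump Staffer Started",
])

# A Series has no lower() of its own; string methods live under .str
has_lower = hasattr(titles, 'lower')

tokens = (
    titles.str.replace('[^a-zA-Z]', ' ', regex=True)  # non-letters become spaces
          .str.lower()                                # lowercase each title
          .str.split()                                # split each title into words
)
print(has_lower)
print(tokens.tolist())
```

Each element of tokens is a list of words for one title, which could then go through the stopword filter and stemmer row by row.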
Any help would be appreciated!