I'd like to take a list of comments from a DataFrame, first parse each comment into a list of sentences, and then, on a second pass, split each sentence into words. I need this as input to a word2vec model in gensim.
I have already used sent_tokenize from NLTK for the first pass, but if I then try word_tokenize on the result, I get an error: each cell is now a list of sentences rather than a string or bytes-like object.
import nltk
print(df)
ID Comment
0 Today is a good day.
1 Today I went by the river. The river also flow...
2 The water by the river is blue, it also feels ...
3 Today is the last day of spring; what to do to...
df['sentences'] = df['Comment'].dropna().apply(nltk.sent_tokenize)
df['word'] = df['sentences'].dropna().apply(nltk.word_tokenize)
After trying to pass the sentences column into word_tokenize, I get: TypeError: expected string or bytes-like object.