I have a question about a pandas/NLTK issue.
My dataframe looks like the following:
Name   Age  Text
Anne   23   "foo you"
Joan   20   "woo you"
Marie  28   "boo you"
John   31   "moo you"
Mark   37   "loo you"
I need to compute a new column, using the NLTK Python library, so that the result looks like the following:
Name   Age  Text       Tokens
Anne   23   "foo you"  ['foo','you']
Joan   20   "woo you"  ['woo','you']
Marie  28   "boo you"  ['boo','you']
John   31   "moo you"  ['moo','you']
Mark   37   "loo you"  ['loo','you']
I'm using the following code:
df['Tokens'] = nltk.word_tokenize(df['Text'])
But I get an error, because it is storing one token per row instead of putting all of a row's tokens in that row.
Any help will be welcome.
Thank you very much in advance.
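For anyone hitting the same thing: nltk.word_tokenize expects a single string rather than a whole column, so one option is to call the tokenizer once per row with Series.apply. Below is a minimal sketch, assuming the column names shown in the tables above (Text, Tokens). It uses NLTK's wordpunct_tokenize as a stand-in because it needs no extra data download. nltk.word_tokenize can be passed to apply in the same way once its tokenizer data is installed.

```python
import pandas as pd
from nltk.tokenize import wordpunct_tokenize  # stand-in tokenizer, needs no data download

# Rebuild the example frame from the question.
df = pd.DataFrame({
    "Name": ["Anne", "Joan", "Marie", "John", "Mark"],
    "Age": [23, 20, 28, 31, 37],
    "Text": ["foo you", "woo you", "boo you", "moo you", "loo you"],
})

# apply() calls the tokenizer on each string separately,
# so every row ends up holding its own list of tokens.
df["Tokens"] = df["Text"].apply(wordpunct_tokenize)

print(df)
```

The key point is that the tokenizer is handed one cell at a time, so the result is a Series of lists with the same length and index as the original column.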