My DataFrame (df) looks like this:
team_name  text
---------  ----
red        this is text from red team
blue       this is text from blue team
green      this is text from green team
yellow     this is text from yellow team
I am trying to get this:
team_name  text                           text_token
---------  ----                           ----------
red        this is text from red team     'this', 'is', 'text', 'from', 'red', 'team'
blue       this is text from blue team    'this', 'is', 'text', 'from', 'blue', 'team'
green      this is text from green team   'this', 'is', 'text', 'from', 'green', 'team'
yellow     this is text from yellow team  'this', 'is', 'text', 'from', 'yellow', 'team'
Here is what I tried:
df['text_token'] = nltk.word_tokenize(df['text'])
That does not work. How do I achieve my desired result? Also, is it possible to get a frequency distribution?
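
A minimal sketch of one possible approach, under some assumptions. nltk.word_tokenize expects a single string, so passing the whole df['text'] column at once fails; applying a tokenizer row by row with Series.apply avoids that. To keep the sketch runnable without NLTK's tokenizer data, whitespace splitting (str.split) stands in for nltk.word_tokenize here, which is an assumption and happens to give the same tokens for this sample text. The names tokenize and overall_freq are illustrative only, and collections.Counter is used for the counts rather than nltk.FreqDist.

```python
from collections import Counter

import pandas as pd

# Sample data copied from the question
df = pd.DataFrame({
    "team_name": ["red", "blue", "green", "yellow"],
    "text": [
        "this is text from red team",
        "this is text from blue team",
        "this is text from green team",
        "this is text from yellow team",
    ],
})

# Assumption: whitespace splitting stands in for nltk.word_tokenize so the
# sketch runs without NLTK data. Any function that maps str -> list of
# tokens can be assigned here instead.
tokenize = str.split

# A tokenizer handles one string at a time, so apply it to each row
# instead of passing the whole column in a single call.
df["text_token"] = df["text"].apply(tokenize)

# Frequency of each token across all rows of the new column
overall_freq = Counter(
    token for tokens in df["text_token"] for token in tokens
)

print(df[["team_name", "text_token"]])
print(overall_freq.most_common(4))
```

With NLTK installed and its tokenizer data downloaded, the row-wise line would presumably become df['text'].apply(nltk.word_tokenize), and the flattened token list could be passed to nltk.FreqDist in place of Counter.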