
My df looks like this:

team_name   text
---------   ----
red         this is text from red team
blue        this is text from blue team
green       this is text from green team
yellow      this is text from yellow team

I am trying to get this:

team_name   text                             text_token
---------   ----                             ----------
red         this is text from red team       'this', 'is', 'text', 'from', 'red', 'team'
blue        this is text from blue team      'this', 'is', 'text', 'from', 'blue', 'team'
green       this is text from green team     'this', 'is', 'text', 'from', 'green', 'team'
yellow      this is text from yellow team    'this', 'is', 'text', 'from', 'yellow', 'team'

What have I tried?

df['text_token'] = nltk.word_tokenize(df['text'])

That does not work. How do I achieve my desired result? Also, is it possible to compute a frequency distribution over the tokens?

floss
  • https://stackoverflow.com/questions/44173624/how-to-apply-nltk-word-tokenize-library-on-a-pandas-dataframe-for-twitter-data and https://stackoverflow.com/questions/33098040/how-to-use-word-tokenize-in-data-frame – Joe Ferndz Jan 03 '21 at 02:48
  • `df['text_token'] = df.apply(lambda row: nltk.word_tokenize(row['text']), axis=1)` – Joe Ferndz Jan 03 '21 at 02:50
  • Does this answer your question? [how to use word_tokenize in data frame](https://stackoverflow.com/questions/33098040/how-to-use-word-tokenize-in-data-frame) – Lydia van Dyke Jan 03 '21 at 09:27

1 Answer


Stack Overflow has a few examples for you to look into.

This has already been solved here: how to use word_tokenize in data frame

df['text_token'] = df.apply(lambda row: nltk.word_tokenize(row['text']), axis=1)
Joe Ferndz
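
For context, here is a minimal, self-contained sketch of that approach, with the DataFrame rebuilt from the example in the question. The frequency-distribution part at the end is only one possible way to handle that follow-up (using nltk.FreqDist) and is not taken from the answer above.

import nltk
import pandas as pd

# word_tokenize needs the punkt tokenizer models; download them once
# if they are not already installed (newer NLTK releases may ask for
# 'punkt_tab' instead).
nltk.download('punkt')

# DataFrame rebuilt from the question's example
df = pd.DataFrame({
    'team_name': ['red', 'blue', 'green', 'yellow'],
    'text': [
        'this is text from red team',
        'this is text from blue team',
        'this is text from green team',
        'this is text from yellow team',
    ],
})

# word_tokenize expects a single string, not a whole Series, which is
# why the original attempt failed; apply it row by row instead.
df['text_token'] = df.apply(lambda row: nltk.word_tokenize(row['text']), axis=1)

# One way to get a frequency distribution: flatten all the token lists
# and feed them to FreqDist, which accepts any iterable of tokens.
all_tokens = [token for tokens in df['text_token'] for token in tokens]
freq = nltk.FreqDist(all_tokens)
print(freq.most_common(5))

A slightly shorter equivalent for the tokenizing step is df['text_token'] = df['text'].apply(nltk.word_tokenize), since apply on a Series passes each cell (a single string) straight to word_tokenize.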