Question about computing function on a single column

Question

I have a question about a pandas/NLTK issue.

My dataframe looks like the following:

Name    Age     Text
Anne    23     "foo you"
Joan    20     "woo you"
Marie   28     "boo you"
John    31     "moo you"
Mark    37     "loo you"

And I need to compute a new column, using the NLTK Python library, that looks like the following:

Name    Age     Text        Tokens
Anne    23    "foo you"      ['foo','you']
Joan    20    "woo you"      ['woo','you']
Marie   28    "boo you"      ['boo','you']
John    31    "moo you"      ['moo','you']
Mark    37    "loo you"      ['loo','you']

I'm using the following code:

df['tokens'] = nltk.word_tokenize(df['text'])

But I get an error because it is storing one token per row instead of storing all the tokens in their corresponding row.

Any help will be welcome.

Thank you very much in advance.

help-ukraine-now · Answer 1 · 2019-07-31T15:09:53.057

df['Tokens'] = df['Text'].str.replace('"', '').apply(nltk.word_tokenize)

    Name    Age Text        Tokens
0   Anne    23  "foo you"   ['foo', 'you']
1   Joan    20  "woo you"   ['woo', 'you']
2   Marie   28  "boo you"   ['boo', 'you']
3   John    31  "moo you"   ['moo', 'you']
4   Mark    37  "loo you"   ['loo', 'you']
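
For anyone who wants to run this end to end, here is a minimal sketch that rebuilds the sample frame from the question. It uses `wordpunct_tokenize` rather than `word_tokenize` only because the former is regex-based and needs no downloaded NLTK data; that swap is an assumption made for convenience, and `nltk.word_tokenize` can be passed to `.apply` in exactly the same way if the tokenizer models are installed.

```python
import pandas as pd
from nltk.tokenize import wordpunct_tokenize  # stand-in for nltk.word_tokenize

# Sample data from the question, with the literal double quotes kept in Text.
df = pd.DataFrame({
    "Name": ["Anne", "Joan", "Marie", "John", "Mark"],
    "Age": [23, 20, 28, 31, 37],
    "Text": ['"foo you"', '"woo you"', '"boo you"', '"moo you"', '"loo you"'],
})

# Strip the quote characters, then tokenize each row's string separately.
# .apply calls the tokenizer once per cell, so each row gets its own list.
df["Tokens"] = (
    df["Text"]
    .str.replace('"', "", regex=False)
    .apply(wordpunct_tokenize)
)

print(df)
```

The key point is that the tokenizer expects a single string, so it has to be applied per row instead of being handed the whole column at once.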

edited Jul 31 '19 at 15:09

answered Jul 31 '19 at 15:02

I just googled word tokenization and pandas and found [this](https://stackoverflow.com/a/44174565/10140310) answer. So yeah, just use `.apply` instead of `word_tokenize(df['text'])` – help-ukraine-now Jul 31 '19 at 15:18
The problem is that what is stored in the text column is not a string, so I need to convert it to a string first – HRDSL Aug 02 '19 at 11:23
@HRDSL, try `df['Text'] = df['Text'].astype(str)` – help-ukraine-now Aug 02 '19 at 12:05
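
A small sketch of that conversion step, assuming the column holds a mix of strings, missing values, and numbers (the sample values here are made up for illustration). Filling missing values before `astype(str)` is an extra precaution not mentioned in the comment; without it, a missing value would be tokenized as literal text. As above, `wordpunct_tokenize` stands in for `nltk.word_tokenize` so that no NLTK data download is needed.

```python
import pandas as pd
from nltk.tokenize import wordpunct_tokenize  # stand-in for nltk.word_tokenize

# Hypothetical column with non-string content.
df = pd.DataFrame({"Text": ["foo you", None, 42]})

# Replace missing values with an empty string, then force everything to str.
text = df["Text"].fillna("").astype(str)

# Tokenize row by row; an empty string yields an empty token list.
df["Tokens"] = text.apply(wordpunct_tokenize)

print(df)
```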
