
I originally had this text DataFrame (called train):

index                     train
0        My favourite food is anything I didn't have to...
1        Now if he does off himself, everyone will thin...
2                           WHY THE FUCK IS BAYLESS ISOING
3                              To make her feel threatened
4                                   Dirty Southern Wankers

And I used this to count words in the train set:

def word_count(df):
    word_count = []
    for i in df['text']:
        word = i.split()
        word_count.append(len(word))
    return word_count

train['word_count'] = word_count(train)

But I had forgotten to apply preprocessing. After preprocessing the texts, the df looked like this:

index                             train
0                     [favourit, food, anyth, didnt, cook]
1        [everyon, think, he, laugh, screw, peopl, inst...]
2                                     [fuck, bayless, iso]
3                                   [make, feel, threaten]
4                                [dirti, southern, wanker]

Now when I call word_count(train), I get an error:

AttributeError: 'list' object has no attribute 'split'

This happens because each cell now holds a list instead of a string. How can I solve this?

Jamiu S.
Daniel_DS

2 Answers


You don't need that custom function. Each cell is already a list of tokens, so take its length directly:

df['word_count'] = df['train'].apply(len)
print(df)

                                             train  word_count
0             [favourit, food, anyth, didnt, cook]           5
1  [everyon, think, he, laugh, screw, peopl, inst]           7
2                             [fuck, bayless, iso]           3
3                           [make, feel, threaten]           3
4                        [dirti, southern, wanker]           3
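If the same counting code has to run both before and after preprocessing, one option is a small helper that accepts either form. This is only a sketch: count_tokens is a hypothetical name, and the sample rows are copied from the question's preprocessed output (the truncated row is left out rather than guessed).

```python
import pandas as pd

# Rows copied from the question's preprocessed output; the truncated row is omitted.
df = pd.DataFrame({
    "train": [
        ["favourit", "food", "anyth", "didnt", "cook"],
        ["fuck", "bayless", "iso"],
        ["make", "feel", "threaten"],
        ["dirti", "southern", "wanker"],
    ]
})


def count_tokens(cell):
    """Hypothetical helper: count words in a raw string or in a token list."""
    if isinstance(cell, str):
        # Raw text: split on whitespace first, as the original word_count did.
        return len(cell.split())
    # Already tokenised: the list length is the word count.
    return len(cell)


df["word_count"] = df["train"].apply(count_tokens)
```

When the column is known to hold only lists, the plain apply(len) above is simpler.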
Jamiu S.

If the column already holds lists, use str.len(), which returns the length of each element:

df['word_count'] = df['train'].str.len()
print(df)

# Output
                                             train  word_count
0             [favourit, food, anyth, didnt, cook]           5
1  [everyon, think, he, laugh, screw, peopl, inst]           7
2                             [fuck, bayless, iso]           3
3                           [make, feel, threaten]           3
4                        [dirti, southern, wanker]           3
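One practical difference from apply(len) shows up when some rows are missing. The sketch below assumes a hypothetical Series s with one None entry, which is not part of the question's data: str.len() yields NaN for that row, while apply(len) raises a TypeError.

```python
import pandas as pd

# Hypothetical data: two token lists from the question plus one missing row.
s = pd.Series([
    ["make", "feel", "threaten"],
    None,
    ["dirti", "southern", "wanker"],
])

# str.len() skips the missing row and returns NaN for it (float result).
counts = s.str.len()

# Optionally treat missing rows as zero words and go back to integers.
filled = counts.fillna(0).astype(int)

# apply(len) calls len(None) on the missing row and fails.
apply_error = None
try:
    s.apply(len)
except TypeError as exc:
    apply_error = exc
```

Whether a missing row should count as zero words or stay NaN depends on the later analysis, so the fillna step is a choice rather than a requirement.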
Corralien