How do I count the words in every row of a data frame column?

Question

I am trying to count the number of words in every row of a dataframe column. Every word is separated by a comma. The name of the column is Items.

I tried to do this by splitting each row with apply and a lambda, but I am not sure how to count the resulting words:

# Import pandas library
import pandas as pd

# Initialize the data as a dict of lists
data = {'Company': ['Nike', 'Levi', 'Dell'],
        'Items': ['Shoes, Shorts, Socks', 'Jeans, Jackets', 'Laptops']}

# Create the pandas DataFrame; column names come from the dict keys
df = pd.DataFrame(data)
df['ind_words'] = df.Items.apply(lambda x: ' '.join([word for word in x.split(",")]))
df['lengths']  = df['ind_words'].count()

# print dataframe.
print(df.head())

This produced:

  Company                 Items             ind_words  lengths
0    Nike  Shoes, Shorts, Socks  Shoes  Shorts  Socks        3
1    Levi        Jeans, Jackets        Jeans  Jackets        3
2    Dell               Laptops               Laptops        3

The column lengths is wrong. I understand why the function count() is wrong here, but I don't know what function to use.

Here is the ideal output -

  Company                 Items  length
0    Nike  Shoes, Shorts, Socks       3
1    Levi        Jeans, Jackets       2
2    Dell               Laptops       1

Maybe something along the lines of `df["Items"].str.split(",").str.len()`? — Chrysophylaxs, Apr 18 '23 at 21:57
You could also count the number of commas! `df["Items"].str.count(",") + 1` ;) — Chrysophylaxs, Apr 18 '23 at 22:09
Duplicate: [Count number of words per row](https://stackoverflow.com/q/49984905/4518341) — wjandrea, Apr 18 '23 at 22:16

score 3 · Accepted Answer · answered Apr 18 '23 at 22:03

You can split each row on the comma and then use apply(len):

df['ind_words'] = df['Items'].str.split(',')
df['lengths']  = df['ind_words'].apply(len)
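
For reference, here is a minimal self-contained sketch of this approach on the sample data from the question. It also shows the vectorized `.str.len()` variant suggested in the comments. The variable names `split_items` and `lengths_vectorized` are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({
    'Company': ['Nike', 'Levi', 'Dell'],
    'Items': ['Shoes, Shorts, Socks', 'Jeans, Jackets', 'Laptops'],
})

# Split each row on commas; every cell becomes a list of items
split_items = df['Items'].str.split(',')

# Count the list elements per row with apply(len) ...
df['lengths'] = split_items.apply(len)

# ... or with the vectorized string accessor, which gives the same counts
lengths_vectorized = split_items.str.len()

print(df[['Company', 'Items', 'lengths']])
```

Both variants should give 3, 2 and 1 for the three rows.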

answered Apr 18 '23 at 22:03

Guru Stron

score 2 · Answer 2 · edited Apr 18 '23 at 21:59

df["col"].apply(str.split).apply(len)

edited Apr 18 '23 at 21:59

wjandrea

answered Apr 18 '23 at 21:58

EZLearner

There's no need to use `.apply` here since the Series has `.str.split` and `.str.len`. – wjandrea Apr 18 '23 at 22:17
FYI, OP clarified that the words are comma-separated, not whitespace-separated. – wjandrea Apr 18 '23 at 22:18
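
To illustrate the point made in the comments: on the question's sample data, whitespace splitting happens to give the same counts because every comma is followed by a space, but the two approaches diverge as soon as an item contains a space. The sketch below uses hypothetical data (a multi-word item that is not in the original question) to show the difference.

```python
import pandas as pd

# Hypothetical data with a multi-word item, to expose the difference
items = pd.Series(['Running Shoes, Socks', 'Jeans'])

# str.split with no argument splits on whitespace, so it counts words
by_whitespace = items.apply(str.split).apply(len)

# Splitting on the comma counts comma-separated items instead
by_comma = items.str.split(',').str.len()

print(by_whitespace.tolist())  # [3, 1]
print(by_comma.tolist())       # [2, 1]
```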

score 1 · Answer 3 · edited Apr 18 '23 at 22:22

Splitting is expensive; instead, count the delimiters and add 1 (optionally checking for null values first):

df['length'] = df['Items'].str.count(',').add(1)

Or, if you can have empty/blank strings:

df['length'] = df['Items'].str.count(',').add(~df['Items'].str.fullmatch(r'\s*'))
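
As a rough check of the blank-string variant, the sketch below runs it on hypothetical data that includes an empty string and a whitespace-only string (neither appears in the question). The intermediate names `commas` and `is_blank` and the explicit `astype(int)` cast are illustrative additions to make the arithmetic visible; the idea is the same as in the one-liner above.

```python
import pandas as pd

# Hypothetical data including an empty string and a whitespace-only string
df = pd.DataFrame({'Items': ['Shoes, Shorts, Socks', '', '   ', 'Laptops']})

# Number of commas per row
commas = df['Items'].str.count(',')

# True where the whole cell is empty or only whitespace
is_blank = df['Items'].str.fullmatch(r'\s*')

# Add 1 for non-blank rows and 0 for blank rows
df['length'] = commas.add((~is_blank).astype(int))

print(df)
```

Blank rows should end up with a length of 0, while non-blank rows get the comma count plus one. This sketch assumes the column contains no missing values.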

edited Apr 18 '23 at 22:22

answered Apr 18 '23 at 22:15

mozway
