
I am trying to count the number of words in each row of a DataFrame column named Items. The words are separated by commas.

I tried splitting each row on commas using apply and a lambda, but I am not sure how to count the resulting words:

# Import pandas library
import pandas as pd

# initialize the data as a dict of lists
data = {'Company': ['Nike', 'Levi', 'Dell'],
        'Items': ['Shoes, Shorts, Socks', 'Jeans, Jackets', 'Laptops']}

# Create the pandas DataFrame (column names come from the dict keys)
df = pd.DataFrame(data)
df['ind_words'] = df.Items.apply(lambda x: ' '.join([word for word in x.split(",")]))
df['lengths']  = df['ind_words'].count()

# print dataframe.
print(df.head())

This produced:

  Company                 Items             ind_words  lengths
0    Nike  Shoes, Shorts, Socks  Shoes  Shorts  Socks        3
1    Levi        Jeans, Jackets        Jeans  Jackets        3
2    Dell               Laptops               Laptops        3

The lengths column is wrong: every row gets 3. I understand why count() is wrong here, but I don't know which function to use instead.
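For reference, `Series.count()` counts the non-null values in the whole column and returns a single number; assigning a scalar to a column repeats it on every row. A small check using the same data as above:

```python
import pandas as pd

data = {'Company': ['Nike', 'Levi', 'Dell'],
        'Items': ['Shoes, Shorts, Socks', 'Jeans, Jackets', 'Laptops']}
df = pd.DataFrame(data)

# Series.count() returns ONE number: how many non-null values the column holds
total = df['Items'].count()
print(total)  # 3

# Assigning a scalar to a new column repeats it on every row
df['lengths'] = total
print(df['lengths'].tolist())  # [3, 3, 3]
```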

Here is the desired output:

  Company                 Items  length
0    Nike  Shoes, Shorts, Socks       3
1    Levi        Jeans, Jackets       2
2    Dell               Laptops       1
wjandrea
desert_ranger

3 Answers


You can use apply(len):

df['ind_words'] = df['Items'].str.split(',')
df['lengths']  = df['ind_words'].apply(len)
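The same idea can be written without apply by chaining the .str accessor, since `.str.len()` also works on a Series of lists. A minimal sketch, assuming the question's data; one practical difference is that a missing value propagates as NaN here, whereas `apply(len)` would raise a `TypeError` on it:

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Nike', 'Levi', 'Dell'],
                   'Items': ['Shoes, Shorts, Socks', 'Jeans, Jackets', 'Laptops']})

# Split each cell on commas, then take the length of each resulting list
df['length'] = df['Items'].str.split(',').str.len()
print(df[['Company', 'Items', 'length']])
```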
Guru Stron
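# str.split() with no argument splits on whitespace; this works here because each comma is followed by a space
df["Items"].apply(str.split).apply(len)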
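To see where whitespace splitting (`apply(str.split)`) and comma splitting diverge, here is a sketch with a hypothetical multi-word item, `'Running Shoes, Socks'`, which is not part of the question's data:

```python
import pandas as pd

s = pd.Series(['Running Shoes, Socks', 'Jeans, Jackets', 'Laptops'])

# Whitespace split: 'Running Shoes,' becomes two tokens
by_space = s.apply(str.split).apply(len)
# Comma split: one token per comma-separated item
by_comma = s.str.split(',').str.len()

print(by_space.tolist())  # [3, 2, 1]
print(by_comma.tolist())  # [2, 2, 1]
```

If items can contain spaces, splitting on the comma is the safer choice.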
wjandrea
EZLearner

Splitting is comparatively expensive. Instead, count the delimiters and add 1 (optionally checking for nulls first):

df['length'] = df['Items'].str.count(',').add(1)
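A quick check on the question's data: n comma-separated items contain n - 1 commas, so adding 1 recovers the count. Note that an empty string would still be counted as 1 item with this version.

```python
import pandas as pd

df = pd.DataFrame({'Company': ['Nike', 'Levi', 'Dell'],
                   'Items': ['Shoes, Shorts, Socks', 'Jeans, Jackets', 'Laptops']})

# n items separated by commas contain n - 1 commas, so add 1
df['length'] = df['Items'].str.count(',').add(1)
print(df)
```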

Or, if you can have empty/blank strings:

df['length'] = df['Items'].str.count(',').add(~df['Items'].str.fullmatch(r'\s*'))
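A sketch of the blank-string variant on a few made-up rows (the empty and whitespace-only entries are hypothetical test data, not from the question). `~...str.fullmatch(r'\s*')` is True for non-blank strings, and adding a boolean Series adds 1 or 0 per row. `Series.str.fullmatch` requires pandas 1.1 or later.

```python
import pandas as pd

items = pd.Series(['Shoes, Shorts, Socks', 'Laptops', '', '   '])

# True for strings that contain at least one non-whitespace character
not_blank = ~items.str.fullmatch(r'\s*')
# Commas counted per row, plus 1 only when the row is not blank
length = items.str.count(',').add(not_blank)
print(length.tolist())  # [3, 1, 0, 0]
```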
mozway