1

I'm trying to count the number of words in two different columns and save the sum in an extra column.

Example of the data and desired result:

id  question                     answer             word_count
1   Lorem ipsum dolor sit amet   Lorem ipsum dolor  8
2   Lorem ipsum                  ipsum              3
3   Lorem ipsum dolor sit        Lorem              5

The following code does not work:

df['word_count'] = df[['question', 'answer']].apply(lambda x: len(str(x).split(" ")))
jonas
  • Sum individual columns using [Count number of words per row](https://stackoverflow.com/questions/49984905/count-number-of-words-per-row?rq=1) – DarrylG Nov 10 '20 at 16:16

5 Answers

2

Using str.len with str.split

Ex:

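import pandas as pd
from io import StringIO
# s is assumed to hold the sample data from the question as CSV text
df = pd.read_csv(StringIO(s))
df["word_count"] = df['question'].str.split().str.len() + df['answer'].str.split().str.len()
print(df)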

Output:

                     question             answer  word_count
0  Lorem ipsum dolor sit amet  Lorem ipsum dolor           8
1                 Lorem ipsum              ipsum           3
2       Lorem ipsum dolor sit              Lorem           5
Rakesh
1

Try this:

df['word_count'] = df.apply(
    lambda row: len(row['question'].strip().split()) + 
                len(row['answer'].strip().split()), axis=1)

In short, here is how it works. df.apply(func, axis=1) applies the function func to each row of the data frame and returns a Series of the results. .strip() removes leading and trailing whitespace, if there is any (.split() with no arguments also ignores it, so this is only a safeguard). .split() breaks the string into a list of separate words. len() counts the elements of that list.
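The steps above can be sketched as a small self-contained script. The data is the sample from the question, except that the second answer is padded with spaces to show the effect of stripping; the padding and the helper name count_words are assumptions made for illustration only.

```python
import pandas as pd

# Sample data from the question; the padded second answer is an added assumption.
df = pd.DataFrame({
    "question": ["Lorem ipsum dolor sit amet", "Lorem ipsum", "Lorem ipsum dolor sit"],
    "answer": ["Lorem ipsum dolor", "  ipsum  ", "Lorem"],
})


def count_words(text):
    # Hypothetical helper: strip, split on whitespace, count the pieces.
    return len(text.strip().split())


# axis=1 passes each row (as a Series) to the lambda.
df["word_count"] = df.apply(
    lambda row: count_words(row["question"]) + count_words(row["answer"]),
    axis=1,
)
print(df["word_count"].tolist())  # [8, 3, 5]
```

Note that this sketch assumes every cell is a string; a missing value (NaN) would raise an AttributeError on .strip().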

Fomalhaut
0

Try this:

df['word_count'] = df['question'].apply(lambda x: len(str(x).split(" ")))+df['answer'].apply(lambda x: len(str(x).split(" ")))
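One caveat with split(" "): it splits on every single space, so repeated or trailing spaces produce empty strings that are counted as words, and str(x) turns a missing value into the string 'nan', which counts as one word. A minimal sketch in plain Python, using made-up input strings:

```python
text = "Lorem  ipsum "  # two spaces in the middle, one trailing space

by_space = text.split(" ")    # splits on every single space
by_whitespace = text.split()  # splits on runs of whitespace, drops empties

print(by_space)       # ['Lorem', '', 'ipsum', '']
print(by_whitespace)  # ['Lorem', 'ipsum']

# str() of a missing value (NaN) is the string 'nan', counted as one word.
nan_count = len(str(float("nan")).split(" "))
print(nan_count)  # 1
```

If the data may contain irregular spacing, split() with no argument is probably the safer choice.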
gaut
0

Split both columns, concatenate the resulting lists, and then apply len:

df['word_count'] = (df['question'].str.split(" ")+df['answer'].str.split(" ")).apply(len)
Wasif
0

We can count the runs of whitespace with str.count, add 1 to get the words per column, and sum across the columns:

df['word_count'] = (df[['question','answer']]
    .apply(lambda x: x.str.count(r'\s+')+1).sum(1)
)

Or we can count the runs of word characters (\w+) directly:

df['word_count'] = (df[['question','answer']]
    .apply(lambda x: x.str.count(r'\w+')).sum(1)
)

Output:

   id                    question             answer  word_count
0   1  Lorem ipsum dolor sit amet  Lorem ipsum dolor           8
1   2                 Lorem ipsum              ipsum           3
2   3       Lorem ipsum dolor sit              Lorem           5
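The two patterns agree on the sample data but can differ on messier text. A rough comparison, using made-up strings that are not part of the question's data:

```python
import pandas as pd

# Hypothetical inputs: clean text, padded text, and text with an apostrophe.
s = pd.Series(["Lorem ipsum", " Lorem ipsum ", "don't stop"])

by_gaps = s.str.count(r"\s+") + 1  # whitespace runs plus one
by_words = s.str.count(r"\w+")     # runs of word characters

print(by_gaps.tolist())   # [2, 4, 2]
print(by_words.tolist())  # [2, 2, 3]
```

Leading or trailing whitespace inflates the \s+ count, while \w+ splits "don't" into two matches because the apostrophe is not a word character.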
Quang Hoang