Count words in two different columns and sum them by row

Question

I'm trying to count the number of words in two different columns and save the sum in an extra column.

Example of the data and desired result:

id  question                     answer             word_count
1   Lorem ipsum dolor sit amet   Lorem ipsum dolor  8
2   Lorem ipsum                  ipsum              3
3   Lorem ipsum dolor sit        Lorem              5

The following code does not work:

df['word_count'] = df[['question', 'answer']].apply(lambda x: len(str(x).split(" ")))

Sum individual columns using [Count number of words per row](https://stackoverflow.com/questions/49984905/count-number-of-words-per-row?rq=1) — DarrylG, Nov 10 '20 at 16:16

score 2 · Accepted Answer · answered Nov 10 '20 at 16:10

Use str.split followed by str.len on each column, then add the results.

Example:

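import pandas as pd
from io import StringIO

# s is assumed to hold the sample data as text (not shown in this answer)
df = pd.read_csv(StringIO(s))
df["word_count"] = df['question'].str.split().str.len() + df['answer'].str.split().str.len()
print(df)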

Output:

                     question             answer  word_count
0  Lorem ipsum dolor sit amet  Lorem ipsum dolor           8
1                 Lorem ipsum              ipsum           3
2       Lorem ipsum dolor sit              Lorem           5
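For a self-contained version of the same idea, the sketch below builds a small frame inline instead of reading it from `s`. The helper name `count_words` and the `fillna(0)` step are assumptions added for illustration; the answer above does not handle missing values, and without that step a NaN in either column would make the sum NaN. The third row deliberately has a missing question to show this.

```python
import pandas as pd

df = pd.DataFrame({
    "question": ["Lorem ipsum dolor sit amet", "Lorem ipsum", None],
    "answer": ["Lorem ipsum dolor", "ipsum", "Lorem"],
})

def count_words(col: pd.Series) -> pd.Series:
    """Whitespace-separated word count per row; missing values count as 0."""
    return col.str.split().str.len().fillna(0).astype(int)

# Hypothetical helper applied to each column, then summed row by row.
df["word_count"] = count_words(df["question"]) + count_words(df["answer"])
print(df["word_count"].tolist())
```

If the columns are guaranteed to contain strings only, the `fillna(0).astype(int)` part can be dropped.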

Rakesh

I prefer this answer because it does not use .apply() – Sander van den Oord Nov 10 '20 at 16:15

Fomalhaut · Answer 2 · 2020-11-10T16:14:18.843

Try this:

df['word_count'] = df.apply(
    lambda row: len(row['question'].strip().split()) + 
                len(row['answer'].strip().split()), axis=1)

How it works, in short: df.apply(func, axis=1) calls func on each row of the data frame and returns a Series of the results. .strip() removes any leading and trailing whitespace (strictly redundant here, because .split() with no arguments already ignores it). .split() breaks the string into a list of words, and len() counts the elements of that list.
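To make the role of axis=1 concrete, here is a small sketch on the question's sample data. It contrasts the default axis=0, where the function receives a whole column, with axis=1, where it receives one row at a time. The variable names `per_column` and `per_row` are illustrative only.

```python
import pandas as pd

df = pd.DataFrame({
    "question": ["Lorem ipsum dolor sit amet", "Lorem ipsum", "Lorem ipsum dolor sit"],
    "answer": ["Lorem ipsum dolor", "ipsum", "Lorem"],
})

# Default axis=0: the function is called once per column, so the result
# is indexed by column name, not by row.
per_column = df.apply(lambda col: len(col))
print(per_column.index.tolist())   # ['question', 'answer']

# axis=1: the function is called once per row and receives that row as a Series.
per_row = df.apply(
    lambda row: len(row["question"].split()) + len(row["answer"].split()),
    axis=1,
)
print(per_row.tolist())            # [8, 3, 5]
```

This also suggests why the attempt in the question fails: without axis=1 the result is indexed by column name, so it does not align with the row index of df when assigned to a new column.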

score 0 · Answer 3 · answered Nov 10 '20 at 16:14

Try this:

df['word_count'] = df['question'].apply(lambda x: len(str(x).split(" ")))+df['answer'].apply(lambda x: len(str(x).split(" ")))

gaut

score 0 · Answer 4 · answered Nov 10 '20 at 16:14

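Split both columns, concatenate the resulting lists, and then apply len (note that the column names are lowercase in the sample data):

df['word_count'] = (df['question'].str.split(" ") + df['answer'].str.split(" ")).apply(len)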

Wasif

score 0 · Answer 5 · answered Nov 10 '20 at 16:21

We can count the runs of whitespace in each cell with str.count, add 1 to get the number of words, and sum across each row:

df['word_count'] = (df[['question','answer']]
    .apply(lambda x: x.str.count(r'\s+')+1).sum(axis=1)
)

Or we can count the runs of word characters (\w+) directly:

df['word_count'] = (df[['question','answer']]
    .apply(lambda x: x.str.count(r'\w+')).sum(axis=1)
)

Output:

   id                    question             answer  word_count
0   1  Lorem ipsum dolor sit amet  Lorem ipsum dolor           8
1   2                 Lorem ipsum              ipsum           3
2   3       Lorem ipsum dolor sit              Lorem           5
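The two patterns agree on the sample data but can diverge on messier text. The sketch below uses made-up strings that are not from the question, and compares both patterns with str.split() for reference.

```python
import pandas as pd

# Hypothetical edge cases: padded whitespace and an apostrophe.
s = pd.Series(["Lorem ipsum", "  Lorem ipsum  ", "don't stop"])

by_whitespace = s.str.count(r"\s+") + 1   # whitespace runs plus one
by_word_chars = s.str.count(r"\w+")       # runs of word characters
by_split = s.str.split().str.len()        # whitespace-separated tokens

print(by_whitespace.tolist())  # [2, 4, 2]
print(by_word_chars.tolist())  # [2, 2, 3]
print(by_split.tolist())       # [2, 2, 2]
```

Counting whitespace overcounts when a string has leading or trailing spaces, while \w+ splits on apostrophes and other punctuation, so the choice depends on how clean the text is.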
