3

My question is similar to this one but the answer seems to work not entirely!!

merge rows pandas dataframe based on condition

Given the following pandas Dataframe:

+---------+-----------------+-----------------+
| SECTION | TEXT            | NUMBER_OF_WORDS |
+---------+-----------------+-----------------+
| ONE     | lots   of text… | 55              |
+---------+-----------------+-----------------+
| ONE     | word1           | 1               |
+---------+-----------------+-----------------+
| ONE     | lots   of text… | 151             |
+---------+-----------------+-----------------+
| ONE     | word2           | 1               |
+---------+-----------------+-----------------+
| ONE     | word3           | 1               |
+---------+-----------------+-----------------+
| ONE     | word4           | 1               |
+---------+-----------------+-----------------+
| TWO     | lots   of text… | 523             |
+---------+-----------------+-----------------+
| TWO     | lots   of text… | 123             |
+---------+-----------------+-----------------+
| TWO     | word4           | 1               |
+---------+-----------------+-----------------+

If the value in the NUMBER_OF_WORDS column is 1; it has to be combined with the above row; provided they have the same SECTION value.

Thus the final result should be like this:

+---------+--------------------------------------+-----------------+
| SECTION | TEXT                                 | NUMBER_OF_WORDS |
+---------+--------------------------------------+-----------------+
| ONE     | lots   of text…, word1               | 56              |
+---------+--------------------------------------+-----------------+
| ONE     | lots   of text…, word2, word3, word4 | 154             |
+---------+--------------------------------------+-----------------+
| TWO     | lots   of text…                      | 523             |
+---------+--------------------------------------+-----------------+
| TWO     | lots   of text…, word4               | 124             |
+---------+--------------------------------------+-----------------+

Here is the code; which seems to work but not as I want it.

df.groupby(['SECTION', (df.NUMBER_OF_WORDS.shift(1) == 1)], as_index=False, sort=False).agg({'TEXT': lambda x: ', '.join(x), 'NUMBER_OF_WORDS': lambda x: sum(x)})

UPDATE

This is the answer from BEN_YO; but he seems to have a minor typo. for the sake of making this question answered for future users I am going to include his answer slightly modified.

s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION': 'first','TEXT': lambda x: ', '.join(x),'NUMBER_OF_WORDS': lambda x: sum(x)})

Akbar Hussein
  • 352
  • 3
  • 13

1 Answers1

2

Let us try groupby with cumsum

s = df['NUMBER_OF_WORDS'].ne(1).cumsum()
out = df.groupby(s).agg({'SECTION':'first','TEXT':','.join,'NUMBER_OF_WORDS':'sum'})
BENY
  • 317,841
  • 20
  • 164
  • 234