How to create groups with duplicate keys in pandas groupby?

Question

I want to create groups based on the last column (Key) such that each run of consecutive rows with the same key forms its own group: the first group contains row 0, the second group contains rows 1 and 2, and the third group contains the last two rows (3 and 4).

I tried the pandas groupby method, grouped = df1.groupby('Key'), but as expected it returns only 2 groups, one per distinct key.

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'Key': ['K0', 'K1', 'K1', 'K0', 'K0']},
                   index=[0, 1, 2, 3, 4])


df1
Out[89]: 
    A   B   C   D Key
0  A0  B0  C0  D0  K0
1  A1  B1  C1  D1  K1
2  A2  B2  C2  D2  K1
3  A3  B3  C3  D3  K0
4  A4  B4  C4  D4  K0

So you want to group only by sequential repeated keys? – user3483203 Jul 24 '18 at 17:35

user3483203 · Answer 1 · 2018-07-24T17:42:02.877


You can use shift and cumsum, then groupby (df below is the question's df1):

grouped = df.groupby((df.Key != df.Key.shift()).cumsum())
grouped.groups

{1: Int64Index([0], dtype='int64'),
 2: Int64Index([1, 2], dtype='int64'),
 3: Int64Index([3, 4], dtype='int64')}

To show why this works:

(df.Key != df.Key.shift()).cumsum()

0    1
1    2
2    2
3    3
4    3
Name: Key, dtype: int32

Each time Key differs from the value in the previous row, the comparison is True, so the cumulative sum increases by one, even if that value has been seen earlier in the column. Grouping by this new series therefore groups only runs of consecutive equal keys.
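To make the two steps explicit, here is a minimal sketch on a cut-down copy of the question's frame. The names changed, run_id and runs are only illustrative; they are not from the original answer.

```python
import pandas as pd

# Cut-down copy of the question's frame (only columns A and Key).
df1 = pd.DataFrame(
    {
        "A": ["A0", "A1", "A2", "A3", "A4"],
        "Key": ["K0", "K1", "K1", "K0", "K0"],
    }
)

# True wherever Key differs from the previous row.
# The first row is compared with NaN, so it always counts as a change.
changed = df1["Key"] != df1["Key"].shift()

# Running count of changes: one id per run of consecutive equal keys.
run_id = changed.cumsum()

# Collect (run number, key of that run, row labels in that run).
runs = [
    (int(number), group["Key"].iloc[0], group.index.tolist())
    for number, group in df1.groupby(run_id)
]
print(runs)  # [(1, 'K0', [0]), (2, 'K1', [1, 2]), (3, 'K0', [3, 4])]
```

Note that K0 shows up in two separate runs (1 and 3), which is exactly what a plain groupby('Key') would have merged.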

edited Jul 24 '18 at 17:42

answered Jul 24 '18 at 17:37

user3483203


And is there any way to know which key got repeated? I mean, in this case K0 got repeated. – vampiretap Jul 24 '18 at 17:48
You can attribute a category to each `key` and it would appear after the groupby (e.g. if you `agg` by 'first') – rafaelc Jul 24 '18 at 17:51

If you are interested, I added an iterator-based method to the linked duplicate question :-) – BENY Jul 24 '18 at 18:13
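Regarding the question in the comments about which key got repeated, one possible reading of the `agg` by 'first' suggestion is sketched below. This is an assumption about what was meant, not code posted by the commenters, and the names run_id, summary and repeated are hypothetical.

```python
import pandas as pd

keys = pd.Series(["K0", "K1", "K1", "K0", "K0"], name="Key")

# Same run ids as in the answer: a new id each time the key changes.
run_id = (keys != keys.shift()).cumsum().rename("run")

# One row per run: the key it belongs to ("first") and its length ("count").
summary = keys.groupby(run_id).agg(["first", "count"])

# A key that starts more than one run was split into several groups.
runs_per_key = summary["first"].value_counts()
repeated = sorted(runs_per_key[runs_per_key > 1].index)
print(repeated)  # ['K0']
```

Here summary has one row per consecutive run, so counting how often each key appears in its "first" column tells you which keys were split across several groups.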
