I want to create groups based on the last column (Key) so that consecutive rows with the same key end up together. For the data below, the first group should contain only the first row (index 0), the second group rows 1 and 2, and the third group the last two rows.

I tried the pandas groupby method, grouped = df1.groupby('Key'), but as expected it returns only 2 groups, one per distinct key.

import pandas as pd

df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'B': ['B0', 'B1', 'B2', 'B3', 'B4'],
                    'C': ['C0', 'C1', 'C2', 'C3', 'C4'],
                    'D': ['D0', 'D1', 'D2', 'D3', 'D4'],
                    'Key': ['K0', 'K1', 'K1', 'K0', 'K0']},
                   index=[0, 1, 2, 3, 4])


df1
Out[89]: 
    A   B   C   D Key
0  A0  B0  C0  D0  K0
1  A1  B1  C1  D1  K1
2  A2  B2  C2  D2  K1
3  A3  B3  C3  D3  K0
4  A4  B4  C4  D4  K0
vampiretap

1 Answer

You can use shift and cumsum, then groupby (df below is the df1 from the question):

grouped = df.groupby((df.Key != df.Key.shift()).cumsum())
grouped.groups

{1: Int64Index([0], dtype='int64'),
 2: Int64Index([1, 2], dtype='int64'),
 3: Int64Index([3, 4], dtype='int64')}
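
If the goal is the rows themselves rather than the index labels, iterating over the groupby object yields one sub-DataFrame per run. A minimal sketch, assuming a cut-down version of the question's frame; the names run_id and runs are only illustrative:

```python
import pandas as pd

# Reduced version of the question's df1 (two columns are enough here).
df1 = pd.DataFrame({'A': ['A0', 'A1', 'A2', 'A3', 'A4'],
                    'Key': ['K0', 'K1', 'K1', 'K0', 'K0']})

# Label each run of consecutive equal keys with its own integer id.
run_id = (df1['Key'] != df1['Key'].shift()).cumsum()

# Collect one sub-DataFrame per run, in order of appearance.
runs = [sub for _, sub in df1.groupby(run_id)]

for sub in runs:
    print(sub, end='\n\n')
```

Each element of runs keeps the original index, so the pieces can still be traced back to their positions in df1.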

To show why this works:

(df.Key != df.Key.shift()).cumsum()

0    1
1    2
2    2
3    3
4    3
Name: Key, dtype: int32

Each time Key changes from one row to the next, the cumulative sum increases by one, even if the new value has appeared earlier in the column. Grouping by this new Series therefore puts only consecutive repeated keys in the same group.
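
The intermediate steps can be laid out side by side to make the mechanism easier to follow. This is only an illustrative sketch on the Key column alone; the names previous, changed, run_id and steps are not part of the original answer:

```python
import pandas as pd

key = pd.Series(['K0', 'K1', 'K1', 'K0', 'K0'], name='Key')

# Value of the row above; the first row has no predecessor, so it is missing.
previous = key.shift()

# True wherever the key differs from the row above. The first row compares
# against a missing value, which counts as "different", so it is True.
changed = key != previous

# Running count of changes: every True starts a new run id.
run_id = changed.cumsum()

steps = pd.DataFrame({'Key': key, 'previous': previous,
                      'changed': changed, 'run_id': run_id})
print(steps)
```

Because the first comparison is always True, the run ids start at 1 rather than 0.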

user3483203