Get first value of every sequence of repeated values in a DataFrame

Question

Assume I have df:

import pandas as pd

df = pd.DataFrame({'Day': range(1, 21),
                   'Col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'A', 'A', 'A', 'A', 'A', 'D', 'B', 'B', 'E', 'E', 'A']})

    Day Col1
0     1    A
1     2    A
2     3    A
3     4    B
4     5    B
5     6    B
6     7    B
7     8    C
8     9    C
9    10    A
10   11    A
11   12    A
12   13    A
13   14    A
14   15    D
15   16    B
16   17    B
17   18    E
18   19    E
19   20    A

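I want the first row of every run of consecutive repeated values in the Col1 column. This is not the same as dropping duplicates, since a value such as A should appear again each time a new run of it starts. Desired outcome: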

    Day Col1
0     1    A
3     4    B
7     8    C
9    10    A
14   15    D
15   16    B
17   18    E
19   20    A

Please advise

score 5 · Accepted Answer · answered Jul 28 '21 at 07:44

Use Series.ne + Series.shift + Series.cumsum to label each run of consecutive repeated values, group by those labels, then take GroupBy.first:

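# True where a value differs from the previous row; cumsum turns those flags into run ids 1, 2, 3, ...
g = df['Col1'].ne(df['Col1'].shift()).cumsum()
# Keep the first row of each run
df.groupby(g).first()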

      Day Col1
Col1          
1       1    A
2       4    B
3       8    C
4      10    A
5      15    D
6      16    B
7      18    E
8      20    A
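
The groupby result above is indexed by run number (1 to 8) rather than by the original row labels shown in the desired outcome. If the original index matters, one possible variant is to skip the groupby and use the same "differs from the previous row" comparison directly as a boolean mask. This is only a sketch built on the question's sample data; the names is_run_start, run_id and result are made up for illustration.

```python
import pandas as pd

df = pd.DataFrame({'Day': range(1, 21),
                   'Col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'A',
                            'A', 'A', 'A', 'A', 'D', 'B', 'B', 'E', 'E', 'A']})

# True on the first row of each run: the value differs from the row above.
# The very first row is compared against NaN, so it is always True.
is_run_start = df['Col1'].ne(df['Col1'].shift())

# The run ids used by the groupby approach are the cumulative sum of these flags.
run_id = is_run_start.cumsum()

# Filtering with the mask keeps the original index (0, 3, 7, 9, ...).
result = df[is_run_start]
print(result)
```

Both approaches select the same rows here; the mask version simply avoids building groups when only the first row of each run is needed.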
