Assume I have the following DataFrame, df:

import pandas as pd

df = pd.DataFrame({'Day': range(1, 21),
                   'Col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'A', 'A', 'A', 'A', 'A', 'D', 'B', 'B', 'E', 'E', 'A']})

    Day Col1
0     1    A
1     2    A
2     3    A
3     4    B
4     5    B
5     6    B
6     7    B
7     8    C
8     9    C
9    10    A
10   11    A
11   12    A
12   13    A
13   14    A
14   15    D
15   16    B
16   17    B
17   18    E
18   19    E
19   20    A

I want to get the first row of every run of consecutive repeated values in the Col1 column. This is not the same as dropping duplicates, because a value such as A can start more than one run. Desired outcome:

    Day Col1
0     1    A
3     4    B
7     8    C
9    10    A
14   15    D
15   16    B
17   18    E
19   20    A

Please advise

Karl Knechtel
Agnes Lee

1 Answer

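Use Series.ne + Series.shift + Series.cumsum to give each run of consecutive repeated values its own label, group by that label, then take GroupBy.first. Note that the result is indexed by the run label rather than by the original row index.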

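# Label each run: the label increments wherever Col1 differs from the previous row
g = df['Col1'].ne(df['Col1'].shift()).cumsum()
# Take the first row of each run
df.groupby(g).first()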

      Day Col1
Col1          
1       1    A
2       4    B
3       8    C
4      10    A
5      15    D
6      16    B
7      18    E
8      20    A
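
The groupby result above is indexed by run label, while the desired outcome in the question keeps the original index (0, 3, 7, ...). As a minimal sketch of one way to keep it, and not part of the answer above, the same "differs from the previous row" comparison can be used directly as a boolean mask. The names starts, run_id and first_rows are chosen here for illustration only.

```python
import pandas as pd

df = pd.DataFrame({
    'Day': range(1, 21),
    'Col1': ['A', 'A', 'A', 'B', 'B', 'B', 'B', 'C', 'C', 'A',
             'A', 'A', 'A', 'A', 'D', 'B', 'B', 'E', 'E', 'A'],
})

# True wherever Col1 differs from the previous row, i.e. at the start of each run.
# The first row is compared against the NaN introduced by shift(), so it is always True.
starts = df['Col1'].ne(df['Col1'].shift())

# The cumulative sum of the mask gives the run labels used by the groupby approach.
run_id = starts.cumsum()

# Boolean indexing keeps the original index labels of the selected rows.
first_rows = df[starts]
print(first_rows)
```

Because no grouping is involved, this variant returns the matching rows unchanged. The groupby form remains the more natural choice when something other than the first row of each run is needed, such as a count or a sum per run.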
Ch3steR