
My question is very similar to this one, but with a twist: the same id can appear in more than one group.

I have a pandas DataFrame like the following:

import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,1,1,2,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

I want to group this by ["id","value"] and get the first row of each group.

        id   value
0        1   first
1        1  second
2        1  second
3        2   first
4        2  second
5        3   first
6        3   third
7        3  fourth
8        3   fifth
9        1  second
10       1   fifth
11       2   first
12       6   first
13       6  second
14       6   third
15       7  fourth
16       7   fifth

Expected outcome

    id   value
     1   first
     2   first
     3   first
     1  second
     2   first
     6   first
     7  fourth

I tried the solutions from the linked question, but they give this:

>>> df.groupby('id').first()
     value
id        
1    first
2    first
3    first
6    first
7   fourth

Basically, the groupby merges the two separate runs of id 1 (and likewise the two runs of id 2) into a single group and returns only the first row of that merged group.

Dr Xorile
  • I'm not sure I understand the logic. We keep all "first" in value column? Can you further outline _why_ we keep particular rows? – Henry Ecker Sep 16 '21 at 16:26
  • @HenryEcker OP wants to group like `itertools.groupby` and get the first value from each group. With `itertools.groupby` the grouping would be `[[1,1,1], [2,2], [3,3,3,3], [1,1], [2], [6,6,6], [7,7]]` – Ch3steR Sep 16 '21 at 16:27
  • Every time there's a change in id column, I want to capture that row. Even if that change is to a value we've seen before. – Dr Xorile Sep 16 '21 at 16:30
  • Ah `df.groupby(df['id'].ne(df['id'].shift()).cumsum()).first()` – Henry Ecker Sep 16 '21 at 16:31
  • I think the linked question has something I can work with. The shift tool or itertools.groupby would work I think – Dr Xorile Sep 16 '21 at 16:31
  • @HenryEcker, that works perfectly! – Dr Xorile Sep 16 '21 at 16:32
  • @HenryEcker, and now that I've understood what it means, it's even more brilliant. Thank you. Wish I could give you a tick! – Dr Xorile Sep 16 '21 at 16:33
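
For anyone landing here later, the one-liner from the comments can be unpacked step by step. This is only a sketch of that approach applied to the sample data from the question; the names `is_run_start`, `run_label`, `first_per_run` and `first_rows` are made up for illustration, and renaming the label Series to `run` is my own addition so that it does not clash with the existing `id` column.

```python
import pandas as pd

df = pd.DataFrame({'id': [1,1,1,2,2,3,3,3,3,1,1,2,6,6,6,7,7],
                   'value': ["first","second","second","first",
                             "second","first","third","fourth",
                             "fifth","second","fifth","first",
                             "first","second","third","fourth","fifth"]})

# True wherever id differs from the previous row. The first row is
# compared against NaN (nothing precedes it), so it is also True.
is_run_start = df['id'].ne(df['id'].shift())

# A running count of run starts labels each consecutive run:
# 1, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 6, 6, 6, 7, 7
# (renamed to 'run', a hypothetical name, to avoid clashing with 'id')
run_label = is_run_start.cumsum().rename('run')

# Group on the run label instead of on id, then take the first row per run.
first_per_run = df.groupby(run_label).first()

# For this data the boolean mask alone selects the same rows,
# and it keeps the original row index.
first_rows = df[is_run_start]

print(first_per_run)
print(first_rows)
```

One caveat: `GroupBy.first()` returns the first non-null entry of each column within a group, so the two results could differ if the data contained missing values. The sample data has none.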
