id    marks  year 
1     18      2013
1     25      2012
3     16      2014
2     16      2013
1     19      2013
3     25      2013
2     18      2014

Suppose I now group the above by id with this Python command:
grouped = file.groupby(file.id)

I would like to get a new file containing, for each group, only the row with the most recent year, i.e. the highest year in that group.

Please let me know the command. I am trying with apply, but it only gives me a boolean expression. I want the entire row with the latest year.

Shiva Prakash

1 Answer


I cobbled this together using this: Python : Getting the Row which has the max value in groups using groupby

So basically we can groupby the 'id' column, then call transform on the 'year' column and create a boolean index where the year matches the max year value for each 'id':

In [103]:

df[df.groupby(['id'])['year'].transform('max') == df['year']]
Out[103]:
   id  marks  year
0   1     18  2013
2   3     16  2014
4   1     19  2013
6   2     18  2014
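
As a minimal, self-contained sketch of the approach above (the DataFrame is rebuilt here from the question's table, and the variable names `max_year`, `latest_all` and `latest_one` are only illustrative): `transform` returns the group maximum aligned to every row, so comparing it with `year` keeps all rows that tie for the latest year. If exactly one row per id is wanted, `idxmax` is one possible alternative; it picks the first row with the maximum in each group.

```python
import pandas as pd

# Data copied from the question's table.
df = pd.DataFrame({
    "id":    [1, 1, 3, 2, 1, 3, 2],
    "marks": [18, 25, 16, 16, 19, 25, 18],
    "year":  [2013, 2012, 2014, 2013, 2013, 2013, 2014],
})

# For every row, the maximum year within that row's id group,
# returned as a Series aligned to df's index.
max_year = df.groupby("id")["year"].transform("max")

# Keep every row whose year equals its group's maximum (ties are kept,
# so id 1 contributes two rows here).
latest_all = df[df["year"] == max_year]

# Alternative (an assumption about what may be wanted): exactly one row
# per id, taking the first row that holds the group's maximum year.
latest_one = df.loc[df.groupby("id")["year"].idxmax()]

print(latest_all)
print(latest_one)
```

Note that id 1 has two rows for 2013, so the boolean-mask version returns both of them, while the `idxmax` version returns only the first.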
EdChum
  • How can I get the group labels? I mean I want a list {1,2,3} for the example in my question – Shiva Prakash Jan 28 '15 at 05:10
  • @Shiva sorry, are you talking about this: `df.groupby('id')['year'].max().index`? – EdChum Jan 28 '15 at 08:43
  • I am sorry, I should have specified that my previous comment was a different question. I mean I want the list of all group labels stored in a new variable – Shiva Prakash Jan 28 '15 at 10:31
  • So doesn't `labels = df.groupby('id')['year'].max().index` do what you want? – EdChum Jan 28 '15 at 10:34
  • Yup, got it. Thank you, but I need all the unique group labels. I tried `labels = df.groupby('id').index` but I am not getting the group labels – Shiva Prakash Jan 28 '15 at 10:47
  • @Shiva sorry, I really don't understand what you want. For instance, `groups = df['id'].unique()` will also give you what you want without needing to perform any grouping. If you specifically want a list then just cast it: `groups = list(df['id'].unique())` – EdChum Jan 28 '15 at 10:49
  • I have a doubt about iterating over two lists that I want to clarify. Can I have your email id or some other way to explain the problem? – Shiva Prakash Jan 28 '15 at 10:55
  • No, if you have another question then please post a new question with as much information as possible. I do have a day job ;) – EdChum Jan 28 '15 at 10:57
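
For the group-label question in the comments, here is a short sketch under the same assumptions as the question's data (the names `labels_in_order` and `labels_sorted` are hypothetical). It shows two ways to collect the labels into a plain list: directly from the column, as suggested above, or from the keys of the `groups` mapping on the groupby object.

```python
import pandas as pd

# Only the id column matters for the labels; values copied from the question.
df = pd.DataFrame({"id": [1, 1, 3, 2, 1, 3, 2]})

# Unique ids in order of first appearance, with no grouping needed.
labels_in_order = df["id"].unique().tolist()

# The same labels taken from the groupby object: .groups maps each
# label to the index labels of its rows, so its keys are the labels.
labels_sorted = sorted(df.groupby("id").groups)

print(labels_in_order)
print(labels_sorted)
```

The first form preserves the order in which ids appear in the data; the second is sorted explicitly so the result does not depend on grouping order.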