1

I have DataFrame similar to the below (this is just a sample):

i   TIME    CITIES_LABEL    Value   lat_rounded long

2   2005    Tilburg         22 250  52.070498   4.300700 
3   2005    Amsterdam       45 825  52.370216   4.895168
4   2005    Rotterdam       27 600  51.924420   4.477733
5   2005    Utrecht         12 915  52.090737   5.121420
6   2005    Eindhoven       9 165   51.441642   5.469722
7   2006    Tilburg         7 800   51.560596   5.091914
8   2005    Groningen       7 620   53.219383   6.566502
9   2005    Enschede        6 250   52.221537   6.893662
10  2005    Arnhem          6 025   51.985103   5.898730
11  2006    Utrecht         3 400   50.888174   5.979499
12  2006    Amsterdam       6 795   52.350785   5.264702
13  2005    Breda           8 565   51.571915   4.768323
14  2010    Groningen       6 325   51.812563   5.837226
15  2005    Apeldoorn       7 005   52.211157   5.969923
16  2007    Utrecht         3 785   53.201233   5.799913
17  2006    Rotterdam       7 130   52.387388   4.646219
18  2005    Zaanstad        6 060   52.457966   4.751042
19  2008    Tilburg         6 945   51.697816   5.303675
20  2007    Amsterdam       5 840   52.156111   5.387827
21  2005    Maastricht      5 220   50.851368   5.690972

Cities are repeated along the CITIES_LABEL field. I would like to filter the cities based on their highest TIME value. An example of the output I would like is:

i   TIME    CITIES_LABEL    Value   lat_rounded long

6   2005    Eindhoven       9 165   51.441642   5.469722
9   2005    Enschede        6 250   52.221537   6.893662
10  2005    Arnhem          6 025   51.985103   5.898730
13  2005    Breda           8 565   51.571915   4.768323
14  2010    Groningen       6 325   51.812563   5.837226
15  2005    Apeldoorn       7 005   52.211157   5.969923
16  2007    Utrecht         3 785   53.201233   5.799913
17  2006    Rotterdam       7 130   52.387388   4.646219
18  2005    Zaanstad        6 060   52.457966   4.751042
19  2008    Tilburg         6 945   51.697816   5.303675
20  2007    Amsterdam       5 840   52.156111   5.387827
21  2005    Maastricht      5 220   50.851368   5.690972

Any thoughts on how best to approach this issue in pandas?

EDIT

my question is different from Python : How can I get Rows which have the max value of the group to which they belong? because I am looking for a filter for both TIME and CITIES_LABEL while the previous question is only looking at filtering based to a (maximum) value of one field, and it does not care for duplicates in other fields

piRSquared
  • 285,575
  • 57
  • 475
  • 624
Valerio D. Ciotti
  • 1,369
  • 2
  • 17
  • 27
  • Possible duplicate of [Python : How can I get Rows which have the max value of the group to which they belong?](http://stackoverflow.com/questions/15707746/python-how-can-i-get-rows-which-have-the-max-value-of-the-group-to-which-they) – ASGM Oct 27 '16 at 16:36
  • @ASGM my question is different from the two that you suggested because I am looking for a filter for both `TIME` and `CITIES_LABEL` while the previous questions are only looking at filtering based to a (maximum) value of one field, and it does not care for duplicates in other fields. – Valerio D. Ciotti Oct 27 '16 at 16:41
  • @ASGM I did not add what I tried because I did not use any pandas methods to solve it and I am looking for a better (simpler) approach using pandas – Valerio D. Ciotti Oct 27 '16 at 16:43

1 Answers1

5

use groupby and idxmax

df.ix[df.groupby('CITIES_LABEL').TIME.idxmax()]

enter image description here

piRSquared
  • 285,575
  • 57
  • 475
  • 624