2

I have a dataframe:

In [73]: data = {'ID':[1234,1234,1234,1234,1235,1235,1236,1237,1237,1237,1237], 'Date':['1/4/2001','1/4/2001','6/1/2003','6/1/2003', '7/1/1998', '7/1/1998', '4/23/2005', '7/1/2005','7/1/2005','7/1/2005','7/1/2005'], 'CalcYr': [2018, 2019, 2018, 2019, 2007, 2008, 2018, 2016, 2017, 2018, 2019], 'Values':[0.1,0.1,0.2,0.3,0.3,0.4,0.6,0,0.1,0,0.2]}
In[74]: df = pd.DataFrame(data)

In [75]: df
Out[75]: 
      ID       Date  CalcYr  Values
0   1234   1/4/2001    2018     0.1
1   1234   1/4/2001    2019     0.1
2   1234   6/1/2003    2018     0.2
3   1234   6/1/2003    2019     0.3
4   1235   7/1/1998    2007     0.3
5   1235   7/1/1998    2008     0.4
6   1236  4/23/2005    2018     0.6
7   1237   7/1/2005    2016     0.0
8   1237   7/1/2005    2017     0.1
9   1237   7/1/2005    2018     0.0
10  1237   7/1/2005    2019     0.2

What I would like to do is keep only one row for the same values of ID and Date, where the CalcYr is maximum. For example, for ID of 1234, and Date of 1/4/2001, I'd only keep the row where CalcYr is 2019. The result would be:

         ID       Date  CalcYr  Values
    0   1234   1/4/2001    2019     0.1
    1   1234   6/1/2003    2019     0.3
    2   1235   7/1/1998    2008     0.4
    3   1236  4/23/2005    2018     0.6
    4   1237   7/1/2005    2019     0.2
DPdl
  • 723
  • 7
  • 23

1 Answers1

1

Use:

df.groupby(['ID','Date'], as_index=False).max()

     ID       Date  CalcYr  Values
0  1234   1/4/2001    2019     0.1
1  1234   6/1/2003    2019     0.3
2  1235   7/1/1998    2008     0.4
3  1236  4/23/2005    2018     0.6
4  1237   7/1/2005    2019     0.2
luigigi
  • 4,146
  • 1
  • 13
  • 30
  • Thanks. This seems to work, but I am curious as to how it looks for the max value int he CalcYr column and not the Values column. – DPdl Jan 15 '20 at 15:16