0

I have a data frame

df = pd.DataFrame({'Col1':['a','a','a','b','b','b','c','c','c'],
                  'Col2':[ '2020-12-01','2020-11-05','2019-01-05','2017-08-07',
                          '2018-07-16','2018-12-31','2019-11-30','2018-05-15',
                          '2019-01-15']})

I want the unique values in Col1 to be with most recent date Col2. The required column would be like this

    Col1  Col2 
0   a     2020-12-01
1   b     2018-12-31
2   c     2019-11-30
Ch3steR
  • 20,090
  • 4
  • 28
  • 58
raheel
  • 164
  • 7

2 Answers2

2

First, convert Col2 as datetime using pd.to_datetime if it's not datetime already, then use GroupBy.max

df['Col2'] = pd.to_datetime(df['Col2'])
df.groupby('Col1').max()

#            Col2
# Col1
# a    2020-12-01
# b    2018-12-31
# c    2019-11-30
Ch3steR
  • 20,090
  • 4
  • 28
  • 58
2

Sort the dataframe based on Col2 first and then groupby and agg last

Do:

df.sort_values(by='Col2').groupby('Col1').last()
Suhas Mucherla
  • 1,383
  • 1
  • 5
  • 17