1

This is what I have (this example is simplified version of my original data ):

data = {'1': [22, 2, '2018-06-01'], 
        '2': [24, 2, '2018-06-30'], 
        '3': [28, 5, '2018-06-22'],
        '4': [23, 4, '2018-06-02'],
        '5': [23, 6, '2018-06-12'],
        '6': [23, 2, '2018-06-22'],
        '7': [22, 2, '2018-06-11']}
df = pd.DataFrame.from_dict(data, orient='index', columns=['ad_id', 'price', 'date'])
df

print of df

ad_id   price   date
1   22  2   2018-06-01
2   24  2   2018-06-30
3   28  5   2018-06-22
4   23  4   2018-06-02
5   23  6   2018-06-12
6   23  2   2018-06-22
7   22  2   2018-06-11

I want to get:

2   24  2   2018-06-30
3   28  5   2018-06-22
6   23  2   2018-06-22
7   22  2   2018-06-11

If there is only one value for ad_id than the return is that row (because there is only one).
If there are multiple/duplicated values for ad_id than return row with highest date value

WebOrCode
  • 6,852
  • 9
  • 43
  • 70
  • 1
    You can use `df['date'] = pd.to_datetime(df['date']) df1 = df.loc[df.groupby('ad_id')['date'].idxmax()]` – jezrael Jul 24 '18 at 09:18

0 Answers0