
I have a dataframe with three columns, shown below.

What I would like to do is get the minimum date for each id & the score on that day.

id       date        score
124      2020-01-01  3.4
124      2020-01-02  3.11
124      2020-01-03  2.97
124      2020-01-04  3.64
477      2020-04-03  0.9
477      2020-04-04  0.7
477      2020-04-05  1.1
477      2020-04-08  0.5

The output I'm looking for:

id       date        score
124      2020-01-01  3.4
477      2020-04-03  0.9

I am able to get the minimum date using groupby, but not the score that goes with it. I tried the agg function, but I think I'm using it wrong.

mHelpMe
  • `df.loc[df.groupby('id')['date'].idxmin()]` – anky Feb 19 '21 at 16:30
  • I'm getting a ValueError. The id in my dataframe isn't an index, does that make a difference? – mHelpMe Feb 19 '21 at 16:33
  • Nope, the id I used wasn't an index either. What is the error? Is your date a date or a string? If it is a string, convert it first: `df['date'] = pd.to_datetime(df['date'])` – anky Feb 19 '21 at 16:34
  • File "C:\ProgramData\Anaconda\lib\site-packages\pandas\core\groupby\groupby.py", line 655, in wrapper raise ValueError – mHelpMe Feb 19 '21 at 16:36
  • Can you post the traceback? If you try `df = pd.read_clipboard()`, then `df['date'] = pd.to_datetime(df['date'])`, then `df.loc[df.groupby('id')['date'].idxmin()]` with the data you posted, this works. – anky Feb 19 '21 at 16:38
  • Ah, apologies, the date column was of type object, not datetime. After making that change it does indeed work. Thanks for your help. If you post your answer below I can mark it as correct. – mHelpMe Feb 19 '21 at 16:44
  • I guessed so. That's okay, this has been asked before, hence I have closed it. But I am glad I could solve your problem :) – anky Feb 19 '21 at 16:45
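
For reference, the approach worked out in the comments can be sketched end to end as follows. This is a minimal sketch, not a posted answer: it assumes the sample data from the question is typed in by hand as a DataFrame named `df`, with `date` starting out as strings (object dtype), which is the situation that produced the ValueError.

```python
import pandas as pd

# Sample data copied from the question; dates start out as plain strings.
df = pd.DataFrame({
    "id": [124, 124, 124, 124, 477, 477, 477, 477],
    "date": [
        "2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04",
        "2020-04-03", "2020-04-04", "2020-04-05", "2020-04-08",
    ],
    "score": [3.4, 3.11, 2.97, 3.64, 0.9, 0.7, 1.1, 0.5],
})

# Convert the object column to datetime first, as suggested in the comments.
df["date"] = pd.to_datetime(df["date"])

# idxmin() returns, per id, the row label of the earliest date;
# df.loc then pulls the full rows, so the score comes along.
result = df.loc[df.groupby("id")["date"].idxmin()]
print(result)
```

With the sample data this selects the 2020-01-01 row (score 3.4) for id 124 and the 2020-04-03 row (score 0.9) for id 477. Selecting whole rows by label is what keeps the score attached to its date, which a plain `groupby('id')['date'].min()` does not do.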
