How can I filter this Pandas DataFrame so that I only get "VALUE" and "DATE" relating to the most recent "PUBLISH_TIME"?

Current DataFrame:

          PUBLISH_TIME        DATE  VALUE
0  2022-01-04 00:52:30  2022-01-04    490
1  2022-01-04 10:01:45  2022-01-04    503
2  2022-01-04 10:09:04  2022-01-04    504
3  2022-01-05 00:44:16  2022-01-05    513
4  2022-01-05 10:18:44  2022-01-05    527
5  2022-01-06 01:03:16  2022-01-06    527
6  2022-01-06 10:09:08  2022-01-06    520

What I want:

DATE        VALUE
2022-01-04  490
2022-01-05  513
2022-01-06  527

Code:

import pandas as pd

x = {
    "PUBLISH_TIME": [
        "2022-01-04 00:52:30",
        "2022-01-04 10:01:45",
        "2022-01-04 10:09:04",
        "2022-01-05 00:44:16",
        "2022-01-05 10:18:44",
        "2022-01-06 01:03:16",
        "2022-01-06 10:09:08",
    ],
    "DATE": [
        "2022-01-04",
        "2022-01-04",
        "2022-01-04",
        "2022-01-05",
        "2022-01-05",
        "2022-01-06",
        "2022-01-06",
    ],
    "VALUE": [
        490,
        503,
        504,
        513,
        527,
        527,
        520,
    ],
}

df = pd.DataFrame(x)

I have tried `df.groupby('PUBLISH_TIME').apply(max)[['DATE', 'VALUE']]`, but this does not return the desired result.

cmp
  • You want `df['PUBLISH_TIME'] = pd.to_datetime(df['PUBLISH_TIME']) ; df.loc[df.groupby('DATE')['PUBLISH_TIME'].idxmin(), ['DATE', 'VALUE']]` (to match the provided output). – mozway Mar 22 '23 at 09:48
  • Note that you said "most recent", but your output shows the oldest publish time for each date; you might want to consider `idxmax` if needed. – mozway Mar 22 '23 at 09:49
  • This works perfectly, thank you very much for the prompt response. – cmp Mar 22 '23 at 09:54
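
For reference, the one-liner from the comments can be written out as a runnable sketch using the sample data from the question. The names `earliest` and `latest` are illustrative and not from the original post. The `idxmin` variant reproduces the first publish time per `DATE`, while the `idxmax` variant is what "most recent" would usually mean.

```python
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "PUBLISH_TIME": [
        "2022-01-04 00:52:30",
        "2022-01-04 10:01:45",
        "2022-01-04 10:09:04",
        "2022-01-05 00:44:16",
        "2022-01-05 10:18:44",
        "2022-01-06 01:03:16",
        "2022-01-06 10:09:08",
    ],
    "DATE": [
        "2022-01-04", "2022-01-04", "2022-01-04",
        "2022-01-05", "2022-01-05",
        "2022-01-06", "2022-01-06",
    ],
    "VALUE": [490, 503, 504, 513, 527, 527, 520],
})

# Parse the strings so comparisons are done on real timestamps.
df["PUBLISH_TIME"] = pd.to_datetime(df["PUBLISH_TIME"])

# Group by DATE (not PUBLISH_TIME) and pick the row label of the
# earliest / latest publish time within each date.
by_date = df.groupby("DATE")["PUBLISH_TIME"]

# Earliest publish time per date (hypothetical name `earliest`).
earliest = df.loc[by_date.idxmin(), ["DATE", "VALUE"]]

# Most recent publish time per date (hypothetical name `latest`).
latest = df.loc[by_date.idxmax(), ["DATE", "VALUE"]]

print(earliest)
print(latest)
```

Grouping by `PUBLISH_TIME`, as in the attempt in the question, puts every row in its own group because each timestamp is unique, which is why the grouping key here is `DATE`.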

0 Answers