How to filter a DataFrame, keeping only the last row of every hour?

Question

The data is inconsistent: entries arrive at irregular times.

I've tried filtering the DataFrame, but
df.Timestamp.dt.hour gives only the hours and
df.Timestamp.dt.minute gives only the minutes.

I need to keep, for example, the last entry of every hour, so 1:58, 2:54, 3:36, 4:44, etc.

I just need a more efficient way, not an explanation :)

Could you have a look at https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples so you can [edit] your question into one that's answerable, with examples and the code you've attempted so far? – Jon Clements Aug 26 '19 at 15:11
I just found out that the only way is either to reduce every hour to a mean/min/max value or to write a function that holds the last reviewed timestamp and compares it to the new one, which isn't efficient. I haven't found any function for this in pandas (or among "Python time series functions"), but I'm still going to look it up in R. Another idea, adding additional hours and minutes to data.Timestamp and interpolating all of the data, is a bad idea too for a huge amount of data (I've got data for the years 2005-2019). – Daniel Abramov Aug 26 '19 at 15:16
Not necessarily true... zipa's answer is along the lines I was thinking, but with this question having no sample input and output, it's impossible for anyone to test... – Jon Clements Aug 26 '19 at 15:18

zipa · Answer 1 · 2019-08-27T07:16:36.593

1

I think this should work:

df.sort_values('Date').groupby([df['Date'].dt.date, df['Date'].dt.hour], as_index=False).last()
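
Since the question has no sample data, here is a minimal sketch on made-up rows to show the idea end to end. The column names `Timestamp` and `value` and all of the timestamps are assumptions for illustration only. It swaps the date/hour pair for a single key built with `dt.floor("h")`, and uses `tail(1)` so the original rows come back unchanged; that is a variation on the one-liner above, not necessarily what the answer intended.

```python
import pandas as pd

# Hypothetical sample data; the question did not include any.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2019-08-26 01:12", "2019-08-26 01:58",
        "2019-08-26 02:03", "2019-08-26 02:54",
        "2019-08-27 01:36",
    ]),
    "value": [1, 2, 3, 4, 5],
})

# Sort first so that "last" means latest in time, not last in row order.
df_sorted = df.sort_values("Timestamp")

# Flooring to the hour keeps the date, so 01:xx on different days
# ends up in different groups.
hour_bucket = df_sorted["Timestamp"].dt.floor("h")

# tail(1) keeps the final row of each group with all original columns.
last_per_hour = df_sorted.groupby(hour_bucket).tail(1)
print(last_per_hour)
```

This stays vectorised, so it should avoid the row-by-row comparison function mentioned in the question's comments.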

edited Aug 27 '19 at 07:16

answered Aug 26 '19 at 15:15


Depending on the input - it might be necessary to force a `df.sort_values('Date')` to ensure `.last()` is actually the last – Jon Clements Aug 26 '19 at 15:24
@JonClements Exactly, tested on sorted data and didn't include that. Thanks! – zipa Aug 27 '19 at 07:16
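
To illustrate the point about sorting made in these comments, here is a small sketch on deliberately shuffled, made-up data. The `Date` column name follows the answer; the `value` column, the timestamps, and the helper `last_value_per_hour` are hypothetical. The group keys are renamed to `day` and `hour` only so that the two index levels have distinct names.

```python
import pandas as pd

# Hypothetical rows, intentionally out of chronological order.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-08-26 01:58", "2019-08-26 01:10",
        "2019-08-26 02:54", "2019-08-26 02:05",
    ]),
    "value": [10, 11, 20, 21],
})

def last_value_per_hour(frame):
    # Group by calendar day and hour, as in the answer's one-liner.
    day = frame["Date"].dt.date.rename("day")
    hour = frame["Date"].dt.hour.rename("hour")
    return frame.groupby([day, hour])["value"].last().tolist()

# Without sorting, .last() follows row order, not time order.
unsorted_result = last_value_per_hour(df)

# After sorting, .last() is the latest entry within each hour.
sorted_result = last_value_per_hour(df.sort_values("Date"))

print(unsorted_result, sorted_result)
```

On unsorted input the result picks up the 01:10 and 02:05 rows, which is why the `sort_values('Date')` call matters.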
