
Data is inconsistent.

I've tried filtering the DataFrame with `df.Timestamp.dt.hour`, but that gives only the hours, and `df.Timestamp.dt.minute` gives only the minutes.
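
A quick illustration of why the hour on its own is not a usable grouping key: it discards the date, so the same hour on different days falls into one bucket. This is a minimal sketch on hypothetical timestamps; truncating with `dt.floor("h")` is one way to keep the date and the hour together in a single key.

```python
import pandas as pd

# Hypothetical readings on two different days; the 01:xx hour occurs on both.
ts = pd.Series(pd.to_datetime([
    "2019-08-25 01:10", "2019-08-25 01:58",
    "2019-08-26 01:05", "2019-08-26 01:30",
]))

# .dt.hour drops the date, so both days collapse into the single key 1.
n_hour_keys = ts.dt.hour.nunique()

# .dt.floor("h") truncates to the start of the hour but keeps the date,
# giving one key per (day, hour): 2019-08-25 01:00 and 2019-08-26 01:00.
n_bucket_keys = ts.dt.floor("h").nunique()

print(n_hour_keys, n_bucket_keys)  # 1 2
```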

For example, I need to select the last entry of every hour, such as 1:58, 2:54, 3:36, 4:44, etc.

I just need a more efficient way, not an explanation :)

  • Could you have a look at https://stackoverflow.com/questions/20109391/how-to-make-good-reproducible-pandas-examples, so you can edit your question into a position that's answerable, with example data and the code you've tried so far? – Jon Clements Aug 26 '19 at 15:11
  • I just found out that the only options are either to reduce every hour to a mean/min/max value, or to write a function that holds the last reviewed timestamp and compares it with the next one, which is not efficient. I haven't found any function for this in pandas (or among other Python time-series functions), but I'm still going to look in R. Another idea, adding the missing hours and minutes to data.Timestamp and interpolating all of the data, is also a bad idea for such a huge amount of data (I've got data from 2005 to 2019). – Daniel Abramov Aug 26 '19 at 15:16
  • Not necessarily true... zipa's answer is along the lines I was thinking, but with this question having no sample input and output, it's impossible for anyone to test... – Jon Clements Aug 26 '19 at 15:18
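
The "compare each timestamp with its neighbour" idea from the comments does not have to be a Python loop. A minimal sketch, assuming hypothetical sample data with a `Timestamp` column and an illustrative `value` column: after sorting, a row is the last of its hour exactly when the next row lands in a different hour bucket, and that can be checked for all rows at once with `shift(-1)`.

```python
import pandas as pd

# Hypothetical sample data: irregular readings, "value" is an illustrative column.
df = pd.DataFrame({
    "Timestamp": pd.to_datetime([
        "2019-08-26 01:12", "2019-08-26 01:58",
        "2019-08-26 02:03", "2019-08-26 02:54",
        "2019-08-26 03:36",
    ]),
    "value": [10, 11, 12, 13, 14],
})

# Sort so that "last" means latest in time, not latest in file order.
df = df.sort_values("Timestamp").reset_index(drop=True)

# One bucket per (date, hour): the timestamp truncated to the start of its hour.
bucket = df["Timestamp"].dt.floor("h")

# A row is the last of its hour if the next row is in a different bucket.
# shift(-1) puts NaT after the final row, and NaT compares unequal, so it is kept.
is_last = bucket.ne(bucket.shift(-1))

last_per_hour = df[is_last]
print(last_per_hour)
```

This keeps whole rows in their original form and is a single vectorised pass after the sort.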

1 Answer


I think this should work:

df.sort_values('Date').groupby([df['Date'].dt.date, df['Date'].dt.hour], as_index=False).last()
zipa
  • Depending on the input, it might be necessary to force a `df.sort_values('Date')` to ensure `.last()` is actually the last. – Jon Clements Aug 26 '19 at 15:24
  • @JonClements Exactly, tested on sorted data and didn't include that. Thanks! – zipa Aug 27 '19 at 07:16
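
A self-contained sketch of the answer's approach, on hypothetical unsorted data using the answer's `Date` column name, that also shows the sorting point from the comments. As a deliberate deviation, it uses the default `as_index=True` and then `reset_index(drop=True)` instead of `as_index=False`, only to keep the output a plain table with a simple index; the grouping keys are the same as in the answer.

```python
import pandas as pd

# Hypothetical, deliberately unsorted sample data; "value" is illustrative.
df = pd.DataFrame({
    "Date": pd.to_datetime([
        "2019-08-26 02:54", "2019-08-26 01:12", "2019-08-26 01:58",
        "2019-08-27 01:44", "2019-08-26 02:03",
    ]),
    "value": [13, 10, 11, 20, 12],
})

# Sort first so .last() returns the latest row of each (date, hour) group.
# The keys come from the unsorted df; pandas aligns Series keys on the index.
result = (
    df.sort_values("Date")
      .groupby([df["Date"].dt.date, df["Date"].dt.hour])
      .last()
      .reset_index(drop=True)  # drop the (date, hour) group index
)

# Without sorting, .last() follows row order, not time order.
unsorted = (
    df.groupby([df["Date"].dt.date, df["Date"].dt.hour])
      .last()
      .reset_index(drop=True)
)

print(result)
print(unsorted)
```

In the unsorted case the 02:xx group on 2019-08-26 ends on the 02:03 row because it appears later in the frame than 02:54. Note also that `GroupBy.last()` takes the last non-null value of each column separately; if some columns contain missing values, `.tail(1)` on the sorted frame returns the actual last row instead.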