
I have a pandas DataFrame of events, with the timestamp as the index and a scalar value (its meaning is not important here) in the single column. I would like to plot a time series of how many events happened in each hour.

The original data (much more than displayed here) looks like this:

                              size
timestamp
2015-08-17 15:07:05.628000   50877
2015-08-17 15:07:05.701000   62989
2015-08-17 15:07:05.752000   33790
2015-08-17 15:07:05.802000  100314
2015-08-17 15:07:05.862000   10372

....

I subsequently grouped these events by hour in the following manner:

counts = df.groupby( [df.index.year, df.index.month, df.index.day, df.index.hour] ).count()

i.e. ending up with a multi-level index, with 4 levels.

But now I am struggling to create a nice graph of it. Admittedly, my pandas visualisation skills are very dodgy. I haven't gotten much further than:

counts.plot()

But this makes the x-axis completely unreadable (a sequence of tuples). I'd like the x-axis to be a proper time series that scales nicely with the resolution of the plot etc. I am doing this in IPython, in case it matters. (I guess this question may come down to how to collapse the 4 index levels into one timestamp again).

I'd happily go through some kind of reference, so feel free to point me to any useful links to read up. I looked around, but couldn't immediately find anything on the particular topic.

(Also, feel free to suggest any alternative ways to achieve what I want to do - not sure the multi-level index is the most appropriate).

Thanks!

Joris Peeters

2 Answers

I think what you are looking for is resample. It is designed to handle regrouping by time frames. Try:

df.resample('1H').count().plot()
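
As a minimal sketch of what this does, the example below builds a small synthetic DataFrame (the timestamps and the two extra rows are made up for illustration) and counts events per hour. It uses the frequency string "60min" instead of '1H' only because the accepted spelling of the hourly alias differs between pandas versions; that choice is an assumption, not part of the original answer.

```python
import pandas as pd

# Synthetic events (assumed data): 5 in the 15:00 hour, none at 16:00, 2 at 17:00.
timestamps = pd.to_datetime([
    "2015-08-17 15:07:05.628",
    "2015-08-17 15:07:05.701",
    "2015-08-17 15:07:05.752",
    "2015-08-17 15:07:05.802",
    "2015-08-17 15:07:05.862",
    "2015-08-17 17:30:00.000",
    "2015-08-17 17:45:10.000",
])
df = pd.DataFrame(
    {"size": [50877, 62989, 33790, 100314, 10372, 1234, 5678]},
    index=pd.DatetimeIndex(timestamps, name="timestamp"),
)

# One row per hourly bin; the result keeps a regular DatetimeIndex,
# and hours without any events appear with a count of 0.
hourly = df["size"].resample("60min").count()
```

Calling hourly.plot() on the result should then give a proper time axis, because the index is still a DatetimeIndex rather than a sequence of tuples.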
James

The problem in this case is that the index has multiple levels. You can recombine the different levels into a single index, i.e. do a reindexing. A similar question can be found here.

For information on reindexing with a multi-level index, I found this. In this particular case you have to recombine the levels into datetime objects:

import datetime
# Collapse each (year, month, day, hour) tuple of the grouped index into one datetime
counts.index = [datetime.datetime(year, month, day, hour) for year, month, day, hour in counts.index]

This gives something similar to this:

2019-10-14 19:00:00    1
2020-10-14 19:00:00    2
2020-10-14 20:00:00    2
2020-10-15 00:00:00    1
2020-10-15 05:00:00    1
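
For completeness, here is a self-contained sketch of the whole round trip, from the grouping in the question to a single datetime index. The sample data is made up, and wrapping the result in pd.DatetimeIndex is an extra step I am assuming is useful so that the index type is explicit.

```python
from datetime import datetime

import pandas as pd

# Synthetic events (assumed data): 5 in the 15:00 hour and 2 in the 17:00 hour.
timestamps = pd.to_datetime([
    "2015-08-17 15:07:05.628",
    "2015-08-17 15:07:05.701",
    "2015-08-17 15:07:05.752",
    "2015-08-17 15:07:05.802",
    "2015-08-17 15:07:05.862",
    "2015-08-17 17:30:00.000",
    "2015-08-17 17:45:10.000",
])
df = pd.DataFrame(
    {"size": [50877, 62989, 33790, 100314, 10372, 1234, 5678]},
    index=timestamps,
)

# Grouping as in the question: the result has a 4-level index.
counts = df.groupby(
    [df.index.year, df.index.month, df.index.day, df.index.hour]
).count()

# Each index entry is a (year, month, day, hour) tuple; rebuild one timestamp per row.
counts.index = pd.DatetimeIndex(
    [datetime(int(y), int(m), int(d), int(h)) for y, m, d, h in counts.index]
)
```

Note that with this approach hours without any events are simply absent from the result (here 16:00), so a line plot would connect straight across the gap.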
thomas