Handling datetimes and time series in Python and pandas often comes with some real problems and bizarre design choices.
Very often, you want to plot or handle a time series without the date component (i.e. just the clock time). However, matplotlib can't plot arrays of datetime.time, and you can't subtract one datetime.time from another. I think that's a big miss, but whatever.
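To make the limitation concrete, here is a minimal sketch using only the standard library. The two clock times and the anchor date are arbitrary values chosen for illustration.

```python
import datetime as dt

t1 = dt.time(9, 30)
t2 = dt.time(8, 0)

# datetime.time does not define subtraction, so this raises TypeError
try:
    diff = t1 - t2
except TypeError as exc:
    diff = None
    print("time - time failed:", exc)

# Usual workaround: attach both times to the same (arbitrary) date,
# then subtract the resulting datetime.datetime objects
anchor = dt.date(2000, 1, 1)
delta = dt.datetime.combine(anchor, t1) - dt.datetime.combine(anchor, t2)
print(delta)
```

The result of the workaround is a datetime.timedelta, which supports the usual arithmetic and comparisons.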
Anyway, to get around that, most sources (here included) suggest that if you want to plot things by time of day, you just use datetime-like objects. E.g. if I have a time series spanning several days, I need to somehow make sure all the timestamps look like they're on the same day.
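A rough sketch of that "same day" trick for a timezone-naive index, assuming pandas is available. The dummy date 2000-01-01 and the variable names are hypothetical, picked only for illustration.

```python
import pandas as pd

# Timezone-naive half-hourly index spanning three days
idx = pd.date_range("2021-06-01", "2021-06-03 23:30", freq="30min")

# Time elapsed since midnight of each timestamp's own day (a TimedeltaIndex)
time_of_day = idx - idx.normalize()

# Re-attach every clock time to one arbitrary dummy date
same_day = pd.Timestamp("2000-01-01") + time_of_day

print(same_day.min(), same_day.max())
```

Both time_of_day and same_day support arithmetic and comparison, and same_day should be usable as an x-axis in the same way as any other datetime data.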
Examples of what I mean:
Can't subtract datetime.time, need datetime.datetime objects: subtract two times in python
Plotting quantities by time in matplotlib (see answer 1): Plotting time in Python with Matplotlib
The problem:
Using pd.__version__ 1.1.5:
a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")
t0 = dt.datetime.combine(a[0].date(), dt.time(0, 0), tzinfo=pytz.timezone("CET"))
We have t0
datetime.datetime(2021, 6, 1, 0, 0, tzinfo=<DstTzInfo 'CET' CET+1:00:00 STD>)
and a
DatetimeIndex(['2021-06-01 00:00:00+02:00', '2021-06-01 00:30:00+02:00',
'2021-06-01 01:00:00+02:00', '2021-06-01 01:30:00+02:00',
'2021-06-01 02:00:00+02:00', '2021-06-01 02:30:00+02:00',
'2021-06-01 03:00:00+02:00', '2021-06-01 03:30:00+02:00',
'2021-06-01 04:00:00+02:00', '2021-06-01 04:30:00+02:00',
...
'2021-06-03 19:30:00+02:00', '2021-06-03 20:00:00+02:00',
'2021-06-03 20:30:00+02:00', '2021-06-03 21:00:00+02:00',
'2021-06-03 21:30:00+02:00', '2021-06-03 22:00:00+02:00',
'2021-06-03 22:30:00+02:00', '2021-06-03 23:00:00+02:00',
'2021-06-03 23:30:00+02:00', '2021-06-04 00:00:00+02:00'],
dtype='datetime64[ns, CET]', length=145, freq=None)
However, when I do a - t0, I get:
TimedeltaIndex(['-1 days +23:00:00', '-1 days +23:30:00', '0 days 00:00:00',
'0 days 00:30:00', '0 days 01:00:00', '0 days 01:30:00',
'0 days 02:00:00', '0 days 02:30:00', '0 days 03:00:00',
'0 days 03:30:00',
...
'2 days 18:30:00', '2 days 19:00:00', '2 days 19:30:00',
'2 days 20:00:00', '2 days 20:30:00', '2 days 21:00:00',
'2 days 21:30:00', '2 days 22:00:00', '2 days 22:30:00',
'2 days 23:00:00'],
dtype='timedelta64[ns]', length=145, freq=None)
The timedelta should be 0 in the first element and increase from there. Is this a bug, or am I doing something wrong? One of the two objects seems to be aware of summer time in CET and the other doesn't. What is the general way to get the correct timedeltas?
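One likely explanation is visible in the repr of t0 above: it carries CET+1:00:00 STD, i.e. the winter offset, while the index is at +02:00. pytz timezones with DST are not meant to be passed through the tzinfo= argument; the pytz documentation points to localize() instead. A minimal sketch under that assumption (the names cet, naive_midnight and t0_fixed are hypothetical):

```python
import datetime as dt

import pandas as pd
import pytz

cet = pytz.timezone("CET")
a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")

# Build the naive midnight first, then let pytz pick the offset valid on that date
naive_midnight = dt.datetime.combine(a[0].date(), dt.time(0, 0))
t0_fixed = cet.localize(naive_midnight)

deltas = a - t0_fixed
print(t0_fixed.utcoffset())
print(deltas[0], deltas[-1])
```

If the reference point is simply the first element or its midnight, a - a[0] or a - a[0].normalize() stays entirely inside pandas and avoids building t0 by hand.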
More importantly, once and for all, if I have a pandas time series spanning many days, timezone-aware or not, and I want to:
Plot the values in the time series by the time of day they appear in.
Get them in any form that allows me to actually perform the same calculations I can do with datetime objects, i.e. adding, subtracting, or comparing times like a normal person.
what is the recommended way forward (avoiding issues like the above)?
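For reference, one possible shape of such a solution for timezone-aware data, as a sketch rather than a recommendation: drop the timezone while keeping the local wall-clock time, then measure each timestamp from its own midnight. The series s and its values are made up for illustration.

```python
import pandas as pd

a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")
s = pd.Series(range(len(a)), index=a)  # dummy values 0..144

# tz_localize(None) keeps the local (CET/CEST) clock time and drops the tz info
wall = s.index.tz_localize(None)

# Clock time as a TimedeltaIndex: supports +, -, and comparisons
time_of_day = wall - wall.normalize()

# Clock time as float hours: convenient as a plain numeric x-axis
hours = time_of_day.total_seconds() / 3600

# Example calculation: mean value per clock time across all days
by_clock = s.groupby(hours).mean()
print(by_clock.head())
```

Here hours can be used as the x values of a plot, so the first element lands at 0. One caveat: on days with a DST transition, wall-clock times can repeat or be skipped, so the elapsed time since midnight and the clock reading no longer agree.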
Update in light of comments: it's been mentioned that pytz doesn't recommend timezones other than UTC, but then what is the point? Regardless, I have timezone-aware data, and I need a way to do the above. Converting to UTC doesn't help here:
a.tz_convert("UTC")
gives
DatetimeIndex(['2021-05-31 22:00:00+00:00', '2021-05-31 22:30:00+00:00',
'2021-05-31 23:00:00+00:00', '2021-05-31 23:30:00+00:00',
'2021-06-01 00:00:00+00:00', '2021-06-01 00:30:00+00:00',
'2021-06-01 01:00:00+00:00', '2021-06-01 01:30:00+00:00',
'2021-06-01 02:00:00+00:00', '2021-06-01 02:30:00+00:00',
...
'2021-06-03 17:30:00+00:00', '2021-06-03 18:00:00+00:00',
'2021-06-03 18:30:00+00:00', '2021-06-03 19:00:00+00:00',
'2021-06-03 19:30:00+00:00', '2021-06-03 20:00:00+00:00',
'2021-06-03 20:30:00+00:00', '2021-06-03 21:00:00+00:00',
'2021-06-03 21:30:00+00:00', '2021-06-03 22:00:00+00:00'],
dtype='datetime64[ns, UTC]', length=145, freq=None)
The issue is that I need to plot the clock time in CET, not the clock time in UTC. Basically, I need to place the first element at 0 on a plot, and the rest where a human would logically put them based on clock time.
Ideally, I could do a.time and do arithmetic and plotting with that, but someone decided it was not to be, so I need a way around.