
Handling datetimes and time series in Python and pandas often comes with some real problems and bizarre design choices.

Very often, you want to plot or handle a time series without the date component (i.e. just the clock time). However, matplotlib can't plot arrays of datetime.time, and you can't subtract one datetime.time from another. I think that's a big miss, but whatever.

Anyway, to get around that, most sources (here included) suggest that if you want to plot things by time of day, you use full datetime objects instead. E.g. if I have a time series spanning several days, I somehow need to make all the timestamps look as if they fall on the same day.

Examples of what I mean:

Can't subtract datetime.time, need datetime.datetime objects: subtract two times in python

Plotting quantities by time in matplotlib (see answer 1): Plotting time in Python with Matplotlib
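A minimal sketch of that "same day" workaround, assuming pandas and the same half-hourly CET index used in the question: strip the timezone while keeping the local wall-clock readings, measure each stamp's offset from local midnight, and add that offset to one arbitrary dummy date (2000-01-01 here is a hypothetical choice, nothing special).

```python
import pandas as pd

# Same index as in the question: half-hourly stamps over several days in CET.
a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")

# Drop the timezone but keep the local wall-clock readings.
naive = a.tz_localize(None)

# Offset of each stamp from its own local midnight (a TimedeltaIndex).
time_of_day = naive - naive.normalize()

# Place every stamp on one arbitrary dummy date so all days share the same x-range.
same_day = pd.Timestamp("2000-01-01") + time_of_day
```

The dummy date carries no meaning; it only exists so that plotting libraries expecting full datetimes accept the values.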

The problem:

Using pd.__version__ 1.1.5:

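import datetime as dt

import pandas as pd
import pytz

a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")
# t0: midnight of the first day, built with the stdlib constructor and a pytz tzinfo
t0 = dt.datetime.combine(a[0].date(), dt.time(0, 0), tzinfo=pytz.timezone("CET"))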

We have t0

datetime.datetime(2021, 6, 1, 0, 0, tzinfo=<DstTzInfo 'CET' CET+1:00:00 STD>)

and a

DatetimeIndex(['2021-06-01 00:00:00+02:00', '2021-06-01 00:30:00+02:00',
               '2021-06-01 01:00:00+02:00', '2021-06-01 01:30:00+02:00',
               '2021-06-01 02:00:00+02:00', '2021-06-01 02:30:00+02:00',
               '2021-06-01 03:00:00+02:00', '2021-06-01 03:30:00+02:00',
               '2021-06-01 04:00:00+02:00', '2021-06-01 04:30:00+02:00',
               ...
               '2021-06-03 19:30:00+02:00', '2021-06-03 20:00:00+02:00',
               '2021-06-03 20:30:00+02:00', '2021-06-03 21:00:00+02:00',
               '2021-06-03 21:30:00+02:00', '2021-06-03 22:00:00+02:00',
               '2021-06-03 22:30:00+02:00', '2021-06-03 23:00:00+02:00',
               '2021-06-03 23:30:00+02:00', '2021-06-04 00:00:00+02:00'],
              dtype='datetime64[ns, CET]', length=145, freq=None)

However, when I do a - t0

TimedeltaIndex(['-1 days +23:00:00', '-1 days +23:30:00',   '0 days 00:00:00',
                  '0 days 00:30:00',   '0 days 01:00:00',   '0 days 01:30:00',
                  '0 days 02:00:00',   '0 days 02:30:00',   '0 days 03:00:00',
                  '0 days 03:30:00',
                ...
                  '2 days 18:30:00',   '2 days 19:00:00',   '2 days 19:30:00',
                  '2 days 20:00:00',   '2 days 20:30:00',   '2 days 21:00:00',
                  '2 days 21:30:00',   '2 days 22:00:00',   '2 days 22:30:00',
                  '2 days 23:00:00'],
               dtype='timedelta64[ns]', length=145, freq=None)

The timedelta should be 0 in the first element and increase from there. Is this a bug or am I doing something wrong? One of the two objects seems to be aware of summer time (DST) in CET and the other doesn't. What is the general way to get the correct timedeltas?
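The printed t0 shows that passing a pytz zone through `tzinfo=` attached the zone's standard offset (CET+1:00:00 STD), while a carries +02:00 in June, which would account for the one-hour shift. pytz's own documentation advises using its `localize()` method instead of the `tzinfo` argument. A sketch that sidesteps the issue with pandas alone, assuming the same index a:

```python
import pandas as pd

a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")

# Option 1: let pandas localize midnight of the first day (uses the DST-correct offset).
t0 = pd.Timestamp("2021-06-01", tz="CET")

# Option 2: derive it from the data itself, as local midnight of the first stamp.
t0_alt = a[0].normalize()

# Both are 2021-06-01 00:00+02:00, so the differences start at zero.
deltas = a - t0
```

Option 2 has the advantage of not repeating the date or the zone name by hand.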

More importantly, once and for all, if I have a pandas time series spanning many days, timezone-aware or not, and I want to:

  1. Plot the values in the time series by the time of day they appear in.

  2. Get them in any form that allows me to actually perform the same calculations I can do with datetime objects, i.e. adding, subtracting, or comparing times like a normal person.

what is the recommended way forward (avoiding issues like the above)?
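One possible approach for both points, sketched under the assumption that local wall-clock time is what matters: represent the time of day as a Timedelta since local midnight. Timedeltas support addition, subtraction and comparison (point 2), and dividing by `pd.Timedelta(hours=1)` gives plain floats that matplotlib can plot (point 1). The Series values below are hypothetical placeholders.

```python
import pandas as pd

a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")
s = pd.Series(range(len(a)), index=a)  # hypothetical values

# Local wall-clock time of day as a Timedelta since local midnight.
naive = s.index.tz_localize(None)
tod = naive - naive.normalize()

# (2) Arithmetic and comparison work directly on Timedeltas.
later = tod + pd.Timedelta(minutes=90)
morning = s[tod < pd.Timedelta(hours=12)]

# (1) For plotting, express the time of day as a float number of hours.
hours = tod / pd.Timedelta(hours=1)
```

Passing `hours` as x and `s` as y to e.g. matplotlib's `ax.scatter` should make points from different days overlap at the same clock time. Because `tz_localize(None)` keeps wall-clock readings, on DST transition days these offsets follow the clock rather than the elapsed time.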


Update in light of comments: it's been mentioned that pytz recommends working in UTC and converting to local time only for display, but then what is the point of the other timezones? Regardless, I have timezone-aware data, and I need a way to do the above. Converting to UTC doesn't help here:

a.tz_convert("UTC")

gives

DatetimeIndex(['2021-05-31 22:00:00+00:00', '2021-05-31 22:30:00+00:00',
               '2021-05-31 23:00:00+00:00', '2021-05-31 23:30:00+00:00',
               '2021-06-01 00:00:00+00:00', '2021-06-01 00:30:00+00:00',
               '2021-06-01 01:00:00+00:00', '2021-06-01 01:30:00+00:00',
               '2021-06-01 02:00:00+00:00', '2021-06-01 02:30:00+00:00',
               ...
               '2021-06-03 17:30:00+00:00', '2021-06-03 18:00:00+00:00',
               '2021-06-03 18:30:00+00:00', '2021-06-03 19:00:00+00:00',
               '2021-06-03 19:30:00+00:00', '2021-06-03 20:00:00+00:00',
               '2021-06-03 20:30:00+00:00', '2021-06-03 21:00:00+00:00',
               '2021-06-03 21:30:00+00:00', '2021-06-03 22:00:00+00:00'],
              dtype='datetime64[ns, UTC]', length=145, freq=None)

The issue is I need to plot the clock time in CET, not the clock time in UTC+0. Basically I need to place the first element at 0 on a plot, and the rest where a human would logically put them based on their clock time.
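If the data is kept in UTC, as pytz recommends, the CET clock time can be recovered only at plotting time. A sketch, assuming the UTC index shown above: convert back to CET and read the `hour` and `minute` fields, which a tz-aware index reports in its own zone. The first element then lands at 0 because it falls at 00:00 CET.

```python
import pandas as pd

# Assumed: timestamps stored in UTC, matching the a.tz_convert("UTC") output above.
utc = pd.date_range("2021-05-31 22:00", "2021-06-03 22:00", freq="30min", tz="UTC")

# Convert only for presentation: the index now holds CET wall-clock readings (+02:00 in June).
cet = utc.tz_convert("CET")

# hour/minute of a tz-aware index are local to its zone, so this is CET clock time in hours.
hours = cet.hour + cet.minute / 60
```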

Ideally, I could use a.time and do arithmetic and plotting with that, but someone decided it was not to be, so I need a workaround.
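The `datetime.time` values can still be turned into something arithmetic-friendly. A sketch, assuming the same index a: `DatetimeIndex.time` returns local wall-clock `datetime.time` objects (comparable, but not addable or subtractable), and each one's ISO string ("HH:MM:SS") can be parsed by `pd.to_timedelta` into an offset from midnight.

```python
import pandas as pd

a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")

# Array of datetime.time objects holding the local (CET) clock time of each stamp.
times = a.time

# Parse each "HH:MM:SS" string into a Timedelta since midnight.
tod = pd.to_timedelta([t.isoformat() for t in times])
```

Going through strings is not the fastest route; subtracting the normalized naive index gives the same offsets without the round trip.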

Marses
  • `(a - t0) / pd.Timedelta(minutes=1)` or `(a[0] - a[1]) / datetime.timedelta(minutes=1)` – Trenton McKinney Jun 02 '21 at 16:07
  • As per [`pytz`](https://pypi.org/project/pytz/), _The preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans._ Also, _This library also allows you to do date arithmetic using local times, although it is more complicated than working in UTC as you need to use the `normalize()` method to handle daylight saving time and other timezone transitions. In this example, `loc_dt` is set to the instant when daylight saving time ends in the US/Eastern timezone._ – Trenton McKinney Jun 02 '21 at 16:16
  • @TrentonMcKinney the issue is the values are offset by an hour (i.e. look at the first element, it's negative), even though they have the same timezones and times. Regarding working with UTC, yeah, that's true. So the workaround is to switch to UTC and then back. However, this results in another glob of utterly ugly workaround code, for no reason other than someone forbidding the addition of hours and minutes. Also, it's actually not clear in the example above (see the addition to my question). – Marses Jun 02 '21 at 16:20
  • Why do you care so much about the date in the first place? You can ignore it by setting a custom formatter that just shows time. Also, CET is not a time zone. It's a UTC offset. "Europe/Berlin", for example, is a time zone that happens to have CET during certain periods of time. – FObersteiner Jun 02 '21 at 17:07
  • @MrFuppes I need to plot the value by clock time. If I set a custom formatter, I just end up with the values plotted at their dates, only showing the clock time. I need the data at different dates to "overlap". Secondly, I'm getting conflicting results on whether CET is just UTC+1 or whether it alternates between UTC+1 and UTC+2 with daylight saving time. However: **Pandas seems to treat it as a timezone with alternating UTC offset** and using Europe/Berlin produces the same results as in my example. – Marses Jun 03 '21 at 07:37
  • So you have a time series and want to make an overlay plot for all days covered in the series, i.e. multiple values per time? My comment on the time zone is a fact, not related to pandas. pandas/pytz accepts some of those abbreviations for convenience. The current pandas version will issue warnings though. And CET *does not* alternate between UTC+1 and UTC+2 ;-) UTC+2 would be CEST. – FObersteiner Jun 03 '21 at 08:45
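For the overlay plot discussed in the comments (several days drawn against one clock-time axis), one option is to reshape the series into a table with one row per time of day and one column per date. A sketch with hypothetical values, assuming the same CET index:

```python
import pandas as pd

a = pd.date_range("2021-06-01", "2021-06-04", freq="30min").tz_localize("CET")
s = pd.Series(range(len(a)), index=a)  # hypothetical values

# Local wall-clock readings without the timezone.
naive = s.index.tz_localize(None)

frame = pd.DataFrame({
    "date": naive.date,  # calendar day of each stamp
    "hours": ((naive - naive.normalize()) / pd.Timedelta(hours=1)).to_numpy(),
    "value": s.to_numpy(),
})

# One row per time of day, one column per day; slots a day doesn't cover become NaN.
overlay = frame.pivot(index="hours", columns="date", values="value")
```

With matplotlib installed, `overlay.plot()` should then draw one line per day against the shared hours index.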
