
I have a fact table for energy consumption as follows:

f_meter_data: 

utc_calendar_id
local_calendar_id
meter_id
reading
timestamp

The calendar table is structured per the Kimball recommendations; it's the advice in The Data Warehouse Toolkit that led me to the two calendar IDs, so users can query on both local and UTC time.

This is all well and good, but problems arise when daylight saving time kicks in.

As the granularity is half-hour periods, there will be duplicate fact records when the clocks go back.

And when the clocks change in the other direction, there will be a gap in the data.
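To make the issue concrete, here is a minimal sketch (my illustration, not from the post) using Python's zoneinfo, assuming a Europe/London meter and the 2023-10-29 fall-back; the same local half-hour label appears twice:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo  # Python 3.9+

LONDON = ZoneInfo("Europe/London")  # assumed zone, for illustration only

# Walk half-hour readings across the 2023-10-29 fall-back
# (02:00 BST -> 01:00 GMT), keyed by UTC, and show the local label.
start = datetime(2023, 10, 29, 0, 0, tzinfo=timezone.utc)
for i in range(5):
    utc = start + timedelta(minutes=30 * i)
    local = utc.astimezone(LONDON)
    print(utc.strftime("%H:%M UTC"), "->", local.strftime("%H:%M local"))

# 00:00 UTC -> 01:00 local
# 00:30 UTC -> 01:30 local
# 01:00 UTC -> 01:00 local   <- 01:00/01:30 repeat: two facts per local label
# 01:30 UTC -> 01:30 local
# 02:00 UTC -> 02:00 local
```

At the spring change the opposite happens: local 01:00 and 01:30 simply never occur, which is the gap described above.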

How can I handle this situation?

Should I average the duplicate values and store that instead?

And for the gap in the data, should I store the average of the readings immediately before and immediately after it?

— bcmcfc

3 Answers


I have a feeling this question may end up getting closed as "primarily opinion based", but my particular opinion is that the system should be set up to deal with the fact that not every day has exactly 24 hours. There may be 23, 24 or 25. (Or, if you're on Lord Howe Island, 23.5, 24 or 24.5).

Depending on when your extra hour falls (which will be different for each time zone), you may have something like:

00 01a 01b 02 03 04 05 06 07 08 09 10 11 12 13 14 15 16 17 18 19 20 21 22 23

Or you might consider coupling the hour with the local UTC offset, like:

00-04:00 01-04:00 01-05:00 02-05:00 03-05:00  etc... 

Or if you're doing half-hour buckets:

00:00-04:00  00:30-04:00  01:00-04:00  01:30-04:00  01:00-05:00  01:30-05:00 ...
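As a rough sketch of that keying scheme (my illustration, assuming America/New_York and the 2023-11-05 fall-back, matching the -04:00/-05:00 offsets above), coupling each local half-hour with its UTC offset makes every bucket label unique:

```python
from datetime import datetime, timedelta, timezone
from zoneinfo import ZoneInfo

NY = ZoneInfo("America/New_York")  # assumed -04:00/-05:00 zone, as in the example

# Label each half-hour bucket as local time plus UTC offset across the
# 2023-11-05 fall-back; the repeated 01:00/01:30 hours get distinct keys.
start = datetime(2023, 11, 5, 4, 0, tzinfo=timezone.utc)  # 00:00 local
labels = []
for i in range(7):
    local = (start + timedelta(minutes=30 * i)).astimezone(NY)
    labels.append(local.strftime("%H:%M%z"))
print(" ".join(labels))
# 00:00-0400 00:30-0400 01:00-0400 01:30-0400 01:00-0500 01:30-0500 02:00-0500
```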

It probably wouldn't be appropriate to do any averaging to align to 24 hours. If you did, then totals would be off: averaging two 10 kWh readings into one record drops 10 kWh of real consumption from the day's total.

You also should consider how people will be using the data. Will they be trying to figure out trends across a given hour of the day? If so, then how will they compensate for a spike or dip caused by the DST transition? It may be as simple as putting an asterisk and footnote on the output report. Or it may be much more involved than that, depending on the usage.

Also, you said you're working with 30-minute intervals. Be aware that there are some time zones that are 45-minute offset (Nepal, Chatham Islands, and a small region in Australia). So if you're trying to cover the entire world then you would need 15-minute interval buckets.
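A quick way to see why (my sketch; the zone names are just the examples mentioned above): take the offsets in minutes and compute their greatest common divisor along with 30.

```python
import math
from datetime import datetime, timezone
from zoneinfo import ZoneInfo

# UTC offsets, in minutes, of the 45-minute-offset zones mentioned above
# (sampled on an arbitrary date; Chatham's offset is DST-dependent).
sample = datetime(2023, 6, 1, tzinfo=timezone.utc)
offsets = [
    int(sample.astimezone(ZoneInfo(z)).utcoffset().total_seconds() // 60)
    for z in ("Asia/Kathmandu", "Pacific/Chatham", "Australia/Eucla")
]
print(offsets)                 # [345, 765, 525]
print(math.gcd(*offsets, 30))  # 15 -> 15-minute buckets align all of them
```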

And, as Whichert pointed out in the comments, if you're using UTC then there is no daylight saving time; it's only when you group by local time that you'll have this concern.

You may also find the graphs in the DST tag wiki useful.

— Matt Johnson-Pint

I think you should simplify this with your business: when the clock is turned back, turn your records back as well, pushing the old records out into a warning/error table and inserting the new ones for the same interval.

As Matt suggested, reports run by local time would not tell the true story anyway, so why show wrong data in them?

Or, to follow up on Matt's advice, change your interval records: don't bind the time interval to the local calendar ID. Instead, use an Interval_seq_id that advances every 30 minutes, giving 46 records (1-46), 48 records (1-48) or 50 records (1-50) for a given day, depending on your region and the direction of the clock change. This removes the duplicate problem on Local_Int_starttime and Time_interval_Endtime, since the key is no longer bound to the local time interval.

This does move the issue to your reports/query tools, which must decide how to display local times that now appear twice in the graphs, especially if you want to do analytics on local time against meter readings. But this way the database design differentiates the records through Interval_seq_id rather than through the time interval itself.
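A minimal sketch of how such a sequence key could be generated (my illustration; the function name and the Europe/London example are assumptions, not part of the answer):

```python
from datetime import date, datetime, timedelta, timezone
from zoneinfo import ZoneInfo

def interval_seq_ids(day: date, tz: ZoneInfo):
    """Yield (seq_id, utc_start) for each half-hour interval of one local day:
    46 rows on the short day, 48 normally, 50 on the long day."""
    start = datetime(day.year, day.month, day.day, tzinfo=tz).astimezone(timezone.utc)
    nxt = day + timedelta(days=1)
    end = datetime(nxt.year, nxt.month, nxt.day, tzinfo=tz).astimezone(timezone.utc)
    seq, t = 1, start
    while t < end:
        yield seq, t
        seq, t = seq + 1, t + timedelta(minutes=30)

# The fall-back day 2023-10-29 in Europe/London has 25 hours -> 50 intervals:
rows = list(interval_seq_ids(date(2023, 10, 29), ZoneInfo("Europe/London")))
print(len(rows))  # 50
```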


There is a similar thread about daylight saving problems in C# here.

That answer goes into great detail about daylight saving transitions, and I believe the problem here is somewhat similar.

— Tamer Tas