5

We have a table that is filled with data from a legacy report of another system. The columns of that table reflect the same structure of the report.

Here are a abbreviated structure of the table:

CREATE TABLE IF NOT EXISTS LEGACY_TABLE (
  REPORT_DATE DATE NOT NULL,
  EVENT_ID BIGINT PRIMARY KEY NOT NULL,
  START_HOUR TIMESTAMP WITHOUT TIME ZONE,
  END_HOUR TIME WITHOUT TIME ZONE,
  EXPECTED_HOUR TIME WITHOUT TIME ZONE
);

We are refactoring this table to deal with different time zones of different clients. The new structure would be something like:

CREATE TABLE IF NOT EXISTS LEGACY_TABLE (
  REPORT_DATE DATE NOT NULL,
  EVENT_ID BIGINT PRIMARY KEY NOT NULL,
  START_HOUR TIMESTAMP WITH TIME ZONE,
  END_HOUR TIME WITH TIME ZONE,
  EXPECTED_HOUR TIME WITH TIME ZONE
);

These hour fields represents a specific point in time during the day represented by the REPORT_DATE column. What I mean by that is that every TIME column represents a moment during the day specified in REPORT_DATE.

Some other points to consider:

  • We don't know why the START_HOUR is in TIMESTAMP format in the report we receive from the legacy system. But we import the data the way it comes to us.
  • The fields in the report are formatted according to the timezone of the client, so to refactor this table we need to combine the timezone of the client (we have this info) to properly insert the timestamps/times in UTC.

But now to the problem. The value of these columns are used to compute another values multiple times in our system, something like the following:

START_HOUR - END_HOUR (the result of this operation is currently being casted to TIME WITHOUT TIME ZONE)
START_HOUR < END_HOUR
START_HOUR + EXPECTED_HOUR
EXPECTED_HOUR - END_HOUR
EXPECTED_HOUR < '05:00' 

After some research I found that is not recommended to use the type TIME WITH TIME ZONE (Postgres time with time zone equality) and now I'm a bit confused about what is the best way to refactor this table to deal with different time zones and handle the different column operations that we need to.

Besides that, I already know that is safe to subtract two columns of type TIMESTAMP WITH TIME ZONE. This subtraction operation is taking into account DST changes (Subtracting two columns of type timestamp with time zone) but how about the others? And the one subtracting a TIME from a TIMESTAMP?.

And about the table refactoring, should we use TIME WITH TIME ZONE anyways? Should we continue using TIME WITHOUT TIME ZONE? Or is better to forget the type TIME altogether and combine the DATE with the TIME and change the columns to TIMESTAMP WITH TIME ZONE?

I think these questions are related because the new column types we choose to use, will define how we operate with the columns.

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
Luiz
  • 325
  • 7
  • 28
  • I personally favor using an integer column that holds a unix timestamp, although in 2038 you'll need to change that column type – BShaps May 10 '18 at 17:01
  • IIUC, you ignore the date part of `START_HOUR`, currently? Or you have a `CHECK` constraint enforcing that it's the same as `REPORT_DATE`? – Erwin Brandstetter May 10 '18 at 22:18
  • @ErwinBrandstetter: Currently we save the date part but for all current calculations we only use the time part. When we process the legacy report we ensure that the date part and `REPORT_DATE` are the same. – Luiz May 10 '18 at 22:40
  • `we need to combine the timezone of the client (we have this info)`. How exactly do you have this info without saving it? – Erwin Brandstetter May 10 '18 at 23:58
  • We save it. In the clients table. We know what client is importing the report and so also know the timezone. – Luiz May 11 '18 at 02:02

1 Answers1

5

You asserted that:

every TIME column represents a moment during the day specified in REPORT_DATE.

So you never cross the a dateline within the same row. I suggest to save 1x date 3x time and the time zone (as text or FK column):

CREATE TABLE legacy_table (
   event_id      bigint PRIMARY KEY NOT NULL
 , report_date   date NOT NULL
 , start_hour    time
 , end_hour      time
 , expected_hour time
 , tz            text  -- time zone
);

Like you already found, timetz (time with time zone) should generally be avoided. It cannot deal with DST rules properly (daylight saving time).

So basically what you already had. Just drop the date component from start_hour, that's dead freight. Cast timestamp to time to cut off the date. Like: (timestamp '2018-03-25 1:00:00')::time

tz can be any string accepted by the AT TIME ZONE construct, but to deal with different time zones reliably, it's best to use time zone names exclusively. Any name you find in the system catalog pg_timezone_names.

To optimize storage, you could collect allowed time zone names in a small lookup table and replace tz text with tz_id int REFERENCES my_tz_table.

Two example rows with and without DST:

INSERT INTO legacy_table VALUES
   (1, '2018-03-25', '1:00', '3:00', '2:00', 'Europe/Vienna')  -- sadly, with DST
 , (2, '2018-03-25', '1:00', '3:00', '2:00', 'Europe/Moscow'); -- Russians got rid of DST

For representation purposes or calculations you can do things like:

SELECT (report_date + start_hour)    AT TIME ZONE tz AT TIME ZONE 'UTC' AS start_utc
     , (report_date + end_hour)      AT TIME ZONE tz AT TIME ZONE 'UTC' AS end_utc
     , (report_date + expected_hour) AT TIME ZONE tz AT TIME ZONE 'UTC' AS expected_utc
     -- START_HOUR - END_HOUR
     , (report_date + start_hour) AT TIME ZONE tz
     - (report_date + end_hour)   AT TIME ZONE tz AS start_minus_end
FROM   legacy_table;

You might create one or more views to readily display strings as needed. The table is for storing the information you need.

Note the parentheses! Else the operator + would bind before AT TIME ZONE due to operator precedence.

And behold the results:

db<>fiddle here

Since the time is manipulated in Vienna (like any place where silly DST rules apply), you get "surprising" results.

Related:

Erwin Brandstetter
  • 605,456
  • 145
  • 1,078
  • 1,228
  • Thanks for the complete answer! I have a feel questions, though: 1) Why adding `report_date + end_hour` and `report_date + expected_hour` in tz `'Europe/Vienna'` generates the same result? Considering that start and expected are different as in your _db<>fiddle_. 2) Why apply `AT TIME ZONE` again using `UTC`? When we first apply the tz using the client tz we don't have the timestamp in the clients timezone already? This is a bit confused for me since when doing `START_HOUR - END_HOUR` you don't apply the `AT TIME ZONE` twice. – Luiz May 11 '18 at 14:33
  • @Luiz: 1) Because time was advanced by an hour at `2018-03-25 02:00` in the EU due to silly DST rules. Thus 3 a.m. is the same as 2 a.m. at this day. 2) Follow [my added link](https://stackoverflow.com/a/9576170/939860) for a detailed explanation. We don't need the second `AT TIME ZONE` for calculations, so I don't use it there. – Erwin Brandstetter May 12 '18 at 02:46
  • 1
    I don't believe I'm the first to upvote the answer! Thank you Erwin for you precious time and effort – Vao Tsun May 16 '18 at 08:28
  • @ErwinBrandstetter Just to check: if I need to subtract a `TIME` (since I'm using `TIME WITHOUT TIME ZONE` now, the `TIME` is already saved in the client's tz) from a `TIMESTAMPTZ` the correct approach is to first apply `AT TIME ZONE` (using the same `TZ` in what the `TIME` was saved) to the `TIMESTAMPTZ`, right (this assumes I do not have a DATE to bind to the TIME part)? From my tests this seems to be the right approach but I'm not fully convinced that this will cover all cases... – Luiz May 22 '18 at 12:51
  • 1
    It depends on what *"subtract a `TIME`"* is supposed to mean exactly. It the time is supposed to be aware of the time zone it's in, you must subtract from `timestamptz`, not `timestamp`. See this basic comparison: https://dbfiddle.uk/?rdbms=postgres_10&fiddle=458d2c483ea060a15773432cbfda6b49. – Erwin Brandstetter May 22 '18 at 21:55