5

Assume someone in my database was born at -73440000 unix time. This means he's born on 04-Sep-1967 UTC but 03-Sep-1967 EDT. How would I count the number of people born on each day of the year in EDT?


Right off the bat, you will discover that

SELECT FROM_UNIXTIME(-73440000)
-- returns NULL

Returns NULL. MySQL can't handle negative unix timestamps.

Fine, we can work around that:

select date_add('1970-01-01', interval -73440000 second)
-- returns 1967-09-04 00:00:00

Yields 1967-09-04 00:00:00 which is his birthdate in UTC.

We can try to convert this to EDT (Toronto time):

select convert_tz('1967-09-04 00:00:00','UTC','America/Toronto')
-- returns 1967-09-04 00:00:00, but should be 1967-09-03 20:00:00

But as it turns out, CONVERT_TZ doesn't work on dates prior to 1970 either.

(I've installed the timezones already and it does work on dates between 1970 and 2038)

So now I'm stuck. I need to convert the unix timestamp to a MySQL DATE so that I can GROUP BY it and then COUNT it. The only other option I can think of is to return every record in the database as unix timestamps and use another language to do the conversion and tally them up, but that could get a bit ridiculous if there's millions of records.

N.B. You can't calculate the hour offset between EDT and UTC either because that can change throughout the year (daylight savings).

Community
  • 1
  • 1
mpen
  • 272,448
  • 266
  • 850
  • 1,236
  • Could you check this please http://stackoverflow.com/questions/14454304/convert-tz-returns-null ? it may be the same problem. – FZE Jan 21 '16 at 21:07
  • 1
    @FeyyazEsatoğlu Yeah, I already came across that question some months ago. I've installed the timezones into MySQL already -- it no longer returns NULL. However, for dates prior to 1970 it returns the date as-is, no conversion. I should have posted that. – mpen Jan 21 '16 at 21:10
  • OK, how about faking data ? As we know timestamp can go up to 22xx just add 100 years from now, and fool `CONVERT_TZ` and you may add a column when fetching data `fooled = 1|0` then wrapping by a select and subtract to find correct date. Isn't that possible ? Just an idea. – FZE Jan 21 '16 at 21:18
  • @FeyyazEsatoğlu I actually tried exactly that already, but "100 years" doesn't work. CONVERT_TZ only works up to 2038. I can compute the number of years to bring us up to 1970 and that gets us 99.9% of the way there, but DST (and maybe timezones?) changes throughout the years means there might still be some edge cases we've missed. – mpen Jan 21 '16 at 22:35
  • 1
    If you can't convert the birthdays into EDT dates, couldn't you instead convert range ends to unix timestamps? You would have to do this with something outside MySQL. For short spans (like couple of days) you could get away with a couple of `union`s, but for more extended periods you would have to create a table: `day date, dayStart int signed, dayEnd int signed, -- possibly also timezone varchar(?)`. Quite ugly, but could get the job done. – weirdan Jan 22 '16 at 01:06
  • @weirdan Uhh... I'd have to create a couple hundred year's worth. I was really hoping for something more generic and less hacky. What I'm requesting really can't be that uncommon? – mpen Jan 22 '16 at 01:19
  • I know Postgre supports wider range of timestamp values (8 bytes). However are you sure you're not striving for accuracy you can't actually achieve? Is timezone data all that accurate for dates before 20th century? And is your data that accurate? We're talking hours here, and, for example, at least one my grandgrandparents didn't know the day of month he was born on. – weirdan Jan 22 '16 at 03:23
  • @weirdan Not *before* the 20th century, but pretty close to :) The data actually isn't accurate to the hour or the minute, but that's how it's stored and I have to be sure to round in the correct direction regardless. I asked about DOBs but this is actually a problem throughout the system, with different and more accurate dates too. – mpen Jan 22 '16 at 04:08
  • First convert everything to UTC datetime. Then make a calendar table containing UTC start and end points, and EDT offsets, covering the whole period of interest. Then calculate the EDT dates for each UTC datetime. I suspect the hard part is getting the table of TZ offsets. – Ben Jan 22 '16 at 11:48

2 Answers2

3

You could do the conversion manually; if you've got the zone data loaded into MySQL already then you've got the data already going back to 1918. Let's take a look at the list:

SET @timezone = "America/Toronto";
SELECT FROM_UNIXTIME(0) + INTERVAL Transition_Time SECOND AS change_time, Offset, Abbreviation
FROM mysql.time_zone_transition t
LEFT JOIN mysql.time_zone_transition_type tt ON (
    t.Time_zone_id = tt.Time_zone_id AND
    t.Transition_type_id = tt.Transition_type_id
)
LEFT JOIN mysql.time_zone_name n ON (
    t.Time_zone_id = n.Time_zone_id
)
WHERE n.Name = @timezone
    AND Transition_time < 0
ORDER BY change_time ASC;

Modifying for your sample date, we can extract this:

SET @timezone = "America/Toronto";
SET @birthday = -73440000;

SELECT ('1970-01-01 00:00:00' + INTERVAL @birthday SECOND) + INTERVAL Offset SECOND AS offsetDate
FROM mysql.time_zone_transition t
LEFT JOIN mysql.time_zone_transition_type tt ON (
    t.Time_zone_id = tt.Time_zone_id AND
    t.Transition_type_id = tt.Transition_type_id
)
LEFT JOIN mysql.time_zone_name n ON (
    t.Time_zone_id = n.Time_zone_id
)
WHERE n.Name = @timezone
    AND Transition_time < @birthday
ORDER BY Transition_time DESC
LIMIT 1;

Since it picks the most recent transition before the date, it will return no records for dates that precede the first transition. This should serve as an indicator that you're in terra incognita and will have to make up your own offset.

If you'd like to use the first transition time as a fallback, you can remove Transition_time < @birthday from the WHERE clause and add it to the ORDER BY clause instead, as you suggested in the comments.

miken32
  • 42,008
  • 16
  • 111
  • 154
  • I thought this would be really hard to do in pure SQL but that doesn't look too bad. I'll try this out tomorrow. Thank you! – mpen Jan 26 '16 at 05:31
  • This looks like it works for all the dates I've tried, but I had to make a couple adjustments. First, put `Transition_time < @birthday DESC` into the ORDER BY clause and remove it from the WHERE. This will make it return a result for dates prior to 1902 instead of NULL. Second, if you want this to work with `@@session.time_zone` instead of one you pass in then you have to make sure to set that manually as soon as you connect, otherwise it will contain something stupid like "SYSTEM" which is not at all useful. – mpen Jan 26 '16 at 18:50
  • Glad I could help. This should return 0 results for any date before the zone data starts, indicating that there is no known offset and that you're going to have to make something up yourself. I'll edit the answer to make that explicit. – miken32 Jan 26 '16 at 19:33
0

I've tested these functions with over 200 dates between 1900 1918 (prior to that timezones are messed!) and 2100, and compared them against dates w/ timezones generated by PHP. They seem to work perfectly.

Turn a timestamp into a datetime:

CREATE FUNCTION `ts2dt` (`d` bigint) RETURNS datetime
DETERMINISTIC
  BEGIN
        DECLARE `result` DATETIME;


        SELECT ('1970-01-01 00:00:00' + INTERVAL `d` SECOND) + INTERVAL Offset SECOND INTO `result`
        FROM mysql.time_zone_transition t
        LEFT JOIN mysql.time_zone_transition_type tt ON (
            t.Time_zone_id = tt.Time_zone_id AND
            t.Transition_type_id = tt.Transition_type_id
        )
        LEFT JOIN mysql.time_zone_name n ON (
            t.Time_zone_id = n.Time_zone_id
        )
        WHERE n.Name = @@session.time_zone
        ORDER BY Transition_time <= `d` DESC, Transition_time DESC
        LIMIT 1;

        RETURN `result`;
    END

Turn a datetime into a timestamp:

CREATE FUNCTION `dt2ts` (`d` DATETIME) RETURNS BIGINT
DETERMINISTIC
BEGIN
        DECLARE `result` BIGINT;

        SELECT TIMESTAMPDIFF(SECOND, '1970-01-01 00:00:00', `d`) - Offset INTO `result`
        FROM mysql.time_zone_transition t
            LEFT JOIN mysql.time_zone_transition_type tt
                ON t.Time_zone_id = tt.Time_zone_id AND t.Transition_type_id = tt.Transition_type_id
            LEFT JOIN mysql.time_zone_name n ON t.Time_zone_id = n.Time_zone_id
        WHERE n.Name = @@session.time_zone
        ORDER BY Transition_time <= TIMESTAMPDIFF(SECOND, '1970-01-01 00:00:00', `d`) - Offset DESC, Transition_time DESC
        LIMIT 1;


        RETURN `result`;
END

The only difference I found between this and PHP is that dt2ts('2016-11-06 01:00:00') gives 1478422800 for America/Vancouver whereas PHP gives 1478419200, however, both timestamps correspond to 1:00am (because of the DST switch) so I guess either answer is correct.

N.B. These functions give null if your session timezone isn't set. Set it with:

set session time_zone='America/Vancouver';
mpen
  • 272,448
  • 266
  • 850
  • 1,236