
I am looking to analyze traffic flow in relation to weather data. The traffic data has a UNIX timestamp (aka epoch), but I am having trouble converting the timestamp in the weather data to epoch. The problem is that I am in Norway, and the UTC timestamps in the weather data are not in my local timezone (GMT+1).

My initial approach

I first tried converting it into epoch and treating the data as if it was in the GMT+1 timezone. Then I compensated by subtracting the difference in number of seconds between UTC and GMT+1.

Problems with the approach

I realize first of all that this approach is primitive and inelegant (at best, an ugly hack). However, the biggest problem is that the difference between UTC and my local time (nominally GMT+1) is not constant, because of daylight saving time.

Question

Is there any reliable way of turning a UTC time into a UNIX timestamp in Python (taking into account that my machine is in GMT+1)? The timestamp is in the following format:

Y-m-d HH:MM:SS

Edit: Tried rmunn's solution:

def convert_UTC_to_epoch(timestamp):
  tz_UTC = pytz.timezone('UTC')
  time_format = "%Y-%m-%d %H:%M:%S"
  naive_timestamp = datetime.datetime.strptime(timestamp, time_format)
  aware_timestamp = tz_UTC.localize(naive_timestamp)
  epoch = aware_timestamp.strftime("%s")
  return (int) (epoch)

This does not work properly as evidenced below:

#Current time at time of the edit is 15:55:00 UTC on June 9th 2014.
>>> diff = time.time() - convert_UTC_to_epoch("2014-06-09 15:55:00")
>>> diff
3663.25887799263
>>> #This is about an hour off.
Arnab Datta
  • I don't understand your first paragraph. You say the traffic data is in "UNIX timestamp (aka epoch)" format, but you're trying to convert it to epoch, the format it's already in? Please clarify. I'll give an answer to your question using my best guess about what you mean, but a little more explanation of the input format and output format would be helpful. – rmunn Jun 04 '14 at 02:20
  • @rmunn It was a small typo. The traffic data is in epoch format. The weather data is in a string format Y-m-d HH:MM:SS (UTC). – Arnab Datta Jun 05 '14 at 17:40
  • I've updated my answer to demonstrate why you had the problem you had, and the correct solution. (There are two correct solutions, depending on whether you need to care about milliseconds or not.) – rmunn Jun 10 '14 at 02:52
  • UPDATE: Sorry, my previous comments were wrong. The problem is not your use of `time.time()`, the problem is your use of `strftime("%s")`. [**NEVER** use `strftime("%s")`](http://stackoverflow.com/questions/11743019/convert-python-datetime-to-epoch-with-strftime#comment25328801_11743262): "it is not supported, it is not portable, it may silently produce a wrong result for an aware datetime object, it fails if input is in UTC (as in the question) but local timezone is not UTC". – rmunn Jun 10 '14 at 04:21
  • related: [Converting datetime.date to UTC timestamp in Python](http://stackoverflow.com/q/8777753/4279) – jfs Aug 31 '14 at 12:32

4 Answers

6

The solution was to use the calendar module (inspired from here)

>>> # Quick and dirty demo
>>> print(calendar.timegm(datetime.datetime.utcnow().utctimetuple()) - time.time())
-0.6182510852813721

And here is the conversion function:

import calendar, datetime, time

#Timestamp is a datetime object in UTC time
def UTC_time_to_epoch(timestamp):
  epoch = calendar.timegm(timestamp.utctimetuple())
  return epoch
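
Since the weather timestamps arrive as strings rather than datetime objects, a parsing step is needed first. A minimal sketch, assuming the input is a UTC string in the question's Y-m-d HH:MM:SS format (the name utc_string_to_epoch is made up for illustration):

```python
import calendar
import datetime

def utc_string_to_epoch(timestamp):
    """Convert a 'YYYY-MM-DD HH:MM:SS' string, assumed to be UTC, to epoch seconds."""
    parsed = datetime.datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    # timegm() reads the time tuple as UTC, so the machine's timezone is never consulted.
    return calendar.timegm(parsed.timetuple())

print(utc_string_to_epoch("2014-06-09 15:55:00"))  # 1402329300
```

The result is an integer, so it can be compared directly with the epoch values in the traffic data.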
Arnab Datta
  • This will work just fine if your timestamps are in UTC, but that leaves me puzzled. Why did you make such a big deal in your question about the one-hour difference between your timezone and UTC? If your timestamps are in UTC, there was no need for any time zone conversions in the first place. – rmunn Jun 10 '14 at 04:42
  • I guess what I'm asking is: what part of your incoming data is in local time? If none of it is in local time, then you can ignore my whole answer about `pytz` as you don't need to make any timezone conversions. – rmunn Jun 10 '14 at 04:43
  • As I said before, one part (the weather data) is in UTC time. Another part has UNIX timestamps. – Arnab Datta Jun 10 '14 at 12:30
  • I didn't make a big deal in my question. If you notice, I asked this explicitly: `Is there any reliable way of turning UTC time to a UNIX time stamp in python (taking into account that my machine is in GMT+1)?` When I tried your approach, there was a 1 hour time difference there that I couldn't explain. It showed the difference as 3600~ seconds, when it should be 7200 instead. – Arnab Datta Jun 10 '14 at 12:35
  • "Another part has UNIX timestamps." Since UNIX timestamps are defined as "seconds since the epoch, in UTC", it leaves me wondering why you even mentioned timezones in the body of your question at all. You have UNIX timestamps in UTC, and weather data in UTC -- so it shouldn't matter at all what timezone your local time is. Keep everything in UTC, don't ever convert it, and you're done. I spent a lot of time on an answer to what I *thought* you were asking, when in fact all you needed was the `calendar.timegm()` function (which, thankfully, you found). – rmunn Jun 10 '14 at 12:43
  • What I meant when I said you "made a big deal" about the one-hour difference was comments like "The problem with this approach is that this will still not take into account the time difference between my timezone (GMT+1) and UTC. This is the crux of my question." in response to monkut, or the "(taking into account that my machine is in GMT+1)" parenthetical in your question. None of that was relevant, and it ended up sending me on a rabbit-trail. If you hadn't mentioned timezones, I probably would have been able to give you the right answer much sooner. At least you have it working now. :-) – rmunn Jun 10 '14 at 12:46
  • The reason was I didn't know about calendar.timegm. So when I worked with datetime, I realized that it cared about the fact that my machine was in GMT+1. – Arnab Datta Jun 10 '14 at 12:50
  • @rmunn: the local timezone might matter because if your local timezone has zero utc offset then you could use (incorrect otherwise) `time.mktime()` function. – jfs Aug 31 '14 at 12:17
  • note: `timegm()` strips fractions of a second i.e., `-0.61` is not the time difference (it should be much closer to zero in this case). `time.time()` returns a float value (with fractions of a second). – jfs Aug 31 '14 at 12:24
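
If sub-second precision matters, the microseconds that timegm() drops can be added back by hand. A small sketch, assuming the input is a naive datetime that already holds UTC (the function name is hypothetical):

```python
import calendar
import datetime

def utc_datetime_to_epoch_float(dt):
    """Epoch seconds as a float, keeping the microseconds that timegm() discards."""
    return calendar.timegm(dt.utctimetuple()) + dt.microsecond / 1e6

dt = datetime.datetime(2014, 6, 9, 15, 55, 0, 250000)
print(utc_datetime_to_epoch_float(dt))  # 1402329300.25
```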
3

An alternative: datetime has its own .strptime() method.

http://en.wikipedia.org/wiki/Unix_time

The Unix epoch is the time 00:00:00 UTC on 1 January 1970 (or 1970-01-01T00:00:00Z ISO 8601).

import datetime
unix_epoch = datetime.datetime(1970, 1, 1)
log_dt = datetime.datetime.strptime("14-05-07 12:14:16", "%y-%m-%d %H:%M:%S")
seconds_from_epoch = (log_dt - unix_epoch).total_seconds()
# seconds_from_epoch == 1399464856.0 when log_dt is read as UTC
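
On Python 3.3 or later the same arithmetic is available through datetime.timestamp(), provided the datetime is explicitly marked as UTC first. A sketch under that assumption, using the question's four-digit-year format (utc_string_to_timestamp is an illustrative name):

```python
from datetime import datetime, timezone

def utc_string_to_timestamp(timestamp):
    """Parse a 'YYYY-MM-DD HH:MM:SS' string, assumed to be UTC, into epoch seconds."""
    naive = datetime.strptime(timestamp, "%Y-%m-%d %H:%M:%S")
    # Without tzinfo, timestamp() would interpret the value as local time.
    aware = naive.replace(tzinfo=timezone.utc)
    return aware.timestamp()

print(utc_string_to_timestamp("2014-06-09 15:55:00"))  # 1402329300.0
```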
monkut
  • The problem with this approach is that this will still not take into account the time difference between my timezone (GMT+1) and UTC. This is the crux of my question. And yes, correct me if I am wrong (i.e. this approach does take into account the timezone difference). – Arnab Datta Jun 05 '14 at 23:51
  • If you have to deal with daylight savings you should be using `pytz`, as rmunn mentions. – monkut Jun 06 '14 at 01:22
  • @ArnabDatta: if `log_dt` is in UTC (as your usage of `calendar.timegm()` implies) then it doesn't matter what your local timezone is: `posix_timestamp_seconds_since_epoch = (log_dt - unix_epoch).total_seconds()` gives correct result (within float precision) – jfs Aug 31 '14 at 12:27
3

The pytz module will probably help you. It allows you to write code like:

import pytz
import datetime
tz_oslo = pytz.timezone('Europe/Oslo')
time_format = "%Y-%m-%d %H:%M:%S"
naive_timestamp = datetime.datetime(2014, 6, 4, 12, 34, 56)
# Or:
naive_timestamp = datetime.datetime.strptime("2014-06-04 12:34:56", time_format)
aware_timestamp = tz_oslo.localize(naive_timestamp)
print(aware_timestamp.strftime(time_format + " %Z%z"))

This should print "2014-06-04 12:34:56 CEST+0200". Note that localize() attaches the timezone to the naive value without shifting the clock time.

Do note the following from the pytz manual:

The preferred way of dealing with times is to always work in UTC, converting to localtime only when generating output to be read by humans.

So keep that in mind as you write your code: do the conversion to local time once and once only, and you'll have a much easier time doing, say, comparisons between two timestamps correctly.
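
That advice can be sketched with the standard library alone: keep epoch or UTC values internally, and convert only at the point of display. This assumes Python 3.3 or later, where astimezone() with no argument uses the system's local zone; with pytz, a specific target zone such as tz_oslo would be passed instead.

```python
from datetime import datetime, timezone

epoch = 1402329300  # 2014-06-09 15:55:00 UTC

# Internal representation: timezone-aware UTC.
utc_dt = datetime.fromtimestamp(epoch, tz=timezone.utc)

# Display only: convert to the machine's local zone at the last moment.
local_dt = utc_dt.astimezone()

print(utc_dt.strftime("%Y-%m-%d %H:%M:%S %Z"))
print(local_dt.strftime("%Y-%m-%d %H:%M:%S %Z%z"))
```

Both objects describe the same instant, so comparisons and arithmetic give the same answer whichever one is used; only the printed wall-clock time differs.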

Update: Here are a couple of videos you may find useful:

  • What you need to know about datetimes, a PyCon 2012 presentation by Taavi Burns (30 minutes)
  • Drive-in Double Header: Datetimes and Log Analysis, a two-part presentation. (Caution: annoying buzz in the video, but I couldn't find a copy with better sound). The first part is the "What you need to know about datetimes" presentation I linked just above, and the second part has some practical tips for parsing log files and doing useful things with them. (50 minutes)

Update 2: The convert_UTC_to_epoch() function you mention in your updated question (which I've reproduced below) is interpreting its input as local time, not UTC:

def convert_UTC_to_epoch(timestamp):
  tz_UTC = pytz.timezone('UTC')
  time_format = "%Y-%m-%d %H:%M:%S"
  naive_timestamp = datetime.datetime.strptime(timestamp, time_format)
  aware_timestamp = tz_UTC.localize(naive_timestamp)
  epoch = aware_timestamp.strftime("%s")
  return (int) (epoch)

The problem is that you're using strftime("%s"), which is undocumented and is returning the wrong result. Python doesn't support the %s parameter, but it appears to work because it gets passed to your system's strftime() function, which does support the %s parameter -- but it treats the datetime's fields as local time and ignores the attached UTC tzinfo! You're taking a UTC timestamp and parsing it as local time, which is why it's an hour off. (The mystery is why it isn't two hours off -- isn't Norway in daylight saving time right now? Shouldn't you be at UTC+2?)

As you can see from the interactive Python session below, I'm in the UTC+7 timezone and your convert_UTC_to_epoch() function is seven hours off for me.

# Current time is 02:42 UTC on June 10th 2014, 09:42 local time
>>> time.timezone
-25200
>>> time.time() - convert_UTC_to_epoch("2014-06-10 02:42:00")
25204.16531395912
>>> time.time() + time.timezone - convert_UTC_to_epoch("2014-06-10 02:42:00")
6.813306093215942

The strftime("%s") call is interpreting 02:42 on June 10th as being in local time, which would be 19:42 UTC on June 9th. Subtracting 19:42 UTC on June 9th from 02:42 UTC June 10th (which is what time.time() returns) gives a difference of seven hours. See Convert python datetime to epoch with strftime for more details on why you should never use strftime("%s").
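
The size of that error can be measured directly by comparing time.mktime(), which reads a time tuple as local time, with calendar.timegm(), which reads it as UTC. The sketch below is only an illustration (local_vs_utc_gap is a made-up name), and its result depends on the timezone of the machine it runs on:

```python
import calendar
import time

def local_vs_utc_gap(timestamp, fmt="%Y-%m-%d %H:%M:%S"):
    """Seconds by which a local-time reading of `timestamp` differs from a UTC reading."""
    parsed = time.strptime(timestamp, fmt)
    as_local = time.mktime(parsed)      # interprets the fields as local time
    as_utc = calendar.timegm(parsed)    # interprets the same fields as UTC
    return as_local - as_utc

gap = local_vs_utc_gap("2014-06-10 02:42:00")
print(gap)  # 0.0 on a machine set to UTC, nonzero elsewhere
```

On a machine at UTC+7 with no daylight saving time, the gap is -25200.0, which has the same magnitude as the seven-hour error in the session above.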

(By the way, if you saw what I had previously written under the heading "Update 2", where I claimed that time.time() was returning local time, ignore that -- I got it wrong. I was fooled at first by the strftime("%s") bug just like you were.)

rmunn
  • Note that my code is assuming that your input timestamps are strings. If your input is epoch numbers instead (seconds since midnight on January 1st, 1970 in UTC), then you'll want to use `datetime.datetime.fromtimestamp(some_number)` instead of `datetime.datetime.strptime(some_str, parse_fmt)`. – rmunn Jun 04 '14 at 02:36
  • In case it's not clear: the timezone definitions provided by the pytz module are aware of daylight-savings time rules for each individual country. So you don't have to reimplement a DST conversion rule yourself (which, as you say, would be an ugly hack); you can just use the conversion that pytz provides. – rmunn Jun 04 '14 at 02:42
  • I tried your method and unfortunately as you can see above, it's an hour off. – Arnab Datta Jun 09 '14 at 16:06
  • This is why it's important to always do all your calculations in UTC, and to be aware at all times of whether your incoming data is in UTC or in local time. In this case, you subtracted a UTC time from a local time, and of course it was an hour off. (What puzzles me, actually, is why it wasn't *two* hours off. Isn't Norway on daylight savings time right now, which would make your current time zone UTC+2?) – rmunn Jun 10 '14 at 02:31
  • Whoops -- my previous comments (which I've just deleted) were wrong. I claimed incorrectly that `time.time()` was returning local time, but it wasn't. The problem was your use of the undocumented, does-the-wrong-thing `strftime("%s")`. I've updated my answer to reflect this. – rmunn Jun 10 '14 at 04:38
  • Ok, I get that strftime is unreliable and not to be used. But the problem isn't that. My goal is to convert the UTC timestamps into epoch. The reason is: my traffic data is recorded every 5 minutes, the weather data is recorded every hour. I want to compare every traffic observation to the closest weather observation. The cleanest way to do that is not with UTC, but with UNIX timestamps on both, as it would only involve just one arithmetic operation. – Arnab Datta Jun 10 '14 at 12:45
  • Then `calendar.timegm(timestamp.utctimetuple())` is exactly what you need, as you figured out, and you can ignore my whole answer dealing with the pytz module. – rmunn Jun 10 '14 at 12:51
  • And yes, it was puzzling to me why it wasn't two hours off. As you say, Norway is indeed in daylight savings time at the moment. – Arnab Datta Jun 10 '14 at 14:41
  • `strftime('%s')` expects local time (*as an input*); it returns POSIX timestamp as a string -- it is incorrect to say: *"it returns local time"* e.g., `datetime.utcfromtimestamp()` returns UTC time; `datetime.fromtimestamp()` returns local time for *the same* timestamp -- the timestamp itself has **no timezone**. – jfs Aug 31 '14 at 12:36
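
The matching described in the comments above (5-minute traffic samples against hourly weather samples) becomes a nearest-value lookup once both series are epoch seconds. A rough sketch, assuming the weather epochs are sorted and non-empty; the sample values and the name nearest_index are hypothetical:

```python
import bisect

def nearest_index(sorted_epochs, target):
    """Return the index of the value in sorted_epochs closest to target.

    Assumes sorted_epochs is non-empty and sorted in ascending order.
    """
    pos = bisect.bisect_left(sorted_epochs, target)
    if pos == 0:
        return 0
    if pos == len(sorted_epochs):
        return len(sorted_epochs) - 1
    before, after = sorted_epochs[pos - 1], sorted_epochs[pos]
    # Ties go to the earlier observation.
    return pos - 1 if target - before <= after - target else pos

# Hypothetical sample data for 2014-06-09, all in UTC epoch seconds.
weather_epochs = [1402326000, 1402329600, 1402333200]  # 15:00, 16:00, 17:00
traffic_epochs = [1402329300, 1402329600, 1402331700]  # 15:55, 16:00, 16:35

matches = [nearest_index(weather_epochs, t) for t in traffic_epochs]
print(matches)  # [1, 1, 2]
```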
0

You can use the time and datetime modules:

import time, datetime
date = "14-05-07 12:14:16" #Change to whatever date you want
date = time.strptime(date, "%y-%m-%d %H:%M:%S")
epoch = datetime.datetime.fromtimestamp(time.mktime(date)).strftime('%s')

This runs as:

>>> import time, datetime
>>> date = "14-05-07 12:14:16"
>>> date = time.strptime(date, "%y-%m-%d %H:%M:%S")
>>> epoch = datetime.datetime.fromtimestamp(time.mktime(date)).strftime('%s')
>>> epoch
'1399490056'
A.J. Uppal
  • `time.mktime(date)` is enough. – falsetru Jun 04 '14 at 02:24
  • `mktime()` expects local time (as time tuple), not UTC time as in the question. `mktime()` already returns the timestamp (seconds since Epoch on Unix); you shouldn't call `fromtimestamp().strftime()`. `strftime('%s')` is not portable (it works with local time on Unix). – jfs Aug 31 '14 at 12:21