
I'm trying to encode and decode a Python datetime object using pymongo's bson utils. What's the best practice here?

>>> from bson import json_util
>>> import datetime
>>> utcnow = datetime.datetime.utcnow()
>>> x = json_util.dumps({'now': utcnow})
>>> json_util.loads(x)['now'] == utcnow
False

>>> json_util.loads(x)['now'] - utcnow
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
TypeError: can't subtract offset-naive and offset-aware datetimes

>>> json_util.loads(x)['now'].replace(tzinfo=None) - utcnow
datetime.timedelta(-1, 86399, 999088)

>>> datetime.datetime.utcfromtimestamp(1424297808578 / 1000) == json_util.loads(x)['now'].replace(tzinfo=None)
True

^ Is this really the best way? Or should I write my own encode/decode and use the json lib?

estobbart

1 Answer


It seems bson rounds datetime objects to milliseconds:

>>> from datetime import datetime
>>> import bson  # standalone package: $ pip install bson
>>> d = datetime.utcnow()
>>> d, abs(d - bson.loads(bson.dumps({'utcnow': d}))['utcnow'].replace(tzinfo=None))
(datetime.datetime(2015, 2, 18, 23, 54, 47, 733092), datetime.timedelta(0, 0, 92))

This is documented behavior:

UTC datetime - The int64 is UTC milliseconds since the Unix epoch.
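If millisecond precision is acceptable, one option (suggested in the comments below) is to keep json_util and compare with a tolerance instead of exact equality. A minimal sketch; the question's own output shows that up to 999 microseconds can be lost, so a full millisecond of tolerance is used here:

from datetime import datetime, timedelta
from bson import json_util  # ships with pymongo

utcnow = datetime.utcnow()
now = json_util.loads(json_util.dumps({'now': utcnow}))['now']
# json_util returns an aware datetime; strip tzinfo to compare with the naive input
assert abs(now.replace(tzinfo=None) - utcnow) < timedelta(milliseconds=1)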

If you need microseconds, you could store an integer number of microseconds since the Unix epoch instead:

from datetime import datetime

# utc_dt is a naive datetime object that represents time in UTC
td = utc_dt - datetime(1970, 1, 1)
micros = td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6

To convert microseconds since the Unix epoch back into a naive datetime object that represents UTC time:

from datetime import datetime, timedelta

utc_dt = datetime(1970, 1, 1) + timedelta(microseconds=micros)
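Putting the two conversions together (a sketch; utc_dt stands for any naive datetime object that represents UTC time):

from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def to_micros(utc_dt):
    """Naive UTC datetime -> microseconds since the Unix epoch."""
    td = utc_dt - EPOCH
    return td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6

def from_micros(micros):
    """Microseconds since the Unix epoch -> naive UTC datetime."""
    return EPOCH + timedelta(microseconds=micros)

utc_dt = datetime.utcnow()
assert from_micros(to_micros(utc_dt)) == utc_dt  # exact round-trip

On Python 3, the forward conversion can be written more simply as td // timedelta(microseconds=1).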

int64 ("\x12") is more than enough to represent the Unix time with the microsecond resolution (it exceeds datetime range anyway).

Note: a POSIX timestamp "forgets" leap seconds, e.g.:

import time

# a leap second in the input maps to the same timestamp as the next second
tt = time.strptime("2015-07-01 01:59:60", "%Y-%m-%d %H:%M:%S")
ts_leap = time.mktime(tt)
tt = time.strptime("2015-07-01 02:00:00", "%Y-%m-%d %H:%M:%S")
ts_after = time.mktime(tt)
assert ts_leap == ts_after  # assuming the "right" timezone is not used

If you care about microseconds, you should find out what your system does around leap seconds.

Time sources (hardware clocks, software timers) on an ordinary computer are not very accurate, so millisecond resolution should be enough in many cases. For example, if you use ntp to synchronize time between machines, NTP v3 is accurate to 1-2 ms on a LAN and to tens of ms over WAN links.

Sometimes, though, you may want to preserve the digits in the input even if they are not accurate.
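If you'd rather keep the stdlib json module, as the question suggests, a custom encoder/decoder built on the conversions above preserves microseconds. This is a sketch: the '$micros' wrapper key is an arbitrary convention invented for illustration, not part of any spec:

import json
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)

def encode_custom(obj):
    # store datetimes as {'$micros': <int>} (a made-up convention)
    if isinstance(obj, datetime):
        td = obj - EPOCH
        micros = td.microseconds + (td.seconds + td.days * 24 * 3600) * 10**6
        return {'$micros': micros}
    raise TypeError('%r is not JSON serializable' % obj)

def decode_custom(dct):
    if '$micros' in dct:
        return EPOCH + timedelta(microseconds=dct['$micros'])
    return dct

utcnow = datetime.utcnow()
x = json.dumps({'now': utcnow}, default=encode_custom)
assert json.loads(x, object_hook=decode_custom)['now'] == utcnow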

jfs
  • I don't necessarily "need" anything, I'd like to be able to encode; decode; compare == True; – estobbart Feb 19 '15 at 02:46
  • @estobbart: you could compare with the millisecond resolution `abs(a-b) <= timedelta(microseconds=500)` or store microseconds as shown in the answer. It is up to you. Both are valid choices in different applications. – jfs Feb 19 '15 at 02:55
  • I don't mind the downvote but would appreciate an explanation – jfs May 20 '15 at 18:37