10

I'm trying to replace my use of SAS with python + sqlite3; I'm trying to move my data from SAS datasets to SQLite databases. I have many time fields that are properly represented in python as datetime.time objects. Since SQLite is "lightly typed", I'm looking for advice about what format to use to store times in columns. (I know I have to write python adapters etc. to read and write the objects to and from the column.) These are the features I need to consider:

  • SQLite's ability to deal with the column in queries. (Eg will I be able to select out the rows that occur between two times?)
  • Size of the field. (My tables are often hundreds of millions of rows.)
  • Human readability. (I'm considering storing the time as an integer: microseconds since midnight. But this makes eyeballing the data harder.)

Has anyone solved this problem to their satisfaction?

jdmarino
  • 545
  • 5
  • 13

2 Answers2

9

There is a general recipe for storing any serializable Python object in an sqlite table.

Here is what the code look might look like for datetime.time objects:

import sqlite3
import datetime as DT

def adapt_timeobj(timeobj):
    return ((3600*timeobj.hour + 60*timeobj.minute + timeobj.second)*10**6 
            + timeobj.microsecond)

def convert_timeobj(val):
    val = int(val)
    hour, val = divmod(val, 3600*10**6)
    minute, val = divmod(val, 60*10**6)
    second, val = divmod(val, 10**6)
    microsecond = int(val)
    return DT.time(hour, minute, second, microsecond)


# Converts DT.time to TEXT when inserting
sqlite3.register_adapter(DT.time, adapt_timeobj)

# Converts TEXT to DT.time when selecting
sqlite3.register_converter("timeobj", convert_timeobj)

con = sqlite3.connect(":memory:", detect_types=sqlite3.PARSE_DECLTYPES)
cur = con.cursor()

# declare timecol to be of type timeobj
cur.execute("create table test (timecol timeobj)")

cur.executemany("insert into test (timecol) values (?)", 
                [(DT.time(1,2,3,4), ), (DT.time(5,6,7,8),) ])

You can use inequalities in the SQL, but note that the values being compared are those returned by adapt_timeobj, not the datetime.time objects. Fortunately, if the adapt_timeobj function returns integers that are orderable in the same order as the corresponding datetime.time objects (as they do above), then inequalities in the SQL will work as desired.

cur.execute("select timecol from test where timecol < ?",
            [DT.time(4,5,6)])
print(cur.fetchall())
# [(datetime.time(1, 2, 3, 4),)]

cur.execute("select timecol from test where timecol < ?",
            [DT.time(8,0,0)])
print(cur.fetchall())
# [(datetime.time(1, 2, 3, 4),), (datetime.time(5, 6, 7, 8),)]

con.commit()
cur.close()
con.close()

Note: If you look in the edit history, you'll see a simpler alternative for adapt_timeobj and convert_timeobj that stores the data as a str instead of as a int. It is simpler, but storing the data as a int is faster and more memory efficient.

Community
  • 1
  • 1
unutbu
  • 842,883
  • 184
  • 1,785
  • 1,677
  • It is worth mentioning that passing `PARSE_DECLTYPES` is not optional in this case. – jfs Dec 24 '14 at 20:53
  • @J.F.Sebastian: It's also possible to use [PARSE_COLNAMES](https://docs.python.org/2/library/sqlite3.html#sqlite3.PARSE_COLNAMES), but then you would need to use SQL like `SELECT timecol [timeobj] ...` – unutbu Dec 24 '14 at 21:08
  • I've used microseconds since midnight in other situations (not python + sqlite). I don't love it, but it's sensible. I wasn't aware of the ability to use python objects in queries as you did in your example. I'll probably start with your suggestions. Thanks. – jdmarino Dec 26 '14 at 23:06
  • @jdmarino: I made a mistake in my first post. The SQL query is not making use of python objects. The comparison is done on the underlying database values -- the kind of values returned by `adapt_timeobj`. – unutbu Dec 27 '14 at 00:39
7

I really like the answer by @unutbu but here's a simple way to store a timestamp.

RFC 3339 is a very unambiguous timestamp format, easy for computers to parse and easy for humans to read. You could store timestamps as strings.

One nice property of RFC 3339: a simple ASCII sort also sorts chronologically.

But you don't really need the spec because it is so simple. Here's an example:

2014-12-24T23:59:59.9999-08:00

That is the last fraction of a second before Christmas day in my time zone, which is 8 hours behind UTC (thus the -08:00 part). Year, month, date, the string T, hour, minute, seconds, optional fractional second, timezone.

The timezone may also be Z which indicates UTC time. But it's probably more convenient to store the times in the local time zone so you can read them more easily.

Community
  • 1
  • 1
steveha
  • 74,789
  • 21
  • 92
  • 117
  • rfc3339 *requires* the date part. OP asks only about `datetime.time()` -- no date. Also, `-08` is incorrect. It should be `-08:00` – jfs Dec 24 '14 at 20:55
  • Well, the question didn't *forbid* storing the date, and I think RFC 3339 has some advantages. You are correct about the timezone and I fixed it; thank you. – steveha Dec 24 '14 at 20:59
  • Thanks for your answer. My datasets won't always have dates, just times. I have considered your solution but I'd need to add a "standard date" as a placeholder (e.g. 1/1/1960). The benefit of this approach is the sorting (as you pointed out), plus I get to use sqlite's date handling functions. The downside is that I've represented a date when I don't actually have one. – jdmarino Dec 26 '14 at 23:03
  • 1
    I wouldn't go so far as to add a fake date when all you need is the time. But you might want to standardize on using just the time part the way it is done in RFC 3339: `HH:MM:SS.fraction` where HH is a 2-digit hour from 00 to 23, MM and SS are always two digits (00 to 59), and `fraction` is optional but if present is the float value representing fraction of a second. Easy to read, and ASCII sort is a sort by time. – steveha Dec 27 '14 at 00:47