9

I have used SqlAlchemy to create a table, Record. Each record has a field, date, which stores a DateTime. I want to find all records whose date is more recent than eight hours ago.

I came up with four ways to write a filter, all involving simple arithmetic comparing the current time, the record's time, and an eight hour timedelta. The problem is: half of these filters return rows outside of the eight hour window.

from sqlalchemy import Column, Integer, DateTime
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker
from sqlalchemy import create_engine
import datetime

Base = declarative_base()

class Record(Base):
    __tablename__ = 'record'
    id = Column(Integer, primary_key=True)
    date = Column(DateTime, nullable=False)

engine = create_engine('sqlite:///records.db')
Base.metadata.create_all(engine)
DBSession = sessionmaker(bind=engine)
session = DBSession()

#if the db is empty, add some records to the database with datetimes corresponding to one year ago and one hour ago and yesterday
now = datetime.datetime(2018, 4, 4, 10, 0, 0)
if not session.query(Record).all():
    session.add(Record(date = now - datetime.timedelta(days=365)))
    session.add(Record(date = now - datetime.timedelta(days=1)))
    session.add(Record(date = now - datetime.timedelta(hours=1)))


delta = datetime.timedelta(hours=8)

#these are all equivalent to "records from the last eight hours"
criterion = [
    (now - Record.date < delta),
    (Record.date > now - delta),
    (delta > now - Record.date),
    (now - delta < Record.date),
]

for idx, crit in enumerate(criterion):
    query = session.query(Record).filter(crit)
    print("\n\nApproach #{}.".format(idx))
    print("Generated statement:")
    print(query.statement)
    records = query.all()
    print("{} row(s) retrieved.".format(len(records)))
    for record in query.all():
        print(record.id, record.date)

Result:

Approach #0.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE :date_1 - record.date < :param_1
3 row(s) retrieved.
1 2017-04-04 10:00:00
2 2018-04-03 10:00:00
3 2018-04-04 09:00:00


Approach #1.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE record.date > :date_1
1 row(s) retrieved.
3 2018-04-04 09:00:00


Approach #2.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE :date_1 - record.date < :param_1
3 row(s) retrieved.
1 2017-04-04 10:00:00
2 2018-04-03 10:00:00
3 2018-04-04 09:00:00


Approach #3.
Generated statement:
SELECT record.id, record.date
FROM record
WHERE record.date > :date_1
1 row(s) retrieved.
3 2018-04-04 09:00:00

Approaches 1 and 3 are correct - they return the record from one hour ago, and not the record from one day ago or one year ago. Approaches 0 and 2 are incorrect because they return the record from one day ago and the record from one year ago in addition to the record from one hour ago.

What is causing this discrepancy? I notice that #1 and #3 generate statements that only parameterize a single datetime object, while #0 and #2 parameterize both a datetime object and a timedelta object. Are timedeltas parameterized in an unusual way that would make them unsuitable for arithmetic of this kind?

Kevin
  • 74,910
  • 12
  • 133
  • 166
  • Could it be order of operations? What happens if you add parentheses around your subtractions? – pault Apr 04 '18 at 15:34
  • @pault, interesting idea. Just tried it - parentheses don't change the outcome. – Kevin Apr 04 '18 at 15:37
  • It was worth a shot. This is odd. What is the result if you select `now - Record.date` and `now - delta`? See if those values are being computed as you expect. – pault Apr 04 '18 at 15:42
  • Let's see. `print(now - delta)` gives `2018-04-04 02:00:00` as expected, and `print(now - record.date)` gives `1 day, 0:00:00` when `record` is record #1, and `1:00:00` when `record` is record #2, as expected. Or did you mean I should try using those as criteria? `now - Record.date` gives me zero rows, and `now - delta` gives an `ArgumentError` because you can't pass a datetime object to `.filter`. – Kevin Apr 04 '18 at 15:51
  • Any problem with timezones? If you run the queries given when you turn `echo` on in sqlalchemy, do they give you the same results? – Wayne Werner Apr 04 '18 at 15:57
  • I don't _think_ timezones are a problem, since I'm explicitly creating all of the datetime objects via their regular constructor. In particular, I'm not calling `datetime.datetime.now()` to create the `now` object. So I don't see an obvious way for real-world data to sneak into my test data. If I turn `echo` on, I get the same results as before. – Kevin Apr 04 '18 at 16:04
  • I've added a third record to my db, with a date of one year ago. It incorrectly appears the same way that the "one day ago" record does. I think this eliminates time zones as a possibility, since they would only account for a discrepancy of up to 24 hours. – Kevin Apr 04 '18 at 16:10
  • 1
    [Turning on logging](https://stackoverflow.com/a/2950685/190597) (or using `echo=True`) shows in Approaches #0 and #2 that the bare `delta` `param_1` value is getting converted to the datetime `1970-01-01 08:00:00.000000`. The 1970 part is what throws off the rest of the comparison. Approaches #1 and #3 seem to work since the final comparison is between datetimes, not timedeltas. – unutbu Apr 04 '18 at 16:31
  • I noticed that 1970 curiosity too. I assume that the timedelta is being turned into a Unix epoch timestamp. But if `now` and `Record.date` are _also_ being turned into Unix epoch timestamps, then subtracting them should give a result somewhere around Jan 1 1970, and I'd expect the arithmetic to work out anyway. (but the arithmetic doesn't work out, so there must be some piece of this puzzle still outstanding) – Kevin Apr 04 '18 at 16:41
  • As an aside, iirc SQLite does not have/support intervals, hence the shenanigans with `timedelta` object bind params. Try issuing the same queries against for example Postgres. – Ilja Everilä Apr 04 '18 at 17:11
  • The behavior is documented under the generic type [`Interval`](http://docs.sqlalchemy.org/en/latest/core/type_basics.html#sqlalchemy.types.Interval). – Ilja Everilä Apr 04 '18 at 17:19

1 Answers1

8

As noted by unutbu, when timedelta objects are used as bind parameters against databases that do not support a native Interval type, they're converted to timestamps relative to "epoch" (1. Jan 1970). SQLite is such a database, as is MySQL. Another notable thing when you turn on logging is that the datetime values are stored and passed as ISO formatted strings.

A DATETIME column has NUMERIC affinity in SQLite, but since the ISO formatted strings cannot be losslessly converted to a numeric value, they retain their TEXT storage class. This on the other hand is fine, since the 3 ways to store date and time data in SQLite are

  • TEXT as ISO8601 strings ("YYYY-MM-DD HH:MM:SS.SSS").
  • REAL as Julian day numbers, the number of days since noon in Greenwich on November 24, 4714 B.C. according to the proleptic Gregorian calendar.
  • INTEGER as Unix Time, the number of seconds since 1970-01-01 00:00:00 UTC.

Things become a bit more interesting when you try to perform arithmetic in the database, though:

In [18]: session.execute('SELECT :date_1 - record.date FROM record',
    ...:                 {"date_1": now}).fetchall()
2018-04-04 20:47:35,045 INFO sqlalchemy.engine.base.Engine SELECT ? - record.date FROM record
INFO:sqlalchemy.engine.base.Engine:SELECT ? - record.date FROM record
2018-04-04 20:47:35,045 INFO sqlalchemy.engine.base.Engine (datetime.datetime(2018, 4, 4, 10, 0),)
INFO:sqlalchemy.engine.base.Engine:(datetime.datetime(2018, 4, 4, 10, 0),)
Out[18]: [(1,), (0,), (0,)]

The reason is that all mathematical operators cast their operands to NUMERIC storage class, even if the resulting values are lossy – or don't make sense for that matter. In this case the year part is parsed and the rest ignored.

Since any INTEGER or REAL value is less than any TEXT or BLOB value, all the comparisons between the resulting integer values and given ISO formatted interval strings are true:

In [25]: session.execute(text('SELECT :date_1 - record.date < :param_1 FROM record')
    ...:                 .bindparams(bindparam('param_1', type_=Interval)),
    ...:                 {"date_1": now, "param_1": delta}).fetchall()
    ...:                 
2018-04-04 20:55:36,952 INFO sqlalchemy.engine.base.Engine SELECT ? - record.date < ? FROM record
INFO:sqlalchemy.engine.base.Engine:SELECT ? - record.date < ? FROM record
2018-04-04 20:55:36,952 INFO sqlalchemy.engine.base.Engine (datetime.datetime(2018, 4, 4, 10, 0), '1970-01-01 08:00:00.000000')
INFO:sqlalchemy.engine.base.Engine:(datetime.datetime(2018, 4, 4, 10, 0), '1970-01-01 08:00:00.000000')
Out[25]: [(1,), (1,), (1,)]

Some might call all this a leaky abstraction, but it'd be a daunting – or impossible – task to provide solutions in SQLAlchemy to all the differences between database implementations. Personally I find it handier that it does not get in the way, but allows using the database's features pretty much as is, but with a nice Python DSL. If you truly need to support time diffs in different databases in a single code base, create a custom construct with suitable database specific compilers.

To actually compute the difference in SQLite and compare against total seconds in a given timedelta you need to use the strftime() function in order to convert the ISO formatted string to seconds since epoch. julianday() would also work, as long as you convert the Python datetime as well and convert the result to seconds. Replace the 2 misbehaving comparisons with for example:

# Not sure if your times were supposed to be UTC or not
now_ts = now.replace(tzinfo=datetime.timezone.utc).timestamp()
delta_s = delta.total_seconds()

# Not quite pretty...
criterion = [
    (now_ts - func.strftime('%s', Record.date) < delta_s,
    (Record.date > now - delta),
    (delta_s > now_ts - func.strftime('%s', Record.date)),
    (now - delta < Record.date),
]
Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127