3

I'm unsure how I can accomplish filtering my database using only the time field. Right now I have a class called DatabasePolgygon

class DatabasePolygon(dbBase):
    __tablename__ = 'objects'

    begin_time = Column(DateTime) # starting time range of shape
    end_time = Column(DateTime) # ending time range of shape
    # Other entries not relevant to this question

begin_time, and end_time may be equal to such values as 2006-06-01 14:45:23, they represent the X-axis range that an object(in this case a shape over a plot) covers. I want to allow advanced searching for my users, specifically asking for all objects that appear within a range of time. How can I accomplish this with the DateTime field however?

        # Grab all shapes that appear above this certain time
        query_result = query_result.filter(
            DatabasePolygon.begin_time >= datetime.strptime(rng['btime']), %H:%M:%S')
        )

The problem is i'm comparing a datetime object with a Y-m-d H-M-S to an object with only a H-M-S. An example scenario would be if a user wants all objects, regardless of year/month/day, that appear beyond the range of 14:45:24, so we would have rng['btime']=14:45:24 and begin_time=2006-06-01 14:45:23 which doesn't seem to actually filter anything when compared.

Is there some way I can efficiently compare times within this column of data? I'd love to be able to do something like

        # Grab all shapes that appear above this certain time
        query_result = query_result.filter(
            DatabasePolygon.begin_time.time() >= datetime.strptime(rng['btime']), %H:%M:%S').time()
        )
Syntactic Fructose
  • 18,936
  • 23
  • 91
  • 177
  • The way that the underlying DB index works on a datetime column means that there is no efficient way to do this, unless some very specific things are true. E.g. if your data spans a small number of days, then you could do a separate efficient query for each day's time range. Beyond that, I think your best bet is to pull all the data to python and then filter. Some databases support functional indexes - that would allow you to do what you want but I don't know if sqlalchemy supports them. – Tom Dalton Jan 17 '16 at 23:42
  • Further to comment from @Tom re: indexes - SQLite added support for indexes on expressions in version 3.9.0 (ref: [here](https://www.sqlite.org/expridx.html)) so if SQLAlchemy produces appropriate SQL queries then SQLite should be able to handle them efficiently. – Gord Thompson Jan 18 '16 at 00:41
  • @GordThompson Not quite sure how to make sense of indexes in this case, could you provide some code if you plan on answering? – Syntactic Fructose Jan 18 '16 at 00:48
  • 1
    https://www.sqlite.org/expridx.html - in your case, the expression is converting the datetime column to the time part only (which is then indexed). – Tom Dalton Jan 18 '16 at 01:02

1 Answers1

3

It appears to be possible, subject to a few conditions.

 
Objective 1: Doing it (at all).

With a class named Thing to hold the id and begin_time values from the "objects" table:

class Thing(Base):
    __tablename__ = 'objects'

    id = Column(Integer, primary_key=True)
    begin_time = Column(DateTime)

    def __repr__(self):
       return "<Thing(id=%d, begin_time='%s')>" % (self.id, self.begin_time)

and test data in the "objects" table of the SQLite database

id  begin_time
--  -------------------
 1  1971-01-14 17:21:53
 2  1985-05-24 10:11:12
 3  1967-07-01 13:14:15

this, unfortunately, does not work:

engine = create_engine(r'sqlite:///C:\__tmp\test.db', echo=True)

Session = sessionmaker(bind=engine)
session = Session()
for instance in session.query(Thing)\
        .filter(Thing.begin_time[11:]<'17:00:00')\
        .order_by(Thing.id):
    print(instance)

producing

NotImplementedError: Operator 'getitem' is not supported on this expression

However, this does work ...

engine = create_engine(r'sqlite:///C:\__tmp\test.db', echo=True)

conn = engine.connect()
result = conn.execute("SELECT id FROM objects WHERE substr(begin_time,12)<'17:00:00'")
id_list = [row[0] for row in result.fetchall()]
result.close()
conn.close()

Session = sessionmaker(bind=engine)
session = Session()
for instance in session.query(Thing)\
        .filter(Thing.id.in_(id_list))\
        .order_by(Thing.id):
    print(instance)

 
Objective 2: Doing it efficiently.

The console output shows us that the first SELECT is indeed

SELECT id FROM objects WHERE substr(begin_time,12)<'17:00:00'

so if we were using SQLite 3.9.0 or later and had created an "index on expression"

CREATE INDEX time_idx ON objects(substr(begin_time,12));

then SQLite would be able to avoid a table scan. Unfortunately, even the latest release of CPython 2.7 at the moment (2.7.11) still ships with a sqlite3 module that is too old

Python 2.7.11 (v2.7.11:6d1b6a68f775, Dec  5 2015, 20:32:19) [MSC v.1500 32 bit (Intel)] on win32
>>> import sqlite3
>>> sqlite3.sqlite_version
'3.6.21'

so that index cannot exist in the database or SQLAlchemy will choke on it:

sqlalchemy.exc.DatabaseError: (sqlite3.DatabaseError) malformed database schema (time_idx) - near "(": syntax error [SQL: "SELECT id FROM objects WHERE substr(begin_time,12)<'17:00:00'"]

So, if the "efficiently" part is really important then you may need to convince Python to use a more current version of SQLite. Some guidance on that may be found in the question

Force Python to forego native sqlite3 and use the (installed) latest sqlite3 version

Community
  • 1
  • 1
Gord Thompson
  • 116,920
  • 32
  • 215
  • 418