0

How to get the rows that have a datetime column with < 2 hours difference, with Python Sqlite sqlite3 module?

I tried this:

import sqlite3, datetime
db = sqlite3.connect(':memory:', detect_types=sqlite3.PARSE_DECLTYPES)

c = db.cursor()
c.execute('CREATE TABLE mytable (id integer, date timestamp)')
c.execute('INSERT INTO mytable VALUES (1, ?)', (datetime.datetime(2018,1,1,23,0),))
c.execute('INSERT INTO mytable VALUES (2, ?)', (datetime.datetime(2018,1,2,0,0),))
c.execute('INSERT INTO mytable VALUES (3, ?)', (datetime.datetime(2018,1,9,0,0),))

This query works:

c.execute('SELECT mt1.date, mt2.date FROM mytable mt1, mytable mt2')

and returns:

(datetime.datetime(2018, 1, 1, 23, 0), datetime.datetime(2018, 1, 1, 23, 0))
(datetime.datetime(2018, 1, 1, 23, 0), datetime.datetime(2018, 1, 2, 0, 0))
(datetime.datetime(2018, 1, 1, 23, 0), datetime.datetime(2018, 1, 9, 0, 0))
...
(datetime.datetime(2018, 1, 9, 0, 0), datetime.datetime(2018, 1, 9, 0, 0))

but the datetime difference computation doesn't work:

c.execute('SELECT ABS(mt1.date - mt2.date) FROM mytable mt1, mytable mt2')

(0,)
(0,)
...

so it's finally impossible to use a query with WHERE ABS(mt1.date - mt2.date) < 2 to filter by 2-hours max datetime difference.

How to do this?

Note:

  • detect_types=sqlite3.PARSE_DECLTYPES ensures that the query returns a datetype Python object, and this is working.

  • this works to test if the 2 datetimes are the same day, thanks to DATE function:

    SELECT mt1.*, mt2.* FROM mytable mt1, mytable mt2 WHERE DATE(mt1.date) = DATE(mt2.date)
    
Basj
  • 41,386
  • 99
  • 383
  • 673
  • What does `c.execute('SELECT (mt1.date - mt2.date) FROM mytable mt1, mytable mt2')` return (without the `ABS`)? ` – pault Apr 16 '18 at 19:55
  • @pault It returns `0` as well. – Basj Apr 16 '18 at 19:57
  • @Basj: That's because casting an SQLite date/time string to a numeric value simply gives the year, which is 2018 for all of the rows in your table (so the difference is 0). As you've already discovered, you need to use the `JULIANDAY` function to get a meaningful answer. – dan04 Apr 16 '18 at 21:08
  • Thanks @dan04. Being curious: isn't there a Unix timestamp function that would allow to do `ABS(UNIXTIMESTAMP(mt1.date) - UNIXTIMESTAMP(mt2.date)) < 7200` ? – Basj Apr 16 '18 at 21:09
  • @Basj: Yes, it's spelled `STRFTIME('%s', the_date)`. – dan04 Apr 16 '18 at 21:10
  • Thanks @dan04, I'll update the answer! I wonder which one between `STRFTIME("%s", the_date)` (integer) and `JULIANDAY` (float) is the more efficient, if many many rows to compute. – Basj Apr 16 '18 at 21:13

1 Answers1

1

This works thanks to JULIANDAY which allows to compute a datetime difference as a float:

SELECT ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) FROM mytable mt1, mytable mt2

So a 2-hours-difference can be translated into this condition:

ABS(JULIANDAY(mt1.date) - JULIANDAY(mt2.date)) < 0.083333

since 2hrs/24hrs = 1/12 ~ 0.083333


This query works as well:

SELECT ABS(STRFTIME("%s", mt1.date) - STRFTIME("%s", mt2.date)) FROM mytable mt1, mytable mt2

and the 2-hours condition would be:

ABS(STRFTIME("%s", mt1.date) - STRFTIME("%s", mt2.date)) < 7200

i.e. a 7200 seconds max difference (STRFTIME("%s", mt1.date) gives the Unix timestamp).

Basj
  • 41,386
  • 99
  • 383
  • 673
  • FWIW, `ABS(STRFTIME("%s", mt1.date) - STRFTIME("%s", mt2.date)) < 7200` seems to be a little bit slower than the `JULIANDAY` condition. Does anyone know how to optimize these queries? – Basj Apr 16 '18 at 21:20
  • Sidenote: Here is a way to massively improve the performance of the query (by a factor 50): https://stackoverflow.com/q/49887709/1422096 – Basj Apr 17 '18 at 21:27