1

I have a sqlite3 DB table of around 250 000 rows, my code is written in python. I need to filter it in very specific wat, and it takes much too long time.

Table is as follows:

self.cur.execute("""create table DetectedVehicles(IdD INTEGER PRIMARY KEY, 
                                    CLCode INT, 
                                    DetectionTime INT,
                                    PlateNo VARCHAR)""")

It's a table for automatic plate number recognition result filtering. And I need to filter it to get (native sql-like statements :) ):

Get rows from table DetectedVehicles where vehicles were observed at 
CLCode="X" before they were observed at CLCode="Y". 
(implicite: they were observed at both of them)

So I need to get list of detectedvehicles, that crossed specific CLCodes in proper sequence, i.e. Y before X.

I managed to create something that is working, but it takes about 10seconds for the query. Is there a faster way?

The code goes here:

self.cur.execute('select distinct PlateNo from DetectedVehicles where CLCode=? intersect select PlateNo from DetectedVehicles where CLCode=?',(CountLocationNo[0],CountLocationNo[1]))
    PlatesTab=list(self.cur)
    Results=[]
    for Plate in PlatesTab:
        PlateQ1='select * from DetectedVehicles where PlateNo in (?) and ((select DetectionTime from DetectedVehicles where CLCode = ? and PlateNo in (?) ) <  (select DetectionTime from DetectedVehicles where CLCode = ? and PlateNo in (?)))'     
        R=list(self.cur.execute(PlateQ1,(Plate,CountLocationNo[0],Plate,CountLocationNo[1],Plate)))
        if R:
            TimesOD=self.curST2.execute('select DetectionTime from DetectedVehicles where PlateNo in (?) and (CLCode= ? or CLCode=?)',(Plate,CountLocationNo[0],CountLocationNo[1])).fetchall()
            if TimesOD:
               TravelTimes.append(TimesOD[1][0]-TimesOD[0][0])
               DetectionTimes.append(TimesOD[0][0])
            for i in R:
                Results.append(i[0])
    Results=tuple(Results)
    QueryCL=' intersect select * from DetectedVehicles where IDd in ' + str(Results)

Thanks in advance

  • 1
    Can you add your existing SQL code to the query? Does your table have any indexes on it? –  Mar 23 '12 at 11:39
  • Yes it has indexes - (distinct integer for every new detected vehicle). I'll add queries to my question via edit in a minute. – Intelligent-Infrastructure Mar 23 '12 at 11:50
  • You need indexes on the fields you are using for the JOIN and WHERE, i.e. PlateNo and CLCode. Indexes on any other field will not help. – Ben Mar 23 '12 at 13:14

3 Answers3

2

You can do it all in a single query.

select 
    dv1.PlateNo, dvPoint1.DetectionTime, dvPoint2.DetectionTime
from 
    DetectedVehicles dvPoint1 
    inner join DetectedVehicles dvPoint2
        on dvPoint1.PlateNo = dvPoint2.PlateNo
        and dvPoint1.CLCode = ? and dvPoint2.CLCode = ?
        and dvPoint1.DetectionTime < dvPoint2.DetectionTime

You will want an index on (PlateNo, DetectionTime, CLCode), or (CLCode, PlateNo). Try them both to see which is faster. PlateNo on it's own may do.

Ben
  • 34,935
  • 6
  • 74
  • 113
  • Dear Ben, thank You for your solution. Query itself is quickest, but how can I get to the results? - fetchall() via sqlite in Python takes a lot of time? Why, and how to do it? – Intelligent-Infrastructure Mar 23 '12 at 12:47
  • Add the indexes I recommended to speed up `fetchall()`. The query time is not a realistic measure you need to time the whole thing including the `fetchall()`. – Ben Mar 23 '12 at 13:15
1

Try:

select distinct x.*
from DetectedVehicles x
join DetectedVehicles y
  on x.PlateNo = y.PlateNo and 
     x.DetectionTime < y.DetectionTime
where x.CLCode=? and y.CLCode=?

or:

select x.*
from DetectedVehicles x
where exists
(select 1
 from DetectedVehicles y
 where x.PlateNo = y.PlateNo and 
       x.DetectionTime < y.DetectionTime and
       x.CLCode=? and y.CLCode=?)

I would normally expect the latter query to execute more quickly, but it would be worth running both to check.

  • Dear Mark, thank You for your solution. Query itself is quickest, but how can I get to the results? - fetchall() via sqlite takes a lot of time? Why, and how to do it? – Intelligent-Infrastructure Mar 23 '12 at 12:46
  • I regret that I am not familiar with the code you are using to retrieve the results, and therefore I am unable to advise on how to improve its performance. –  Mar 23 '12 at 12:51
0

Thank You guys for this feedback. I post it as an answer, and present time results:

1. fastest total (query 1.80s, fetchall 0.20s, total: 2s)

select distinct x.*
from DetectedVehicles x
join DetectedVehicles y
  on x.PlateNo = y.PlateNo and 
     x.DetectionTime < y.DetectionTime
where x.CLCode=? and y.CLCode=?

2. (query 1.83s, fetchall 0.19s, total: 2.02s)

select 
    dvPoint1.PlateNo, dvPoint1.DetectionTime, dvPoint2.DetectionTime
from 
    DetectedVehicles dvPoint1 
    inner join DetectedVehicles dvPoint2
        on dvPoint1.PlateNo = dvPoint2.PlateNo
        and dvPoint1.CLCode = ? and dvPoint2.CLCode = ?
        and dvPoint1.DetectionTime < dvPoint2.DetectionTime

3. (query 1.82s, fetchall 1.09s, total: 2.91s)

select x.*
from DetectedVehicles x
where exists
(select 1
 from DetectedVehicles y
 where x.PlateNo = y.PlateNo and 
       x.DetectionTime < y.DetectionTime and
       x.CLCode=? and y.CLCode=?)

So thanks @Mark Bannister for Your answer, I'm going to accept it.

However one issue remains: cur.fetchall() takes enourmously long time.. and I need to get the results, how should I do it ? (For just a 100 of rows it takes around 2 minutes for each of your solutions). Solved issue: download new sqlite.dll to your python/dlls folder ... don't ask me why: Join with Pythons sqlite module is slower than doing it manually

Community
  • 1
  • 1
  • You need to time the whole operation including `cur.fetchall()` to get a realistic answer. Then you need to create indexes to support the query. – Ben Mar 23 '12 at 13:13