0

I'm currently learning SQLite (called by Python).

According to my previous question (Reorganising Data in SQLLIte), I want to store multiple time series (Training data) in my database.
I have defined the following fields:

CREATE TABLE VARLIST 
(
     VarID INTEGER PRIMARY KEY AUTOINCREMENT, 
     name TEXT UNIQUE NOT NULL
)    

CREATE TABLE DATAPOINTS 
(
     DataID INTEGER PRIMARY KEY, 
     timeID INTEGER, 
     VarID INTEGER, 
     value REAL
)

CREATE TABLE TIMESTAMPS 
(
     timeID INTEGER PRIMARY KEY AUTOINCREMENT, 
     TRAININGS_ID INT, 
     TRAINING_TIME_SECONDS FLOAT
)

VARLIST has 8 entries, TIMESTAMPS 1e5 entries and DATAPOINTS around 5e6.

When I now want to extract data for a given TrainingsID and VarID, I try it like:

SELECT             
    (SELECT TIMESTAMPS.TRAINING_TIME_SECONDS 
     FROM TIMESTAMPS 
     WHERE t.timeID = timeID) AS TRAINING_TIME_SECONDS,
    (SELECT value 
     FROM DATAPOINTS 
     WHERE DATAPOINTS.timeID = t.timeID and DATAPOINTS.VarID = 2) as value
FROM 
    (SELECT timeID  
     FROM TIMESTAMPS 
     WHERE TRAININGS_ID = 96) as t;

The command EXPLAIN QUERY PLAN delivers:

0|0|0|SCAN TABLE TIMESTAMPS
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE TIMESTAMPS USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SCAN TABLE DATAPOINTS

This basically works.
But there are two problems:

  1. Minor problem: If there is a timeID where no data for the requested VarID is availabe, I get an line with the valueNone`.
    I would prefer this line to be skipped.

  2. Big problem: the search is incredibly slow (approx 5 minutes using http://sqlitebrowser.org/).

How do I best improve the performance?
Are there better ways to formulate the SELECT command, or should I modify the database structure itself?

Community
  • 1
  • 1
BerndGit
  • 1,530
  • 3
  • 18
  • 47
  • Looks like you are doing a cross join. Those are usually pretty slow for datasets of the size you mentioned – OneCricketeer Jan 09 '16 at 19:15
  • Would it help in performance if i restructure the database so that the data from 'timestamps' is directly mapped into 'datapoints'? (my initial desicib to split this in 2 tables was mainly motivated by reduction on data size. This is of less importance than performance) – BerndGit Jan 09 '16 at 21:42
  • 1
    Show the output of [EXPLAIN QUERY PLAN](http://www.sqlite.org/eqp.html). – CL. Jan 09 '16 at 22:19
  • 1
    Index your JOIN and your WHERE fields. – Phantômaxx Jan 10 '16 at 08:29

1 Answers1

1

Ok, based on the hints I have got I could extremly accelerate the search by applieng INDEXES as:

CREATE INDEX IF NOT EXISTS DP_Index on DATAPOINTS (VarID,timeID,DataID);
CREATE INDEX IF NOT EXISTS TS_Index on TIMESTAMPS(TRAININGS_ID,timeID);

The EXPLAIN QUERY PLAN output now reads as:

0|0|0|SEARCH TABLE TIMESTAMPS USING COVERING INDEX TS_Index (TRAININGS_ID=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 1
1|0|0|SEARCH TABLE TIMESTAMPS USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE CORRELATED SCALAR SUBQUERY 2
2|0|0|SEARCH TABLE DATAPOINTS USING INDEX DP_Index (VarID=? AND timeID=?)

Thanks for your comments.

BerndGit
  • 1,530
  • 3
  • 18
  • 47