
I've narrowed a performance issue to a particular SQLite query that looks like this:

select *
  from test
 where (?1 is null or ident = ?1)
   and (?2 is null or name = ?2)
   and (?3 is null or region = ?3);

This allows any subset of the input parameters (there are more than three) with a single query. Unfortunately, using explain query plan on this yields:

1|0|0|SCAN TABLE test

So SQLite is reading through the entire table no matter what's passed in.
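For concreteness, here is a minimal sketch that reproduces this plan using Python's built-in sqlite3 module. Only the column and index names come from the question and the answer below; the column types, the sample value, and trimming the parameter list to the three shown are assumptions.

import sqlite3

# Hypothetical schema: only the column and index names come from the post;
# the types and the sample value "A1" are placeholders.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table test (ident text, name text, region text, location text);
    create index idx        on test(ident);
    create index test_idx_3 on test(ident, region, name);
""")

query = """
    select *
      from test
     where (?1 is null or ident = ?1)
       and (?2 is null or name = ?2)
       and (?3 is null or region = ?3)
"""

# Any subset of filters is applied by binding None for the unused parameters.
rows = con.execute(query, ("A1", None, None)).fetchall()

# The plan reports a full table scan regardless of which parameters are bound.
for row in con.execute("explain query plan " + query, ("A1", None, None)):
    print(row)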

Changing the query to use `from test indexed by test_idx` causes it to fail: `Error: no query solution`.

Removing the `?1 is null or` part yields a much more favorable query plan:

1|0|0|SEARCH TABLE test USING INDEX idx (ident=?)

However, note that only one index can be used this way: all rows matching ident are then scanned looking for matches on the other fields. Using a single index that contains all of the match fields avoids this:

0|0|0|SEARCH TABLE test USING INDEX test_idx_3 (ident=? AND region=? AND name=?)
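The same kind of sketch, with the null checks removed, shows the planner picking the three-column index (the schema is assumed as above and the bound values are placeholders):

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table test (ident text, name text, region text, location text);
    create index test_idx_3 on test(ident, region, name);
""")

plain = """
    select *
      from test
     where ident = ?1 and name = ?2 and region = ?3
"""

# With the "is null or" escape hatches gone, the planner can use the
# three-column index; expect a SEARCH ... USING INDEX test_idx_3 line.
for row in con.execute("explain query plan " + plain, ("A1", "B2", "C3")):
    print(row)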

It seems reasonable to think that SQLite's query planner would be able to either eliminate or simplify each condition to a simple indexed column check, but apparently that is not the case, as query optimization happens before parameter binding, and no further simplification occurs.

The obvious solution is to have 2^N separate queries and select the appropriate one at runtime based on which combination of inputs is to be checked. For N=2 or 3 that might be acceptable, but it's absolutely out of the question in this case.

There are, of course, a number of ways to re-organize the database that would make this type of query more reasonable, but assume that's also not practical.

So, how can I search any subset of columns in a table without losing the performance benefit of indexes on those columns?

Tim Sylvester
  • Question and answer at the same time? – Lukasz Szozda Nov 07 '15 at 21:34
  • @lad2025 yeah, it's always been encouraged and I'm not sure when it was added, but now there's an option to compose them at the same time. http://stackoverflow.com/help/self-answer – Tim Sylvester Nov 07 '15 at 22:34
  • I know you can answer your own question, but then I don't know whether you need help or not. – Lukasz Szozda Nov 07 '15 at 22:36
  • @TimSylvester I'm a little unclear on what exactly are you binding here. If it's a literal, then wouldn't you know beforehand if it is null or not making the `?1 is null` part unnecessary? Are the placeholders meant to be replaced by column names? – Anurag Nov 08 '15 at 04:33

2 Answers


The only progress I've been able to make is to use a query like this:

select ident, name, region
  from test
 where (case when ?1 is null then 1 when ident  = ?1 then 1 else 0 end)
   and (case when ?2 is null then 1 when name   = ?2 then 1 else 0 end)
   and (case when ?3 is null then 1 when region = ?3 then 1 else 0 end)

This reduces the query to an index scan, rather than a table scan:

0|0|0|SCAN TABLE test USING COVERING INDEX test_idx_3

However, it only works if there's one index containing all the columns of interest, and if the only columns being selected are those in the index. If the index isn't a "covering index" (one containing all the needed values) then SQLite doesn't use the index at all.
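To make that restriction concrete, here is a sketch against the same assumed schema as before; it compares the plan for a select list confined to the index columns with one that also pulls location (a column outside the index), and the bound values are placeholders:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table test (ident text, name text, region text, location text);
    create index test_idx_3 on test(ident, region, name);
""")

case_query = """
    select {cols}
      from test
     where (case when ?1 is null then 1 when ident  = ?1 then 1 else 0 end)
       and (case when ?2 is null then 1 when name   = ?2 then 1 else 0 end)
       and (case when ?3 is null then 1 when region = ?3 then 1 else 0 end)
"""

for cols in ("ident, name, region",             # all in the index -> covering index scan
             "ident, name, region, location"):  # location not covered -> table scan
    for row in con.execute("explain query plan " + case_query.format(cols=cols),
                           ("A1", None, None)):
        print(cols, "->", row)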

The way to get around the second restriction is to structure the query like this:

select ident, name, region, location
  from test
 where rowid in (
       select rowid
         from test
        where (case when ?1 is null then 1 when ident  = ?1 then 1 else 0 end)
          and (case when ?2 is null then 1 when name   = ?2 then 1 else 0 end)
          and (case when ?3 is null then 1 when region = ?3 then 1 else 0 end)
       )

yielding:

0|0|0|SEARCH TABLE test USING INTEGER PRIMARY KEY (rowid=?)
0|0|0|EXECUTE LIST SUBQUERY 1
1|0|0|SCAN TABLE test USING COVERING INDEX test_idx_3

This is generally faster than a full table scan, but how much faster depends on several factors:

  • How much data is in each row that's not in the index? If it's small, then the index scan is almost a table scan.

  • How many results are there? Each result is a separate primary-key search, so for some large number of results in a large table, the M searches will actually be slower than a single pass through the whole table. For M results in a table of N rows, you want O(M log N) << O(N), so M < N / log N. Call it 3% as a rule of thumb, minus the cost of an index scan (a rough numeric check follows below):

(plot of N / (N log N), i.e. 1 / log N)
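A rough numeric check of that break-even point (the table size here is an arbitrary example; the 3% rule of thumb above is deliberately more conservative because it also discounts the cost of the index scan):

import math

# Rough break-even: M primary-key lookups at roughly log2(N) page touches each
# versus one sequential pass over N rows. This ignores the cost of the initial
# index scan, which is why the rule of thumb above is more conservative.
N = 1_000_000
break_even = N / math.log2(N)
print(f"N = {N:,}: lookups win for M well below ~{break_even:,.0f} rows "
      f"({break_even / N:.0%} of the table)")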

Tim Sylvester

Don't try to be clever. SQLite's prepared statements do not need much memory, so you actually could keep all 2^N of them. But preparing a query does not need much time, either, so it would be a better idea to construct each query dynamically whenever you need it.
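A minimal sketch of that approach using Python's sqlite3 module; the table, column, and index names are taken from the question, while the helper name and sample values are made up:

import sqlite3

def find(con, ident=None, name=None, region=None):
    """Build the query from only the filters that were actually supplied."""
    # Column names come from a fixed list, so the string building is safe;
    # the values themselves are still bound as parameters.
    filters = {"ident": ident, "name": name, "region": region}
    clauses, params = [], []
    for column, value in filters.items():
        if value is not None:
            clauses.append(f"{column} = ?")
            params.append(value)
    sql = "select * from test"
    if clauses:
        sql += " where " + " and ".join(clauses)
    return con.execute(sql, params).fetchall()

# Usage: each call prepares a small, index-friendly statement on the fly.
con = sqlite3.connect(":memory:")
con.executescript("""
    create table test (ident text, name text, region text);
    create index test_idx_3 on test(ident, region, name);
""")
rows = find(con, ident="A1", region="C3")   # placeholder filter values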

As for the index: the documentation shows that the leftmost columns in the index must be used in the query. This means that you only need a few combinations of columns in your indexes (even for queries that do not use all index columns). In any case, you should prioritize indexes on columns with high selectivity.
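For example (a sketch with made-up index names): because an index serves equality filters on any leftmost prefix of its columns, three indexes are enough to give every non-empty subset of the question's three columns a fully matching index:

import sqlite3

con = sqlite3.connect(":memory:")
con.executescript("""
    create table test (ident text, name text, region text);

    -- Each index covers equality filters on any leftmost prefix of its columns,
    -- so these three cover all seven non-empty subsets of the three columns:
    create index test_inr on test(ident, name, region); -- {ident}, {ident,name}, {ident,name,region}
    create index test_nr  on test(name, region);        -- {name}, {name,region}
    create index test_ri  on test(region, ident);       -- {region}, {region,ident}
""")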

CL.