0

I have two tables in my sqlite database t1 and t2. t1 has two columns a and b. t2 has one column a same as in t1. I've build indexes for every column in both tables. I want to select all rows from table t1 where t1.a exists in table t2. So I've written a query:

select t1.a, t1.b from t1 where t1.a in(select t2.a from t2) limit 10000000;

I'm using limit to select all rows, not first 100.

Table t1 contains 6 000 000 rows, table t2 100 000 rows. This query executes 1400ms.

But when I try to select only t1.a with this query:

select t1.a from t1 where t1.a in(select t2.a from t2) limit 10000000;

it tooks only 86ms.

Explain query plan:

SEARCH TABLE csv USING COVERING INDEX iin_idx (iin=?)
USING INDEX sqlite_autoindex_input_1 FOR IN-OPERATOR

Questions:

  1. Why does sqlite behaves so?
  2. What is the difference in selecting one column or several?
  3. Is it possible to speed up this query?
  4. Why does sqlite uses autoindex instead of created one?

I want to populate t1 table with 155 000 000 rows, so the query time will grow significantly.

marco.m
  • 4,573
  • 2
  • 26
  • 41

2 Answers2

0

One possibility is disk caching. Reading from disk is slow, particularly if you have a hard disk drive (HDD) and not a solid-state drive (SSD) The first time you run a query the cache is "cold" and the data must be loaded from disk into memory. The second time you run the query the data is already cached in memory and will be much faster.

Try running the queries a few times and see if the performance changes.

Schwern
  • 153,029
  • 25
  • 195
  • 336
  • I've noticed caching. I've run all queries on cold database. Also I have a SSD disk. – pingvincible Jul 13 '20 at 07:22
  • @pingvincible Then the next possibility is that t1.b is much larger than t1.a, [as mkrieger1 suggested](https://stackoverflow.com/questions/62870786/slow-sqlite-select-behaviour-when-trying-to-select-several-rows/62870948?noredirect=1#comment111178706_62870786). – Schwern Jul 13 '20 at 07:25
0

I think that I've found a solution! I've created an additional index:

create index if not exists ab_b_idx on t1(a, b);

After that selecting both a and b columns runs as fast as selecting only a column.

Explanation of this behaviour I've found in documentation Covering indices:

If, however, all columns that were to be fetched from the table are already available in the index itself, SQLite will use the values contained in the index and will never look up the original table row. This saves one binary search for each row and can make many queries run twice as fast.

Update: Search on 100 000 000 rows in table t1 took 550ms.