Slow Sqlite Select behaviour when trying to select several rows

Question

I have two tables in my sqlite database t1 and t2. t1 has two columns a and b. t2 has one column a same as in t1. I've build indexes for every column in both tables. I want to select all rows from table t1 where t1.a exists in table t2. So I've written a query:

select t1.a, t1.b from t1 where t1.a in(select t2.a from t2) limit 10000000;

I'm using limit to select all rows, not first 100.

Table t1 contains 6 000 000 rows, table t2 100 000 rows. This query executes 1400ms.

But when I try to select only t1.a with this query:

select t1.a from t1 where t1.a in(select t2.a from t2) limit 10000000;

it tooks only 86ms.

Explain query plan:

SEARCH TABLE csv USING COVERING INDEX iin_idx (iin=?)
USING INDEX sqlite_autoindex_input_1 FOR IN-OPERATOR

Questions:

Why does sqlite behaves so?
What is the difference in selecting one column or several?
Is it possible to speed up this query?
Why does sqlite uses autoindex instead of created one?

I want to populate t1 table with 155 000 000 rows, so the query time will grow significantly.

Apparently you're working with a CSV file? What does column b contain? Is it much bigger than what column a contains? — mkrieger1, Jul 13 '20 at 07:12
Did you run the queries once and in that order? If so, try the first one again. It might simply be disk caching. — Schwern, Jul 13 '20 at 07:13
Please create a [mre] that allows us to observe the same behavior. — mkrieger1, Jul 13 '20 at 07:25
@mkrieger1 no, column a is 12 symbols, column b is 11 symbols. — pingvincible, Jul 13 '20 at 07:27

score 0 · Answer 1 · answered Jul 13 '20 at 07:17

0

One possibility is disk caching. Reading from disk is slow, particularly if you have a hard disk drive (HDD) and not a solid-state drive (SSD) The first time you run a query the cache is "cold" and the data must be loaded from disk into memory. The second time you run the query the data is already cached in memory and will be much faster.

Try running the queries a few times and see if the performance changes.

answered Jul 13 '20 at 07:17

Schwern

153,029
25
195
336

I've noticed caching. I've run all queries on cold database. Also I have a SSD disk. – pingvincible Jul 13 '20 at 07:22
@pingvincible Then the next possibility is that t1.b is much larger than t1.a, [as mkrieger1 suggested](https://stackoverflow.com/questions/62870786/slow-sqlite-select-behaviour-when-trying-to-select-several-rows/62870948?noredirect=1#comment111178706_62870786). – Schwern Jul 13 '20 at 07:25

pingvincible · Accepted Answer · 2020-07-13T08:59:34.347

I think that I've found a solution! I've created an additional index:

create index if not exists ab_b_idx on t1(a, b);

After that selecting both a and b columns runs as fast as selecting only a column.

Explanation of this behaviour I've found in documentation Covering indices:

If, however, all columns that were to be fetched from the table are already available in the index itself, SQLite will use the values contained in the index and will never look up the original table row. This saves one binary search for each row and can make many queries run twice as fast.

Update: Search on 100 000 000 rows in table t1 took 550ms.

Slow Sqlite Select behaviour when trying to select several rows

2 Answers2