
I have a simple query on an SQLite database (1.5GB on disk). It seems my loop is very slow

    for id, data in conn.execute("select ID, val from data order by ID"):
        # do stuff

despite an index on the ID column. But when I restart the script, the rows that were already read in previous runs loop at an acceptable speed. Are they somehow cached?

Is there anything I can do to make the loop over the complete data fast?

EDIT: Apparently the order by/index is the problem. There is a 2000x speed difference between looping over the rows with and without the order by.
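
The query plan can be inspected like this (I haven't pasted the output here):

    # Prints whether SQLite scans the table directly or walks myindex for the ORDER BY
    for row in conn.execute("EXPLAIN QUERY PLAN select ID, val from data order by ID"):
        print(row)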

Pragma table_info(data) shows

ID: cid=0 type=int notnull=0 dflt_value=null pk=0
val: cid=1 type=int notnull=0 dflt_value=null pk=0

I did no special adjustments to anything.

VACUUM did not improve performance.

My index is:

type=index, tbl_name=data rootpage=3 sql=create index myindex on data (ID)

SOLVED: Creating a covering index on (ID, val) and looping over that is a whole lot faster. I had no success with the other adjustments...
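
Roughly what I ended up with (the index name is arbitrary):

    # Covering index: the query below can be answered from the index alone,
    # so the rows come back in ID order without extra seeks into the table.
    conn.execute("create index if not exists myindex2 on data (ID, val)")

    for id, data in conn.execute("select ID, val from data order by ID"):
        # do stuff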

Gere
    Can you add the schema for the table and index to your question? It might help people give an accurate answer. http://stackoverflow.com/questions/4654762/how-i-can-see-the-structure-of-table-in-sqlite may help if you're not sure how to do this. – Duncan Jul 28 '14 at 12:59
  • Copying a 1.5GB file takes minutes from local disk to local disk? If you are not using floppy disks or a computer so old that Win7 won't run on it, that seems _awfully_ slow. I just copied 5GB on a vintage 2005 machine and it took 30 seconds. – msw Jul 28 '14 at 14:52
  • Maybe it was due to a file being on a network drive. To be sure, I'm working on the local drive now. In any case, reading without `order by` has acceptable speed. The massive slow-down for uncached data with `order by` is much too slow. – Gere Jul 28 '14 at 15:03

1 Answer


Your OS caches read data; if your hard disk is slow, an initial read will be slower than subsequent runs.

There are a few things you can do to try and speed things up:

  • Run a VACUUM to rebuild the database file; fragmentation and empty space can affect how quickly the data can be read.

  • Tune the page cache size with pragma cache_size = <page count>. The default is 2000 pages; if you have the memory, try setting this to a higher value (see the sketch below).
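
A rough sketch from Python's sqlite3 (the file path and cache_size value are only examples; adjust them to your setup and available memory):

    import sqlite3

    # isolation_level=None puts the connection in autocommit mode, so VACUUM
    # is not wrapped in an implicit transaction (which would make it fail).
    conn = sqlite3.connect("mydb.sqlite", isolation_level=None)  # path is illustrative

    # Rebuild the database file to remove fragmentation and free pages.
    conn.execute("VACUUM")

    # Enlarge the page cache; with the default 1 KB page size,
    # 50000 pages is roughly 50 MB of cache memory.
    conn.execute("PRAGMA cache_size = 50000")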

Martijn Pieters
  • So there is nothing I can do with SQLite? Is there anything I can do in Windows 7 (without admin rights)? – Gere Jul 28 '14 at 12:02
  • @Gerenuk: You cannot get around the fact that a hard disk is a *slow medium*; that's why your OS uses buffers to cache data once read. – Martijn Pieters Jul 28 '14 at 12:02
  • Copying the whole file takes a few minutes. Merely reading the whole file with my script would take days. Is there nothing I can do? – Gere Jul 28 '14 at 12:19
  • Btw, reading the whole file without the index is fast again. So there is definitely some problem with disk access and the index. – Gere Jul 28 '14 at 12:36
  • @Gerenuk: Has the table been altered a lot? Perhaps you need to run a VACUUM to clear up any fragmentation issues. – Martijn Pieters Jul 28 '14 at 12:38
  • Not really. The table was written once and an index has been generated. I know little about infrastructure issues. Maybe it has to do with random position file access? Not sure what to do... – Gere Jul 28 '14 at 12:40
  • I don't have much memory, but 1.5GB should fit into memory. What would be a sensible value? (I tried some random values, but it didn't do much... :/) – Gere Jul 28 '14 at 15:17
  • @Gerenuk: the default setting for page size is 1k; 2000 pages is 2MB of memory for the page cache plus some overhead. Have you tried values like 50000? – Martijn Pieters Jul 28 '14 at 15:20