
I have a simple query on an SQLite database (1.5GB on disk). It seems my loop is very slow

    for id, data in conn.execute("select ID, val from data order by ID"):
        # do stuff

despite an index on the ID column. But when I restart the script, the rows that were already read in previous runs loop at an acceptable speed. Are they somehow cached?

Is there anything I can do to make the loop over the complete data fast?

EDIT: Apparently the order by/index is the problem. There is a 2000x speed difference between looping over the rows with and without the order by.
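
The query plan can be inspected like this (I haven't pasted the output here):

    # Prints whether SQLite scans the table directly or walks myindex for the ORDER BY
    for row in conn.execute("EXPLAIN QUERY PLAN select ID, val from data order by ID"):
        print(row)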

Pragma table_info(data) shows

ID: cid=0 type=int notnull=0 dflt_value=null pk=0
val: cid=1 type=int notnull=0 dflt_value=null pk=0

I did no special adjustments to anything.

VACUUM did not improve performance.

My index is:

type=index, tbl_name=data rootpage=3 sql=create index myindex on data (ID)

SOLVED: Creating a covering index on (ID, val) and looping over that is a whole lot faster. I had no success with the other adjustments...
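
Roughly what I ended up with (the index name is arbitrary):

    # Covering index: the query below can be answered from the index alone,
    # so the rows come back in ID order without extra seeks into the table.
    conn.execute("create index if not exists myindex2 on data (ID, val)")

    for id, data in conn.execute("select ID, val from data order by ID"):
        # do stuff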

Gere
    Can you add the schema for the table and index to your question? It might help people give an accurate answer. http://stackoverflow.com/questions/4654762/how-i-can-see-the-structure-of-table-in-sqlite may help if you're not sure how to do this. – Duncan Jul 28 '14 at 12:59
  • Copying a 1.5GB file takes minutes from local disk to local disk? If you are not using floppy disks or a computer so old that Win7 won't run on it, that seems _awfully_ slow. I just copied 5GB on a vintage 2005 machine and it took 30 seconds. – msw Jul 28 '14 at 14:52
  • Maybe it was due to a file being on a network drive. To be sure, I'm working on the local drive now. In any case, reading without `order by` has acceptable speed. The massive slow-down for uncached data with `order by` is much too slow. – Gere Jul 28 '14 at 15:03

1 Answer


Your OS caches read data; if your hard disk is slow, an initial read will be slower than subsequent runs.

There are a few things you can do to try and speed things up:

  • Run a VACUUM to rebuild the database file; fragmentation and empty space can affect how quickly the data can be read.

  • Tune the page cache size with pragma cache_size = <page count>. The default is 2000 pages; if you have the memory, try setting this to a higher value (see the sketch below).
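
A rough sketch from Python's sqlite3 (the file path and cache_size value are only examples; adjust them to your setup and available memory):

    import sqlite3

    # isolation_level=None puts the connection in autocommit mode, so VACUUM
    # is not wrapped in an implicit transaction (which would make it fail).
    conn = sqlite3.connect("mydb.sqlite", isolation_level=None)  # path is illustrative

    # Rebuild the database file to remove fragmentation and free pages.
    conn.execute("VACUUM")

    # Enlarge the page cache; with the default 1 KB page size,
    # 50000 pages is roughly 50 MB of cache memory.
    conn.execute("PRAGMA cache_size = 50000")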

Martijn Pieters
  • So there is nothing I can do with SQLite? Is there anything I can do in Windows 7 (without admin rights)? – Gere Jul 28 '14 at 12:02
  • @Gerenuk: You cannot get around the fact that a hard disk is a *slow medium*; that's why your OS uses buffers to cache data once read. – Martijn Pieters Jul 28 '14 at 12:02
  • Copying the whole file takes a few minutes. Merely reading the whole file with my script would take days. Is there nothing I can do? – Gere Jul 28 '14 at 12:19
  • Btw, reading the whole file without the index is fast again. So there is definitely some problem with disk access and the index. – Gere Jul 28 '14 at 12:36
  • @Gerenuk: Has the table been altered a lot? Perhaps you need to run a VACUUM to clear up any fragmentation issues. – Martijn Pieters Jul 28 '14 at 12:38
  • Not really. The table was written once and an index has been generated. I know little about infrastructure issues. Maybe it has to do with random position file access? Not sure what to do... – Gere Jul 28 '14 at 12:40
  • I don't have much memory, but 1.5GB should fit into memory. What would be a sensible value? (I tried some random values, but it didn't do much... :/) – Gere Jul 28 '14 at 15:17
  • @Gerenuk: the default setting for page size is 1k; 2000 pages is 2MB of memory for the page cache plus some overhead. Have you tried values like 50000? – Martijn Pieters Jul 28 '14 at 15:20