
I have a table of about 60 GB and I'm trying to create an index on it, and it's very slow (almost a day, and still running!).

I see most of the time is spent on disk I/O (about 4 MB/s), and it doesn't use much memory or CPU.

I tried running 'PRAGMA cache_size = 10000' and 'PRAGMA page_size = 4000' (after I created the table), and it still doesn't help.
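For reference, here is a minimal sketch of issuing those pragmas with Python's sqlite3 module (the in-memory database, table, and index names are placeholders; note that page_size only takes effect before the database is populated, or after a VACUUM on a non-WAL database):

```python
import sqlite3

# An in-memory db stands in for the real 60 GB file here.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# page_size must be set before the database is created, or be
# followed by VACUUM on an existing non-WAL database, to take effect.
cur.execute("PRAGMA page_size = 8192")

# cache_size applies immediately, but only to this connection.
# A negative value is interpreted as KiB rather than pages.
cur.execute("PRAGMA cache_size = -200000")  # ask for ~200 MB of page cache

# Placeholder table and index for illustration.
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
cur.executemany("INSERT INTO t (col) VALUES (?)", [("x",)] * 1000)
cur.execute("CREATE INDEX idx_t_col ON t(col)")
conn.commit()
```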

How can I make the 'create index' run in reasonable time?

Brian Tompsett - 汤莱恩
Alon Gutman

2 Answers


Creating an index on a database table is a one-time operation, and it can be expensive depending on many factors: how many fields are included in the index and of what types, the size of the table being indexed, the hardware of the machine the database runs on, and possibly more.

To give a reasonable answer on speeding things up, we would need to know the schema of the table and the definition of the index you are creating. If your index enforces uniqueness, are you reasonably sure the data is actually unique? What are the hardware specs of your server, what are your disk speeds, and how much free space is on the disks? Are you using a RAID array, and at what level? How much RAM do you have, and what is its utilization?

Now, all that said, the following might be faster, but I have not tested it.

Make a structurally duplicate table of the table you wish to index. Add the index to the new, empty table. Copy the data from the old table to the new table in chunks. Then drop the old table.

My theory is that it will be less expensive to index the data as it is added than to dig through the data that is already there and add the index after the fact.
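The steps above can be sketched with Python's sqlite3 module; the table names, column layout, and chunk size here are invented for illustration:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Stand-in for the existing large table.
cur.execute("CREATE TABLE old_t (id INTEGER PRIMARY KEY, col TEXT)")
cur.executemany("INSERT INTO old_t (col) VALUES (?)",
                [(str(i),) for i in range(10000)])
conn.commit()

# 1. Structurally duplicate table, indexed while still empty.
cur.execute("CREATE TABLE new_t (id INTEGER PRIMARY KEY, col TEXT)")
cur.execute("CREATE INDEX idx_new_t_col ON new_t(col)")

# 2. Copy the rows over in chunks, committing between chunks.
CHUNK = 1000
last_id = 0
while True:
    cur.execute(
        "INSERT INTO new_t "
        "SELECT * FROM old_t WHERE id > ? ORDER BY id LIMIT ?",
        (last_id, CHUNK),
    )
    if cur.rowcount == 0:  # no rows left to copy
        break
    last_id = cur.execute("SELECT MAX(id) FROM new_t").fetchone()[0]
    conn.commit()

# 3. Drop the old table.
cur.execute("DROP TABLE old_t")
conn.commit()
```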

Bueller
  • I don't know the internals of SQLite, but this would definitely not hold true for MS SQL Server. In fact, when bulk loading a table it's often faster to drop all of the indexes, fill the table, then recreate the indexes. Indexes aren't a one-time hit. They have to be maintained, which means that every insert and potentially updates on the table require moving data around in the index. Doing this one row at a time a million times is typically much more costly than running a single sorting algorithm on the data once. – Tom H Jul 24 '11 at 17:18
  • Agreed that the index is constantly being adjusted. However, SQLite is a file-based system, more like MS Access than SQL Server, MySQL, or PostgreSQL, so providing smaller chunks for SQLite's lightweight engine to digest may reasonably be faster. But as I said, I have not tested the methodology. – Bueller Jul 24 '11 at 17:25
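The bulk-load pattern Tom H describes (drop the indexes, fill the table, then recreate them) can be sketched as follows; this is a hedged illustration with made-up table and index names, not a benchmark:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE t (id INTEGER PRIMARY KEY, col TEXT)")
cur.execute("CREATE INDEX idx_t_col ON t(col)")

# Drop the index so the bulk insert doesn't maintain it row by row...
cur.execute("DROP INDEX idx_t_col")
cur.executemany("INSERT INTO t (col) VALUES (?)",
                [(str(i),) for i in range(5000)])
conn.commit()

# ...then rebuild it once, as a single pass over the finished data.
cur.execute("CREATE INDEX idx_t_col ON t(col)")
conn.commit()
```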

You should create the index when you create the table. PS: you should also consider whether the index is appropriate, and you should not need to create the index at runtime.

frank
  • This is a good suggestion. I'd try exporting the data from your sqlite db to a duplicate empty sqlite db with the indexes already applied and test how long it takes. – TheLegendaryCopyCoder Nov 24 '15 at 11:48
  • This suggestion is bad. If you have any real experience with such big tables, you will know that if you create the index first, you will never finish the inserting... – kawing-chiu Jun 11 '16 at 00:04
  • Agree with kawing-chiu. Creating the index first, then adding data makes the insert time go up many orders of magnitude compared to creating the index after adding a large amount of data. – Russell Trahan Oct 23 '18 at 21:25