4

I have a test set up to write rows to a database. Each transaction inserts 10,000 rows, no updates. Each step takes linearly longer than the last. The first eleven steps took the following times (in ms) to commit:

568, 772, 942, 1247, 1717, 1906, 2268, 2797, 2922, 3816, 3945

By the time it reaches adding 10,000 rows to a table of 500,000 rows, it takes 37149 ms to commit!

  • I have no foreign key constraints.

  • I have found that using WAL improves performance (the figures above are with WAL), but the degradation is still linear

  • PRAGMA Synchronous=OFF has no effect

  • PRAGMA locking_mode=EXCLUSIVE has no effect

  • Ran both with and without additional indexes. The indexes made a roughly constant time difference, so the degradation was still linear.

Some other settings I have:

  • setAutocommit(false)
  • PRAGMA page_size = 4096
  • PRAGMA journal_size_limit = 104857600
  • PRAGMA count_changes = OFF
  • PRAGMA cache_size = 10000
  • Schema has Id INTEGER PRIMARY KEY ASC, insertion of which is incremental and generated by Sqlite
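For reference, a minimal sketch of the equivalent connection setup, using Python's built-in sqlite3 module rather than sqlite-jdbc (the PRAGMAs are SQLite-level, so they behave the same from either driver; manual commits are done here via `isolation_level` instead of `setAutocommit(false)`):

```python
import sqlite3

# Sketch of the question's connection settings (Python sqlite3 for brevity;
# the question itself uses sqlite-jdbc). isolation_level="DEFERRED" gives
# explicit transactions, mirroring setAutocommit(false).
conn = sqlite3.connect(":memory:", isolation_level="DEFERRED")
conn.executescript("""
    PRAGMA journal_mode = WAL;
    PRAGMA page_size = 4096;
    PRAGMA journal_size_limit = 104857600;
    PRAGMA cache_size = 10000;
""")
print(conn.execute("PRAGMA cache_size").fetchone()[0])
```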

Full schema as follows (I have run both with and without the indexes, but have included them here):

create table if not exists [EventLog] (
Id INTEGER PRIMARY KEY ASC, 
DocumentId TEXT NOT NULL, 
Event TEXT NOT NULL, 
Content TEXT NOT NULL, 
TransactionId TEXT NOT NULL, 
Date INTEGER NOT NULL, 
User TEXT NOT NULL)

create index if not exists DocumentId ON EventLog (DocumentId)

create index if not exists TransactionId ON EventLog (TransactionId)

create index if not exists Date ON EventLog (Date)

This is using sqlite-jdbc-3.7.2 running in a Windows environment.
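The timing loop can be sketched as follows (Python's built-in sqlite3 module for a self-contained repro; the question uses sqlite-jdbc, but the effect is SQLite-level — random TEXT values such as GUIDs go into the indexed columns, and only the commit is timed):

```python
import sqlite3
import time
import uuid

# Minimal sketch of the test: each transaction inserts 10,000 rows with
# random TEXT ids into the question's schema, then times the commit.
conn = sqlite3.connect(":memory:", isolation_level="DEFERRED")
conn.executescript("""
    CREATE TABLE IF NOT EXISTS EventLog (
        Id INTEGER PRIMARY KEY ASC,
        DocumentId TEXT NOT NULL, Event TEXT NOT NULL,
        Content TEXT NOT NULL, TransactionId TEXT NOT NULL,
        Date INTEGER NOT NULL, User TEXT NOT NULL);
    CREATE INDEX IF NOT EXISTS DocumentId ON EventLog (DocumentId);
""")

commit_ms = []
for step in range(3):  # the question ran ~50 such steps
    rows = [(str(uuid.uuid4()), "evt", "content", str(uuid.uuid4()), step, "user")
            for _ in range(10_000)]
    conn.executemany(
        "INSERT INTO EventLog (DocumentId, Event, Content, TransactionId, Date, User) "
        "VALUES (?, ?, ?, ?, ?, ?)", rows)
    t0 = time.perf_counter()
    conn.commit()
    commit_ms.append((time.perf_counter() - t0) * 1000)
print(commit_ms)
```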

J Pullar

2 Answers

3

SQLite tables and indexes are internally organized as B-Trees. In tables, the Rowid is the sorting key. (Your INTEGER PRIMARY KEY is the Rowid.)


If your inserted IDs are not larger than the largest ID already in the table, then the records are not appended, but inserted somewhere in the middle of the tree. When inserting enough records in one transaction, and if the distribution of IDs is random, this means that almost every page in the database must be rewritten.

To avoid this,

  1. insert the IDs in increasing order; or
  2. insert the IDs as NULL so that SQLite chooses the next value; or
  3. prevent SQLite from using your ID field as the Rowid by declaring it as INTEGER UNIQUE (or just INTEGER if you don't need the extra check/index), thus making the table ordering independent of your ID.
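Option 2 can be demonstrated with a small sketch (Python's built-in sqlite3 module; table name `t` is illustrative): inserting NULL for an INTEGER PRIMARY KEY — the Rowid alias — makes SQLite assign the next value, so rows are always appended at the right edge of the table's B-Tree rather than spliced into the middle.

```python
import sqlite3

# Inserting NULL into an INTEGER PRIMARY KEY (the Rowid alias) lets SQLite
# pick the next value, so each row is appended rather than inserted mid-tree.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE t (Id INTEGER PRIMARY KEY ASC, v TEXT)")
conn.execute("INSERT INTO t (Id, v) VALUES (NULL, 'a')")
conn.execute("INSERT INTO t (Id, v) VALUES (NULL, 'b')")
print(conn.execute("SELECT Id FROM t ORDER BY Id").fetchall())  # auto-assigned 1, 2
```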

In the case of indexes, inserting an indexed field with a random distribution requires that the index is updated at a random position. Like with tables, when inserting enough records in one transaction, this means that almost every page in the index must be rewritten.

When you're loading large amounts of data, it is recommended to do this without any indexes and to recreate them afterwards. (Unlike some other databases, SQLite has no function to temporarily disable indexes; just drop them.)
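The drop-then-recreate pattern looks like this in outline (Python's built-in sqlite3 module; table and index names here are illustrative, not the question's exact schema): load without secondary indexes, then build each index once at the end in a single sorted pass instead of updating it at a random position per row.

```python
import sqlite3

# Bulk-load with no secondary index, then recreate the index afterwards.
conn = sqlite3.connect(":memory:", isolation_level="DEFERRED")
conn.execute("CREATE TABLE EventLog2 (Id INTEGER PRIMARY KEY, DocumentId TEXT)")
conn.execute("DROP INDEX IF EXISTS DocumentId2")  # ensure no index during the load
conn.executemany("INSERT INTO EventLog2 (DocumentId) VALUES (?)",
                 [(f"doc-{i}",) for i in range(1000)])
conn.commit()
conn.execute("CREATE INDEX DocumentId2 ON EventLog2 (DocumentId)")  # rebuild once
conn.commit()
```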

CL.
  • Thanks for your answer, unfortunately this was the case already, I have updated my question accordingly – J Pullar Oct 02 '12 at 09:06
  • i suppose that, the ids being generated and the data being inserted, they are incremented – njzk2 Oct 02 '12 at 10:10
  • Updated; you have the same problem, but with indexes. – CL. Oct 02 '12 at 10:51
  • I have extracted code and have verified that it is entirely down to indexes, in particular in the case that a row has its own custom id such as a guid – J Pullar Oct 02 '12 at 16:03
  • I have implemented a drop and recreate index approach which still shows linear degradation, though at a slower rate – J Pullar Oct 02 '12 at 16:30
0

FYI, although I haven't constrained the structure in terms of the content of the key, in 99.999% of cases it will be a GUID. So to resolve the performance issue I wrote an algorithm for generating sequential GUIDs, using a time-based value for the first 8 hex digits. This worked very well, even when blocks of GUIDs are generated with early time values.
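A sketch of that approach (not the poster's actual code; Python for brevity, and the function name `sequential_guid` is hypothetical): replace the first 8 hex digits of a random GUID with a time-based value, so generated ids sort roughly by creation time and index inserts stay near the right edge of the B-Tree.

```python
import time
import uuid

def sequential_guid() -> str:
    # First 8 hex digits come from the clock, so ids generated later
    # compare greater; the remaining 24 hex digits stay random.
    prefix = f"{int(time.time()):08x}"   # seconds since epoch as 8 hex digits
    tail = uuid.uuid4().hex[8:]          # random remainder of a GUID
    g = prefix + tail
    return f"{g[:8]}-{g[8:12]}-{g[12:16]}-{g[16:20]}-{g[20:]}"

print(sequential_guid())
```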

J Pullar