
I had assumed that the time cost of an insertion is logarithmic in the number of records, but my test on SQLite 3.22 (the "SQLite insertion cost" chart) seems to show that it is linear. Note that both the X and Y axes are in log scale.

The size(k) column is the number of rows I inserted in each test, in thousands. I ran 3 tests. Journal and synchronous are off, locking_mode is exclusive, and all operations are wrapped in a single transaction.
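Roughly, each test run is set up like this (a minimal sketch of the settings above, not the exact test script):

PRAGMA journal_mode = OFF;
PRAGMA synchronous = OFF;
PRAGMA locking_mode = EXCLUSIVE;

BEGIN;
-- create the tables/indexes and run all inserts here
COMMIT;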

time1

create table t1 (id primary key, name text);
create index nameIdx on t1(name);
-- for i = [1:<size>]
--   insert into t1 values(i, 'foo' || i);
create table t2 (id primary key, value int);
-- for i = [1:<size>]
--   insert into t2 values(i, i);
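(The same rows could also be generated in a single statement with a recursive CTE, roughly like the sketch below, where 100000 stands in for <size>; this is not what I actually ran, just an equivalent in plain SQLite.)

WITH RECURSIVE cnt(i) AS (
  SELECT 1 UNION ALL SELECT i + 1 FROM cnt WHERE i < 100000
)
INSERT INTO t1 SELECT i, 'foo' || i FROM cnt;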

time2

create table t1 (id primary key, name text);
-- for i = [1:<size>]
--   insert into t1 values(i, 'foo' || i);
create index nameIdx on t1(name);
create table t2 (id primary key, value int);
-- for i = [1:<size>]
--   insert into t2 values(i, i);

time3

create table t1 (id primary key, name text, value int);
-- for i = [1:<size>]
--   insert into t1 values(i, 'foo' || i, 0);
create index nameIdx on t1(name);
-- for i = [1:<size>]
--   update t1 set value=0 where id=<i>;

All three test cases have similar costs, and they look linear. I had also expected case 3 to be faster, because an update should not need to rebalance the tree or add new records, but case 3 is actually a little slower...

Are these results expected? Maybe my input data is too small to show the logarithmic complexity?

Joe C

1 Answer


SQLite optimizes inserts at the end of the table. sqlite3BtreeInsert() in btree.c has this comment:

/* If the cursor is currently on the last row and we are appending a
** new row onto the end, set the "loc" to avoid an unnecessary
** btreeMoveto() call */

To get worse run time, try inserting the rows in random order, or at least inserting a very large value first.
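For example, something like this sketch (the large id is arbitrary; it just has to be bigger than any id the loop will insert):

-- make later inserts land before the last row instead of appending
INSERT INTO t1 VALUES (1000000000, 'zzz');
-- then run the 1..<size> insert loop as before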

Anyway, the run time is dominated by disk I/O, and the non-leaf pages are most likely to be cached. Use more transactions (so that all pages need to be flushed to disk), or use an in-memory database.
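For example (sketch), commit in smaller batches instead of one big transaction:

BEGIN;
-- insert the first batch of rows
COMMIT;
BEGIN;
-- insert the next batch of rows
COMMIT;
-- ...and so on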

CL.
  • When inserting ordered data, splits/rebalancing happen at non-leaf nodes; when inserting random data, they happen more often at leaf nodes. Because disk I/O dominates, random data is more likely to show the log complexity. Is that the right intuition? – Joe C Feb 14 '18 at 18:23
  • First, measure how it actually behaves. – CL. Feb 14 '18 at 18:28