As an exercise, I'm seeing how fast I can bulk-insert records into SQLite. The data set is about 50 MB and contains 1M rows. Here is what I currently have:

sqlite3 *db;
int rc = sqlite3_open("MyDB.db", &db);
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
char *sql_buffer = malloc(200);
for (int i = 0; i < n_rows; i++) {  // row[]/n_rows hold the parsed data; Item is its element type
    Item item = row[i];
    snprintf(sql_buffer, 200,
             "insert into myTable (id, format, size) VALUES (%d, '%s', %d)",
             item.id, item.format, item.size);
    rc = sqlite3_exec(db, sql_buffer, NULL, NULL, NULL);
}
sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);
free(sql_buffer);

Running the above 1M inserts takes 3.39s; about 90% of that time is spent in the SQLite inserts and 10% in snprintf. I tried the following to see if anything would increase speed:

  • Committing after every 10K, 50K, or 100K inserts, instead of once at the end (1M).
  • Writing to an in-memory database instead of a file.
  • Changing various pragmas, for example: PRAGMA cache_size = 400000; PRAGMA synchronous = OFF; PRAGMA journal_mode = OFF;... (applied as in the sketch below)

None of those made more than about a 0.1s difference.
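
For reference, here is roughly how I applied the pragmas (each executed once on the connection, right after sqlite3_open and before the transaction):

sqlite3_exec(db, "PRAGMA cache_size = 400000", NULL, NULL, NULL);
sqlite3_exec(db, "PRAGMA synchronous = OFF", NULL, NULL, NULL);
sqlite3_exec(db, "PRAGMA journal_mode = OFF", NULL, NULL, NULL);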

Are there any further ways I could increase the insert speed here? If we assume the data is already parsed and can't simply be loaded from something like a CSV file, would it theoretically be possible to insert 1M rows in under 1s? If not, what is the limiting factor?

  • https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite is old but good. – Shawn Sep 25 '19 at 02:19

1 Answer


Appreciate that with your current approach, inserting 1 million rows means 1 million separate calls to sqlite3_exec, each of which parses and runs its own statement. Instead, you could try one of the following two approaches. For more recent versions of SQLite (3.7.11 and later, which support multi-row VALUES):

INSERT INTO myTable (id, format, size)
VALUES
    (%d, '%s', %d),
    (%d, '%s', %d),
    (%d, '%s', %d),
    ... (more rows)

For earlier versions of SQLite, you may use an INSERT INTO ... SELECT construct:

INSERT INTO myTable (id, format, size)
SELECT %d, '%s', %d UNION ALL
SELECT %d, '%s', %d UNION ALL
... (more rows)

The basic idea is to hand SQLite a single INSERT statement carrying all of your data, instead of executing one statement per row.

Not a C person, but here is how you might build the insert string from your C code:

const int MAX_BUF = 1000;  // make this large enough for the full statement; truncation is not handled here
char *sql_buffer = malloc(MAX_BUF);
int length = 0;
length += snprintf(sql_buffer + length, MAX_BUF - length,
                   "INSERT INTO myTable (id, format, size) VALUES");
for (int i = 0; i < n_rows; i++) {
    Item item = row[i];
    // note the comma required between value tuples after the first
    length += snprintf(sql_buffer + length, MAX_BUF - length,
                       "%s(%d, '%s', %d)", i == 0 ? " " : ", ",
                       item.id, item.format, item.size);
}

rc = sqlite3_exec(db, sql_buffer, NULL, NULL, NULL);
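
An alternative from the thread Shawn linked in the comments is to prepare the INSERT once, then bind/step/reset it for each row inside the transaction, which avoids re-parsing SQL and building strings entirely. A rough sketch (again, I'm not a C person; row, n_rows, and Item are the same placeholders as in the question):

sqlite3_stmt *stmt;
sqlite3_prepare_v2(db, "INSERT INTO myTable (id, format, size) VALUES (?, ?, ?)",
                   -1, &stmt, NULL);
sqlite3_exec(db, "BEGIN TRANSACTION", NULL, NULL, NULL);
for (int i = 0; i < n_rows; i++) {
    Item item = row[i];
    sqlite3_bind_int(stmt, 1, item.id);
    // SQLITE_STATIC assumes item.format outlives the step; use SQLITE_TRANSIENT if unsure
    sqlite3_bind_text(stmt, 2, item.format, -1, SQLITE_STATIC);
    sqlite3_bind_int(stmt, 3, item.size);
    sqlite3_step(stmt);    // returns SQLITE_DONE on success
    sqlite3_reset(stmt);   // rewind so the compiled statement can be reused
}
sqlite3_exec(db, "COMMIT TRANSACTION", NULL, NULL, NULL);
sqlite3_finalize(stmt);

This also sidesteps quoting and escaping of the format string, since the text is bound as a parameter rather than spliced into the SQL.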
Tim Biegeleisen
  • I see. How would I actually format/create that string though (the first one) in C in order to do the insert? –  Sep 25 '19 at 02:23
  • @TagC198 I am not a C person, but I tried to give some code which might work with my suggested answer. – Tim Biegeleisen Sep 25 '19 at 02:31
  • I tried this approach but unfortunately it didn't give me any speed-up over the single insert. The best chunk size I was able to get was ~100, whereas larger chunks (500, 1000, 10k, etc.) made things even slower. –  Sep 25 '19 at 03:37
  • OK, honestly 3.39 seconds to insert 1 million records already sounds pretty fast to me. The advice I gave you is one suggestion you might receive here. In fact, many programming languages offer batch insert APIs for this purpose. – Tim Biegeleisen Sep 25 '19 at 03:42