
Inserting approximately 110 million records into a database table is taking a huge amount of time (more than an hour) using the C++ interface APIs.

Is there any way to bring this time down and improve the efficiency of the insert operation?

I am grouping 1000 records into one transaction and then executing them:

sqlite3_exec(begin transaction);

sqlite3_exec(<1000 insertions>);

sqlite3_exec(end transaction);

This is still taking a huge amount of time.

How can I improve the efficiency of insertion?

Arne Vogel
mehak

1 Answer


Assuming all your statements are just INSERTs into the same table, you can probably improve performance by creating the statement once as a prepared statement and reusing it for each row, binding new data each time. Some explanation of prepared statements is here. The basic idea: rather than calling sqlite3_exec, you prepare a statement with sqlite3_prepare_v2 (with placeholders instead of the data), then for each row bind the data, execute it, and reset it. That way SQLite doesn't have to reparse and recompile the statement each time, because only the values change.

Sneftel
  • `sqlite3_exec` already does a prepare, i.e. there'll be one prepare for every 1000 insertions. This could be improved, but it's likely to improve throughput by less than 1%. – Arne Vogel Sep 10 '18 at 12:00
  • 3
    @ArneVogel This is, exactly, the point of this answer. Instead of doing a prepare for each `sqlite3_exec`, you do a single prepare for the entire batch. How much time would be saved by this technique can only be found out by measuring, and there's no use of speculating on what "might be" the performance improvements. – Algirdas Preidžius Sep 10 '18 at 12:08
  • @AlgirdasPreidžius Measuring is good, but experience and speculation are separate issues. Consider e.g. my [close as duplicate suggestion](https://stackoverflow.com/questions/1711631/improve-insert-per-second-performance-of-sqlite) where using a prepared statement more than doubled insertion throughput – but that was compared to *single* insert calls to `sqlite3_exec`. Assuming that costs are additive, this extrapolates to an expected gain of just ~0.13% here. Hence I consider explicit prepare a micro-optimization. Linked results suggest that reducing number of transactions is #1 priority. – Arne Vogel Sep 10 '18 at 12:31
  • 1
    @ArneVogel My initial reading of the question was that the OP was making 1000 calls to `sqlite3_exec` per transaction. It's possible that they're doing just one, with a *lot* of value tuples, but in that case the cost to parse/compile the statement will be correspondingly greater. – Sneftel Sep 10 '18 at 12:36
  • @ArneVogel "_where using a prepared statement more than doubled insertion throughput_" Doubled? Using `sqlite3_exec` for single rows did "_85 inserts per second_", while using prepared statements did "_53,000 inserts per second_", which is in the area of 600 times the improvement. The doubling is relative to putting the entirety of the inserts in a transaction, and that is not what is done in the code shown. – Algirdas Preidžius Sep 10 '18 at 12:58
  • @Sneftel I assumed that intended use of `sqlite3_exec` for bulk op is with placeholders and a value-providing callback. However, on reading the documentation again, I see that the callback is only for processing outputs, not inputs. I should not be assuming that C interfaces are more useful than they actually are. I hence retract my claim that using explicit prepare is a micro-optimization. – Arne Vogel Sep 10 '18 at 13:06
  • @AlgirdasPreidžius 85 operations per second is for "Worst-Case-Scenario" where *every* insert is in a separate transaction. Single transaction, but one `sqlite3_exec` per row is 23,000 ops. Single transaction, single prepare is 53,000 ops. OP is doing one transaction every 1,000 rows, which is probably not optimal but yet very different from "Worst-Case-Scenario". – Arne Vogel Sep 10 '18 at 13:11
  • @Sneftel Thanks. It solved my problem and insertion is quite fast. – mehak Sep 11 '18 at 03:46