
(See update at bottom of question for more to-the-point info.)


My code is constructing a SQL bulk INSERT OR REPLACE query with 500 rows (4 int32 and 2 int64 fields, for a total of 6 columns). From here and from experience, I know that SQLite's limit on the number of rows in a single bulk insert is 500.

A shortened example of the bulk insert query that my program is constructing, but with only 2 rows, is:

INSERT OR REPLACE INTO "VG_INSTANCE_DATA_B" (year, yield, ccode, month, DATETIME_ROW_START, DATETIME_ROW_END) VALUES (1800, 6.9400000000000004, 2, 1, -5364662400000, -5333126400000), (1800, 6.9400000000000004, 2, 2, -5364662400000, -5333126400000)

The query is prepared in SQLite via sqlite3_prepare_v2().
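
For illustration, here is a stripped-down sketch of how the statement gets built and prepared. This is not my exact production code - the row values and buffer handling are placeholders - but the table, columns, and the call to sqlite3_prepare_v2() match what is described above:

    #include <sqlite3.h>
    #include <stdio.h>
    #include <stdlib.h>

    static int prepare_and_run_bulk_insert(sqlite3 *db, int num_rows)
    {
        /* build the SQL text on the heap so the string itself uses no stack space */
        size_t cap = 200 * (size_t)num_rows + 256;
        char *sql = (char *)malloc(cap);
        sqlite3_stmt *stmt = NULL;
        int len, i, rc;

        if (sql == NULL) return SQLITE_NOMEM;

        len = sprintf(sql,
            "INSERT OR REPLACE INTO \"VG_INSTANCE_DATA_B\" "
            "(year, yield, ccode, month, DATETIME_ROW_START, DATETIME_ROW_END) VALUES ");

        for (i = 0; i < num_rows; ++i) {
            /* placeholder values - the real program serializes its own data here */
            len += sprintf(sql + len,
                "%s(1800, 6.94, 2, %d, -5364662400000, -5333126400000)",
                (i == 0) ? "" : ", ", (i % 12) + 1);
        }

        rc = sqlite3_prepare_v2(db, sql, -1, &stmt, NULL);  /* the stack overflow occurs in here */
        free(sql);
        if (rc != SQLITE_OK) return rc;

        rc = sqlite3_step(stmt);
        sqlite3_finalize(stmt);
        return (rc == SQLITE_DONE) ? SQLITE_OK : rc;
    }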

When the program builds & runs in Visual Studio 2013's debugger, and when there are 500 inserts in the bulk insert statement, the sqlite3_prepare_v2() function crashes with a stack overflow, with internal SQLite functions being called over and over in a cycle.

When I decrease the number of bulk-inserted rows to 400, however, the stack overflow condition disappears.

I attempted to increase the stack size by a factor of 10 by passing /F 10000000 to the compiler and /STACK 10000000 to the linker. However, it had no effect. I suspect that something else is going on - and that my stack is not really overflowing (it is hard for me to see how 500 rows at 20 bytes or so per row could cause a stack overflow just due to the amount of data, and as noted, there shouldn't be a stack overflow due to the internal function calls in SQLite, because otherwise SQLite would not claim to support 500 inserts in a bulk insert statement).
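
For reference, the command-line form of what I tried looks roughly like this (the source file names are placeholders; note that the linker form of the option is normally written with a colon, as /STACK:reserve):

    cl /F 10000000 main.cpp sqlite3.c /link /STACK:10000000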

Can someone please explain why there is a stack overflow crash during the call to sqlite3_prepare_v2() for a bulk insert of 500 rows (each containing only 6 reasonably-sized columns), when the official SQLite limit for a bulk insert is 500? (Note: I also tested 499, and it still crashed with a stack overflow exception. It is known not to crash at 400, though.)

Thanks!


Update: I have debugged into sqlite3.c and, using the code linked here and here, added code to calculate the stack usage and placed a breakpoint at the point where the stack overflow occurs.

Two observations:

  • SQLite makes a RECURSIVE call for each row of a bulk insert. I.e., 500 rows in the bulk insert means 500 recursive descents into the set of functions involved, pushing the stack deeper and deeper. (I do not know whether the Release-mode compiler optimizes away the recursion.)

  • The stack is legitimately reaching the default 1 MB limit in Debug mode. The crash definitely occurs at the single function call that brings the stack size over 1 MB (I know this by using the above-linked code to calculate the stack usage at the point of the stack overflow).
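
For what it's worth, the stack-usage measurement was along these lines - a simplified sketch, not the exact code from the linked answers (pointer subtraction across the stack like this is a debugging hack, not portable C):

    #include <stdio.h>

    /* address of a local captured near the top of the thread's call chain */
    static char *g_stack_base = 0;

    void record_stack_base(void)
    {
        char marker;
        g_stack_base = &marker;
    }

    size_t current_stack_usage(void)
    {
        char marker;
        /* the stack grows downward on Windows/x86, so base minus current = bytes used */
        return (size_t)(g_stack_base - &marker);
    }

    /* usage: call record_stack_base() early (e.g. at the top of main or the worker
       thread), then log current_stack_usage() inside the recursing sqlite3.c
       functions and watch it climb toward the 1 MB default:
           printf("stack used: %lu bytes\n", (unsigned long)current_stack_usage());
    */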


Addendum 2

Important info: my bulk query is actually INSERT OR REPLACE, and the stack overflow crash occurs in a recursive SELECT query. This SELECT query is related to the REPLACE part of my bulk INSERT statement. Therefore, I think it's likely that the relevant cause of the problem is the REPLACE, not the INSERT. (See comments below this question for some more details, including timing.)

I have updated the title to reflect this.

    Consider preparing a parameterized query that inserts a single row, then running it 500 times, with different values bound to parameters. The way you do it adds significant overhead: first you have to serialize all your values into one huge string, then SQLite has to parse them right back. – Igor Tandetnik Mar 22 '14 at 13:49
  • Works for me. Which SQLite version? (Could you update?) Which functions loop? – CL. Mar 22 '14 at 14:48
  • @IgorTandetnik - I have programmed it *both* ways - (1) with a prepared statement and parameterized queries, and (2) with bulk `insert` statement, and the latter appears *significantly* faster. – Dan Nissenbaum Mar 22 '14 at 18:42
  • @DanNissenbaum: Make sure you enclose all runs of single-row insert in an explicit transaction: run `BEGIN` before, and `COMMIT` after. Otherwise, every individual row is inserted in its own implicit transaction, and committing a transaction is expensive with SQLite (it ensures that the data is physically written to disk surface). – Igor Tandetnik Mar 22 '14 at 19:14
  • @IgorTandetnik - I have also done that, in fact. I wrap all 500 (or 400) inserts with the prepared statement inside a single explicit transaction. – Dan Nissenbaum Mar 22 '14 at 19:32
  • I have discovered that the stack overflow is legitimate. SQLite internally performs a *recursion* when implementing a bulk insert, rather than a loop; and in fact with 500 rows it is easy for the default 1 MB stack size on Windows to be reached. I have updated the question with this info. – Dan Nissenbaum Mar 22 '14 at 23:02
  • I have posted another question about how to increase the stack size in Qt: http://stackoverflow.com/questions/22585073/how-to-increase-stack-size-for-a-thread-in-qt-qthreadsetstacksize-seems-no – Dan Nissenbaum Mar 22 '14 at 23:35
  • @IgorTandetnik - My timing results are in. **(1) Bulk insert** (500 rows in one insert statement): 200,000 rows in ~15 seconds **(2) A single prepared statement** (a loop over 1,000 rows at a time, each loop wrapped in a transaction): ~1.5 hours **(3) A single prepared statement** with a single transaction over all 200,000 rows: ~1.5? hours, all time spent committing the transaction **(4) A single prepared statement** for all rows, with no transaction: ~1.5? hours **(5) No prep. statement**, transaction every 500 rows: many hours. I'm surprised (1) is so much faster. I hope to post a question. – Dan Nissenbaum Mar 23 '14 at 09:48
  • @IgorTandetnik - Important addendum: My queries are actually `INSERT OR REPLACE` (for the bulk version) and `UPDATE` followed by `INSERT` (if the update acted on 0 rows) (for the prepared statement version). This might explain my numbers, as SQLite3 does spend "a lot" of time running a `SELECT` to handle the bulk call, and presumably must also do this in the prepared statement version - perhaps this eats up the vast majority of the time. – Dan Nissenbaum Mar 23 '14 at 10:07
  • Must be something about your database schema and/or the exact queries you run. In my experience, just plain inserting 200,000 rows would take a couple of minutes, if that - certainly not hours. In any case, I suggest you post to [`sqlite-users` mailing list](http://sqlite.org:8080/cgi-bin/mailman/listinfo/sqlite-users). Many SQLite developers, including Dr. Hipp, are hanging out there. – Igor Tandetnik Mar 23 '14 at 14:31
  • @IgorTandetnik - I have, I think, conclusively shown that it is the `UPDATE` part of my "first attempt `UPDATE`, then `INSERT`" that is causing nearly 100% of the time. I have confirmed by simply removing the `UPDATE` part, and then the query with prepared statement runs comparably to the bulk insert (though it might be faster - I haven't timed it directly in comparison). The "key" fields are NOT indexed in my case which easily explains the time spent by the `UPDATE` portion. So it is the `UPDATE`, not the `INSERT`, that is the source of the issue. Thanks! – Dan Nissenbaum Mar 23 '14 at 20:45
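
For reference, a minimal sketch of the single-row prepared-statement alternative discussed in the comments above - one statement bound and stepped per row, with the whole batch wrapped in an explicit transaction. The table and columns follow the question; the bound values and error handling are simplified placeholders:

    #include <sqlite3.h>

    int insert_rows_prepared(sqlite3 *db, int num_rows)
    {
        sqlite3_stmt *stmt = NULL;
        int i, rc;

        rc = sqlite3_prepare_v2(db,
            "INSERT OR REPLACE INTO \"VG_INSTANCE_DATA_B\" "
            "(year, yield, ccode, month, DATETIME_ROW_START, DATETIME_ROW_END) "
            "VALUES (?1, ?2, ?3, ?4, ?5, ?6)", -1, &stmt, NULL);
        if (rc != SQLITE_OK) return rc;

        /* one explicit transaction around the whole batch; without it every row
           gets its own implicit commit, which is what makes it slow */
        sqlite3_exec(db, "BEGIN", NULL, NULL, NULL);

        for (i = 0; i < num_rows; ++i) {
            /* placeholder values - the real program binds its own data here */
            sqlite3_bind_int(stmt, 1, 1800);                /* year  */
            sqlite3_bind_double(stmt, 2, 6.94);             /* yield */
            sqlite3_bind_int(stmt, 3, 2);                   /* ccode */
            sqlite3_bind_int(stmt, 4, (i % 12) + 1);        /* month */
            sqlite3_bind_int64(stmt, 5, -5364662400000LL);  /* DATETIME_ROW_START */
            sqlite3_bind_int64(stmt, 6, -5333126400000LL);  /* DATETIME_ROW_END   */

            rc = sqlite3_step(stmt);
            if (rc != SQLITE_DONE) break;
            sqlite3_reset(stmt);
        }

        sqlite3_exec(db, (rc == SQLITE_DONE) ? "COMMIT" : "ROLLBACK", NULL, NULL, NULL);
        sqlite3_finalize(stmt);
        return (rc == SQLITE_DONE) ? SQLITE_OK : rc;
    }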
