
I have found the following issue with APSW (an SQLite wrapper for Python) when inserting rows.

Let's say my data is data = [[1,2],[3,4]]

APSW and SQLite3 allow me to do something like:

apsw.executemany("INSERT INTO Table VALUES(?,?)", data)

or I can write some code that does the following:

sql = "BEGIN TRANSACTION; 
INSERT INTO Table Values('1','2');
INERT INTO Table Values('3','4');
COMMINT;"

apsw.execute(sql)

When data is a long list/array/table, the performance of the first method is extremely slow compared to the second one (for 400 rows it can be 20 seconds vs. less than 1!). I do not understand why, since the first method is the one shown in all the SQLite Python tutorials for adding data to a table.

Any idea of what may be happening here?

Yona
  • Transactions are quicker than just executing SQL. You can do the same thing with the first method by adding `apsw.execute("BEGIN TRANSACTION")` before your `executemany` and `apsw.execute("COMMIT")` when you're done – Confuseh Jan 26 '16 at 12:18
  • @Confuseh Actually, many transactions [are slower](https://www.sqlite.org/faq.html#q19). – CL. Jan 26 '16 at 12:54
  • @CL. Many transactions are slower, indeed, but one transaction with a lot of inserts in it is still quicker than inserts without an explicit transaction. – Confuseh Jan 26 '16 at 16:16
  • @Confuseh @CL. Thank you both for the replies and interesting links. Hope this thread may help people with similar problems in the future – Yona Jan 26 '16 at 17:36
  • @Confuseh It is not possible to have inserts without transactions; without explicit `BEGIN`/`COMMIT`, you get one [automatic transaction](http://www.sqlite.org/lang_transaction.html) for each statement. – CL. Jan 26 '16 at 19:36
  • @CL. I phrased it wrong, but that's what I meant. Sorry for misunderstanding. – Confuseh Jan 27 '16 at 00:44

4 Answers


(Disclosure: I am the author of APSW.) If you do not explicitly have a transaction in effect, SQLite automatically starts one at the beginning of each statement and ends it at the end of the statement. A write transaction is durable, meaning the contents must end up on storage and fsync must be called to ensure they survive an unexpected power or system failure. Storage is slow!

I recommend using with rather than BEGIN/COMMIT in your case, because it will automatically roll back on error. That makes sure your data insertion either happens completely or not at all. See the documentation for an example.

When you are inserting a lot of data, you will find WAL mode to be more performant.
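For illustration, a minimal sketch of both suggestions; the file name, table name, and schema here are hypothetical, added only to make the snippet self-contained:

import apsw

connection = apsw.Connection("example.db")  # hypothetical file-backed database
cursor = connection.cursor()
cursor.execute("PRAGMA journal_mode=WAL")  # WAL reduces sync overhead for bulk writes
cursor.execute("CREATE TABLE IF NOT EXISTS pairs (x INTEGER, y INTEGER)")

data = [[1, 2], [3, 4]]
# The connection doubles as a context manager: the block runs in one
# transaction, committed on success and rolled back if an exception escapes.
with connection:
    cursor.executemany("INSERT INTO pairs VALUES(?,?)", data)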

Roger Binns
  • Hello Roger, thanks so much for your detailed explanation! It is not every day one has the developer behind the tool you use every day write on your StackOverflow post! – Yona Feb 08 '16 at 09:44

Thanks to Confuseh I got the following answer:

Executing:

apsw.execute("BEGIN TRANSACTION;")
apsw.executemany("INSERT INTO Table VALUES(?,?)", data)
apsw.execute("COMMIT;")

Speeds up the process by A LOT! This seems to be the right way of adding data (vs. using my method of creating multiple INSERT statements).

Yona

Thank you for this question; the answers helped me when using SQLite with Python. In the end I found the following, and hope it can help some people:

When connecting to the SQLite database we can use either

con = sqlite3.connect(":memory:", isolation_level=None)

or

con = sqlite3.connect(":memory:")


When using isolation_level=None, the connection is in autocommit mode, which creates one transaction per statement and becomes very slow. This helps:

cur.execute("BEGIN TRANSACTION")
cur.executemany(....)
cur.execute("COMMIT")

And if you use con = sqlite3.connect(":memory:") (the default isolation level), cur.executemany(....) will be fast right away, because the sqlite3 module implicitly opens a transaction before data-modifying statements.
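A minimal runnable sketch of the autocommit case (table name and data are illustrative):

import sqlite3

con = sqlite3.connect(":memory:", isolation_level=None)  # autocommit mode
cur = con.cursor()
cur.execute("CREATE TABLE tbl (x INTEGER, y INTEGER)")

data = [(i, i * 2) for i in range(400)]
# One explicit transaction instead of 400 automatic ones:
cur.execute("BEGIN TRANSACTION")
cur.executemany("INSERT INTO tbl VALUES(?, ?)", data)
cur.execute("COMMIT")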

tinyhare

Problem

There may be some confusion for mysqlclient-python/pymysql users who expect executemany of sqlite3/apsw to rewrite their INSERT INTO table VALUES(?, ?) into a multi-row INSERT statement.

For instance, executemany of mysqlclient-python has this in its docstring:

This method improves performance on multiple-row INSERT and REPLACE. Otherwise it is equivalent to looping over args with execute().

Python stdlib's sqlite3.Cursor.executemany doesn't have this optimisation. It is always loop-equivalent. Here's how to demonstrate it (unless you want to read some C, _pysqlite_query_execute):

import sqlite3

conn = sqlite3.connect(':memory:', isolation_level=None)
conn.set_trace_callback(print)
conn.execute('CREATE TABLE tbl (x INTEGER, y INTEGER)')

conn.executemany('INSERT INTO tbl VALUES(?, ?)', [(i, i ** 2) for i in range(5)])

It prints:

CREATE TABLE tbl (x INTEGER, y INTEGER)
INSERT INTO tbl VALUES(0, 0)
INSERT INTO tbl VALUES(1, 1)
INSERT INTO tbl VALUES(2, 4)
INSERT INTO tbl VALUES(3, 9)
INSERT INTO tbl VALUES(4, 16)

Solution

Thus, you either need to rewrite these INSERTs into a multi-row one (manually or, for instance, with python-sql) to stay in auto-commit mode (isolation_level=None), or wrap your INSERTs in a transaction (with a sensible number of INSERTs per batch) in the default implicit-commit mode. The latter means the following for the above snippet:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.set_trace_callback(print)
conn.execute('CREATE TABLE tbl (x INTEGER, y INTEGER)')

with conn:
    conn.executemany('INSERT INTO tbl VALUES(?, ?)', [(i, i ** 2) for i in range(5)])

Now it prints:

CREATE TABLE tbl (x INTEGER, y INTEGER)
BEGIN 
INSERT INTO tbl VALUES(0, 0)
INSERT INTO tbl VALUES(1, 1)
INSERT INTO tbl VALUES(2, 4)
INSERT INTO tbl VALUES(3, 9)
INSERT INTO tbl VALUES(4, 16)
COMMIT
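
For completeness, here is a hypothetical sketch of the former option: building one multi-row INSERT by hand, which stays in auto-commit mode but issues a single statement (multi-row VALUES needs SQLite 3.7.11+, and the number of bound parameters per statement is limited, historically to 999):

import sqlite3

conn = sqlite3.connect(':memory:', isolation_level=None)
conn.execute('CREATE TABLE tbl (x INTEGER, y INTEGER)')

rows = [(i, i ** 2) for i in range(5)]
placeholders = ', '.join(['(?, ?)'] * len(rows))  # "(?, ?), (?, ?), ..."
params = [v for row in rows for v in row]         # flatten the row tuples
conn.execute('INSERT INTO tbl VALUES ' + placeholders, params)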

For further bulk-insert performance improvement in SQLite, I'd suggest starting with this overview question.

saaj