I've noticed that sqlite3 isn't capable or reliable when I use it inside a multiprocessing environment. Each process tries to write some data into the same database, so a connection ends up being used by multiple threads. I tried the check_same_thread=False option, but the number of insertions is fairly random: sometimes everything makes it in, sometimes not. Should I parallelize only part of the function (fetching the data from the web), collect the outputs into a list and insert them into the table all at once, or is there a reliable way to handle multiple connections with sqlite?
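Roughly what my current setup looks like (heavily simplified; the table and names here are just placeholders):

```python
import sqlite3
from multiprocessing import Pool

# one module-level connection, shared via check_same_thread=False;
# this is the pattern that loses insertions for me
conn = sqlite3.connect("data.db", check_same_thread=False)

def worker(row):
    # (the real worker fetches data from the web first)
    conn.execute("INSERT INTO results (value) VALUES (?)", (row,))
    conn.commit()

if __name__ == "__main__":
    conn.execute("CREATE TABLE IF NOT EXISTS results (value TEXT)")
    conn.commit()
    with Pool(4) as pool:
        pool.map(worker, [str(i) for i in range(100)])
```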
- It'd be good to know what pragma settings you had on this db... – duhaime Mar 28 '21 at 14:31
4 Answers
I've actually just been working on something very similar:
- multiple processes (for me a processing pool of 4 to 32 workers)
- each worker does some work that includes getting information from the web (in my case, a call to the Alchemy API)
- each worker opens its own sqlite3 connection to the same single file and adds one entry before taking the next task off the stack
At first I thought I was seeing the same issue as you, but then I traced it to overlapping and conflicting requests to the web API. Since I was right there, I did some torture testing on sqlite and multiprocessing and found I could run MANY process workers, all connecting and adding to the same sqlite file without coordination, and it was rock solid when I was just putting in test data.
So now I'm looking at your phrase "(fetching data from the web)". Perhaps you could try replacing that data fetching with some dummy data to ensure that it is really the sqlite3 connection causing your problems. At least in my tested case (running right now in another window), multiple processes were all able to add through their own connections without issues. But your description exactly matches the problem I was having when two processes stepped on each other while going for the web API (a very odd error, actually) and sometimes didn't get the expected data, which of course leaves an empty slot in the database. My eventual solution was to detect this failure within each worker and retry the web API call when it happened (it could have been more elegant, but this was for a personal hack).
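For instance, a rough sketch of the kind of torture test I mean, with the web fetch replaced by dummy rows (the paths and names here are placeholders, not from my actual code):

```python
import sqlite3
from multiprocessing import Pool

DB_PATH = "torture_test.db"  # placeholder path

def add_entry(n):
    # each worker opens its own connection and adds one dummy row
    conn = sqlite3.connect(DB_PATH, timeout=30)
    conn.execute("INSERT INTO entries (payload) VALUES (?)", ("dummy-%d" % n,))
    conn.commit()
    conn.close()

if __name__ == "__main__":
    conn = sqlite3.connect(DB_PATH)
    conn.execute("CREATE TABLE IF NOT EXISTS entries (payload TEXT)")
    conn.commit()
    conn.close()

    with Pool(32) as pool:
        pool.map(add_entry, range(1000))

    # if every row arrived, the connections were not the problem
    conn = sqlite3.connect(DB_PATH)
    print(conn.execute("SELECT COUNT(*) FROM entries").fetchone()[0])  # expect 1000
    conn.close()
```

If that count comes back complete, the missing insertions are almost certainly happening upstream of sqlite.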
My apologies if this doesn't apply to your case; it's hard to be sure exactly what you're facing, but the description makes me wonder if you might widen your considerations.

First of all, there's a difference between multiprocessing (multiple processes) and multithreading (multiple threads within one process).
It seems that you're talking about multithreading here. There are a couple of caveats that you should be aware of when using SQLite in a multithreaded environment. The SQLite documentation mentions the following:
- Do not use the same database connection at the same time in more than one thread.
- On some operating systems, a database connection should always be used in the same thread in which it was originally created.
See here for more detailed information: Is SQLite thread-safe?
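In practice that means giving each thread its own connection, created and used entirely within that thread; a minimal sketch (the table and names are illustrative):

```python
import sqlite3
import threading

DB_PATH = "example.db"  # placeholder path

def thread_task(value):
    # the connection is created, used, and closed inside one thread only
    conn = sqlite3.connect(DB_PATH)
    try:
        conn.execute("INSERT INTO items (value) VALUES (?)", (value,))
        conn.commit()
    finally:
        conn.close()

conn = sqlite3.connect(DB_PATH)
conn.execute("CREATE TABLE IF NOT EXISTS items (value TEXT)")
conn.commit()
conn.close()

threads = [threading.Thread(target=thread_task, args=(str(i),)) for i in range(8)]
for t in threads:
    t.start()
for t in threads:
    t.join()
```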

- And to add to that: sqlite works fine in a multi-process environment, as long as you're aware that locking may cause some calls to time out (fail) and that they then need to be retried. – andrew cooke Aug 07 '11 at 00:49
- I know the thread/process difference, and I use multiple processes (the multiprocessing module with pools). I think the problem is db locks, but no error message is generated inside the processes (in general, my processes don't care about errors; they just seem to pass over them)... – dorvak Aug 07 '11 at 12:42
- You also have to be aware that if you're farming the processes out across multiple systems, you're going to have all the fun of file locking on networked filesystems. It's a difficult area; really, it's better to keep the number of processes small enough to fit on a single system. – Donal Fellows Aug 07 '11 at 17:18
- @andrewcooke do you have an example of sqlite in a multi-process environment? Do you mean there's a way to send e.g. a cursor object to a child process? Or should child processes create their own cursors? – duhaime Mar 28 '21 at 14:31
- URL changed to https://www2.sqlite.org/cvstrac/wiki?p=MultiThreading – Herbert Yu Aug 23 '23 at 14:50
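A minimal sketch of the retry-on-lock pattern andrew cooke's comment describes (the timeout and backoff values are arbitrary):

```python
import sqlite3
import time

def execute_with_retry(db_path, sql, params=(), retries=5):
    """Retry a write when sqlite reports the database is locked."""
    for attempt in range(retries):
        conn = sqlite3.connect(db_path, timeout=10)  # wait up to 10s for locks
        try:
            conn.execute(sql, params)
            conn.commit()
            return
        except sqlite3.OperationalError as exc:
            if "locked" not in str(exc) or attempt == retries - 1:
                raise
            time.sleep(0.2 * (attempt + 1))  # back off, then retry
        finally:
            conn.close()
```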
sqlitedict: A lightweight wrapper around Python's sqlite3 database, with a dict-like interface and multi-thread access support.
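Basic usage looks roughly like this (following the pattern in its docs; the file name and key are placeholders):

```python
from sqlitedict import SqliteDict

# each thread can open its own SqliteDict on the same file
with SqliteDict("./my_db.sqlite", autocommit=True) as db:
    db["some_key"] = {"any": "picklable object"}
    print(db["some_key"])
```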

- Multi-thread support here just means it doesn't break your code, i.e. sqlite3 doesn't throw a ProgrammingError exception. However, from the lib doc: "multithreaded support doesn't give you any performance benefits." (https://pypi.python.org/pypi/sqlitedict) – miguelfg Aug 28 '15 at 17:14
If I had to build a system like the one you describe using SQLite, I would start by writing an async server (using the asynchat module) to handle all of the SQLite database access, and then I would write the other processes to use that server. When only one process accesses the db file directly, it can enforce a strict sequence of queries so that there is no danger of two processes stepping on each other's toes. It is also faster than continually opening and closing the db.
In fact, I would also try to avoid maintaining sessions; in other words, I would try to write all the other processes so that every database transaction is independent. At minimum this would mean allowing a transaction to contain a list of SQL statements, not just one, and it might even require some if-then capability so that you could SELECT a record, check that a field is equal to X, and only then UPDATE that field. If your existing app closes the database after every transaction, then you don't need to worry about sessions.
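asynchat has since been deprecated, but the same single-writer idea can be sketched with a multiprocessing.Queue feeding one dedicated database process (all names here are illustrative, not a full server):

```python
import sqlite3
from multiprocessing import Process, Queue

def db_writer(db_path, queue):
    # this process alone owns the db file and executes whole
    # transactions (lists of statements) in strict sequence
    conn = sqlite3.connect(db_path)
    conn.execute("CREATE TABLE IF NOT EXISTS results (value TEXT)")
    conn.commit()
    while True:
        transaction = queue.get()
        if transaction is None:  # sentinel: shut down
            break
        for sql, params in transaction:
            conn.execute(sql, params)
        conn.commit()
    conn.close()

def worker(queue, n):
    # workers never touch the db file; they only enqueue transactions
    queue.put([("INSERT INTO results (value) VALUES (?)", ("row-%d" % n,))])

if __name__ == "__main__":
    q = Queue()
    writer = Process(target=db_writer, args=("single_writer.db", q))
    writer.start()

    workers = [Process(target=worker, args=(q, n)) for n in range(8)]
    for w in workers:
        w.start()
    for w in workers:
        w.join()

    q.put(None)  # tell the writer to finish
    writer.join()
```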
You might be able to use something like nosqlite http://code.google.com/p/nosqlite/

- I didn't get the second part... in my current setting each process connects to the database, executes an INSERT, commits it, and closes the connection... did you mean that? – dorvak Aug 07 '11 at 12:46