I'm using an SQLite database with peewee
on multiple machines, and I'm encountering various OperationalError
, DataBaseError
. It's obviously a problem of multithreading, but I'm not at all an expert with this nor with SQL. Here's my setup and what I've tried.
Settings
I'm using peewee
to log machine learning experiments. Basically, I have multiple nodes (like, different computers) which run a python file, and all write to the same base.db
file in a shared location. On top of that, I need a single read access from my laptop, to see what's going on. There are at most ~50 different nodes which instantiate the database and write things on it.
What I've tried
At first, I used the SQLite object:
db = pw.SqliteDatabase(None)
# ... Define tables Experiment and Epoch
def init_db(file_name: str):
db.init(file_name)
db.create_tables([Experiment, Epoch], safe=True)
db.close()
def train():
xp = Experiment.create(...)
# Do stuff
with db.atomic():
Epoch.bulk_create(...)
xp.save()
This worked fine, but I sometimes had jobs which crashed because of the database being locked. Then, I learnt that SQLite only handled one write operation per connection, which caused the problem.
So I turned to SqliteQueueDatabase
as, according to the documentation, it's useful if "if you want simple read and write access to a SQLite database from multiple threads." I also added those keywords I found on other thread which were said to be useful.
The code then looked like this:
db = SqliteQueueDatabase(None, autostart=False, pragmas=[('journal_mode', 'wal')],
use_gevent=False,)
def init_db(file_name: str):
db.init(file_name)
db.start()
db.create_tables([Experiment, Epoch], safe=True)
db.connect()
and the same for saving stuff except for the db.atomic
part. However, not only do write queries seem to encounter errors, I practically no longer have access to the database for read: it is almost always busy.
My question
What is the right object to use in this case? I thought SqliteQueueDatabase
was the perfect fit. Are pooled
database a better fit? I'm also asking this question because I don't know if I have a good grasp on the threading part: the fact that multiple database
object are initialized from multiple machines is different from having a single object on a single machine with multiple threads (like this situation). Right? Is there a good way to handle things then?
Sorry if this question is already answered in another place, and thanks for any help! Happy to provide more code if needed of course.