
I have an HTTP POST endpoint in Flask which needs to insert whatever data comes in into a database. This endpoint can receive up to hundreds of requests per second. Doing an insert every time a new request comes in takes too much time. I thought that doing a bulk insert every 1000 requests, with the data from all the previous 1000 requests, could work like some sort of caching mechanism. I have tried saving 1000 incoming data objects into a collection and then doing a bulk insert once the array is 'full'.

Currently my code looks like this:

@app.route('/user', methods=['POST'])
def add_user():
    firstname = request.json['firstname']
    lastname = request.json['lastname']
    email = request.json['email']

    usr = User(firstname, lastname, email)

    # Collect incoming users in a module-level list and flush them
    # to the database in batches of 1000.
    global bulk
    bulk.append(usr)
    if len(bulk) >= 1000:
        db.session.bulk_save_objects(bulk)
        db.session.commit()
        bulk = []

    return user_schema.jsonify(usr)

The problem I'm having with this is that the database becomes 'locked', and I really don't know whether this is a good solution that is just poorly implemented, or a bad idea altogether.

sqlalchemy.exc.OperationalError: (sqlite3.OperationalError) database is locked

3 Answers


Your error message indicates that you are using an SQLite DB with SQLAlchemy. You may want to try setting the SQLite "synchronous" PRAGMA to OFF to turn syncing off. This can speed INSERT queries up dramatically, but it comes with an increased risk of data loss. See https://sqlite.org/pragma.html#pragma_synchronous for more details.

With synchronous OFF (0), SQLite continues without syncing as soon as it has handed data off to the operating system. If the application running SQLite crashes, the data will be safe, but the database might become corrupted if the operating system crashes or the computer loses power before that data has been written to the disk surface. On the other hand, commits can be orders of magnitude faster with synchronous OFF.

If your application and use case can tolerate the increased risks, then disabling syncing may negate the need for bulk inserts.

See "How to set SQLite PRAGMA statements with SQLAlchemy": How to set SQLite PRAGMA statements with SQLAlchemy

zama

Once I moved the code to AWS and used an Aurora instance as the database, the problems went away, so I suppose it's safe to conclude that the issues were solely related to my sqlite3 instance.

The final solution gave me satisfactory results and I ended up changing only this line:

db.session.bulk_save_objects(bulk)

to this:

db.session.add_all(bulk)

I can now safely handle 400 or more calls per second (I haven't tested beyond that) on that specific endpoint, all ending with valid inserts.
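
Putting that one-line change back into the batching block from the question, it looks roughly like this (a sketch reusing the question's global bulk list, db session, and User objects):

    global bulk
    bulk.append(usr)
    if len(bulk) >= 1000:
        # Queue the whole batch on the ORM session and flush it in a single commit.
        db.session.add_all(bulk)
        db.session.commit()
        bulk = []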


I'm not an expert on this, but it seems like the database has reached its concurrency limits. You can try using Pony ORM for better concurrency and transaction management:

https://docs.ponyorm.org/transactions.html

By default Pony uses the optimistic concurrency control concept for increasing performance. With this concept, Pony doesn’t acquire locks on database rows. Instead it verifies that no other transaction has modified the data it has read or is trying to modify.
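
A minimal, standalone sketch of the same model and insert with Pony (not a drop-in replacement for the Flask-SQLAlchemy code in the question; the entity fields just mirror it):

from pony.orm import Database, Required, db_session

db = Database()

class User(db.Entity):
    firstname = Required(str)
    lastname = Required(str)
    email = Required(str)

db.bind(provider='sqlite', filename='users.db', create_db=True)
db.generate_mapping(create_tables=True)

@db_session  # opens a transaction and commits automatically on exit
def add_user(firstname, lastname, email):
    User(firstname=firstname, lastname=lastname, email=email)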

Shishir