
I've noticed that SQLite db.commit() (save to disk) takes between 50ms and 100ms with my small database. This is normal and documented here, but it's too much to do it after each client request / after each INSERT, like this:

import bottle, sqlite3, random

@bottle.route('/')
def index():
    c = db.cursor()
    c.execute('INSERT INTO test VALUES (?)', (random.randint(0, 10000),))  # note the trailing comma: the parameter must be a tuple
    c.close()
    db.commit()    # should we do it here? (100ms is too much after *each* request!)
    return 'hello'

def worker():      
    db.commit()    # or here? how to call this worker once every 10 seconds? 

db = sqlite3.connect('test.db')
db.execute("CREATE TABLE IF NOT EXISTS test (a int)")
bottle.run(port=80)

More precisely I don't want to lose 100ms after each request: I optimized my server to serve pages very fast (10ms), and it would be a shame to lose 100ms because of DB commit.

Of course I could start a new thread to call worker() once every 10 seconds (in the worst case, if the app crashes, only the last 10 seconds of DB writes would be lost). But I read it's not recommended to use threads in this context.

Question: how to do a SQLite DB commit once every few seconds only (instead of once after each INSERT) in a Bottle/Flask webserver context?

davidism
Basj
  • `I don't want to lose 100ms after each request` - Why not? Very few domains are really that latency sensitive - especially on write. Is your webapp one of those exceedingly rare cases? (High frequency trading, real-time bidding, etc.) – ron rothman May 05 '20 at 16:22
  • @ronrothman I'm in a situation in which many clients might connect exactly at the same time (for a precise event), I know it's usually not the case, usually the users are spread along 24 hours per day, but here they come for a specific event, with a precise starting time. – Basj May 05 '20 at 16:28
  • Then (if I understand you correctly) you need high throughput, not low latency. You should be thinking about increasing concurrency, not reducing latency. – ron rothman May 05 '20 at 16:32
  • @ronrothman Both would be great: not losing 100ms on each request for a commit to disk (whereas the (complex!) page is generated in less than 20-30ms), and also increasing concurrency. Multiplying the time for each request by a factor of 6 just to log something is something we don't want, so I wanted to analyze how to avoid it. (Wouldn't it be sad if the page generation took a few milliseconds, and just the small DB logging commit took 6 times longer?) – Basj May 05 '20 at 16:42
  • I guess some things have to be learned the hard way. :) I highly recommend that you stop worrying about the 100ms. Good luck! – ron rothman May 05 '20 at 16:44
  • @ronrothman I did not say I won't increase concurrency. I will do it. But I'm also trying to understand if having a small DB logging eating 6x more time than the actual page generation/serving can be avoided. I think it's sane to study this, don't you think so? – Basj May 05 '20 at 16:45
  • IMO the only reasons you should be spending time on the 100ms are (1) you have a `requirement` of very low latency, or (2) you're having fun working on optimizing your webapp. Other than that, no. – ron rothman May 05 '20 at 16:49
  • @ronrothman Sure for (2), I admit! – Basj May 05 '20 at 17:14

2 Answers


You could write a Flask after_request() handler and commit the database there. You could even restrict it to commit only if a certain amount of time has elapsed since the last commit.

This won't block your requests and will not save the file too often.

Heap Underflow
  • 377
  • 1
  • 12
  • Interesting method, but it has a few drawbacks. Example: request #100 (DB commit), request #101 (no DB commit because DB commit has already been done 1 second before), then wait 30 seconds, and server is stopped (`app.py` is killed or stopped with CTRL+C). Then the INSERT of request #101 is ... lost! Any idea? – Basj May 05 '20 at 07:58
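One possible mitigation for the drawback raised in this comment (a sketch under assumptions, not something proposed in the thread): register a final commit with `atexit`, so a clean shutdown (including CTRL+C, which raises KeyboardInterrupt) flushes any pending rows. A hard kill or crash would still lose them.

```python
import atexit, sqlite3

# In-memory DB for this sketch; the question uses 'test.db'.
db = sqlite3.connect(':memory:')
db.execute('CREATE TABLE IF NOT EXISTS test (a int)')

# Flush any pending transaction on normal interpreter exit;
# a SIGKILL or crash still loses uncommitted data.
atexit.register(db.commit)
```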

Here is an attempt. It commits at most once every 10 seconds, and only if there was an INSERT during that timeframe.

NB: it requires check_same_thread=False, as explained in ProgrammingError: SQLite objects created in a thread can only be used in that same thread.

import bottle, sqlite3, random, threading, time

@bottle.route('/')
def index():
    global committhread
    c = db.cursor()
    c.execute('INSERT INTO test VALUES (?)', (random.randint(0, 10000),))
    c.close()
    if not committhread:
        print('Scheduling a commit...')
        committhread = threading.Thread(target=commit)
        committhread.start()
    else:
        print('A commit is already planned.')
    return 'hello'

def commit():
    global committhread
    print("We'll commit in 10 seconds.")
    time.sleep(10)  # sleeping blocks only this background thread, not the CPU or the main server
    db.commit()
    print('Committed.')
    committhread = None    

db = sqlite3.connect('test.db', check_same_thread=False)
db.execute("CREATE TABLE IF NOT EXISTS test (a int)")
committhread = None
bottle.run(port=80)

But as discussed in SQLite and Python: commit once every 10 seconds maximum, and not after every client request, there are other options worth trying before resorting to the code above, such as:

c.execute('PRAGMA synchronous = OFF')
c.execute('PRAGMA journal_mode = OFF')

This improves performance by at least an order of magnitude.
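A sketch of applying those PRAGMAs at connection setup. Caveat (my addition, not from the thread): both settings trade durability for speed, so they make sense only if losing recent writes on a crash is acceptable, which is the premise of the question.

```python
import sqlite3

db = sqlite3.connect(':memory:')   # in-memory for the sketch; the question uses 'test.db'
# synchronous=OFF lets the OS decide when data reaches disk;
# journal_mode=OFF disables the rollback journal entirely
# (a crash mid-write can then corrupt the database file).
db.execute('PRAGMA synchronous = OFF')
db.execute('PRAGMA journal_mode = OFF')
db.execute('CREATE TABLE IF NOT EXISTS test (a int)')
db.execute('INSERT INTO test VALUES (?)', (1,))
db.commit()                        # now much cheaper than with the defaults
```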


Basj
  • I totally accept the downvote (downvotes are useful when an answer is bad); if the downvoter could leave a comment about what is wrong, it would be super helpful, and it would help me improve the answer. Thanks in advance! – Basj May 05 '20 at 16:30