When parsing huge log files into a sqlite database is there any major performance improve if we don't call connection.commit()
after each insert or does that not make any difference. I guess the question is if it's only for separating transactions or if it does anything that needs time.
Asked
Active
Viewed 1,065 times
1

noob
- 8,982
- 4
- 37
- 65
1 Answers
3
Yes, performance decreases directly with more frequent commits.
Here's a toy example working with a "lorem ipsum" text file of about 1400 lines:
import argparse
import sqlite3
import textwrap
import time
parser = argparse.ArgumentParser()
parser.add_argument("n", help="num lines per commit", type=int)
arg = parser.parse_args()
con = sqlite3.connect('lorem.db')
cur = con.cursor()
cur.execute('drop table if exists Lorem')
cur.execute('create table Lorem (lorem STRING)')
con.commit()
with open('lorem.txt') as f: lorem=textwrap.wrap(f.read())
print('{} lines'.format(len(lorem)))
start = time.time()
for i, line in enumerate(lorem):
cur.execute('INSERT INTO Lorem(lorem) VALUES(?)', (line,))
if i % arg.n == 0: con.commit()
stend = time.time()
print('{} lines/commit: {:.2f}'.format(arg.n, stend-start))
With this saved as sq.py
...:
$ for i in `seq 1 10`; do python sq.py $i; done
1413 lines
1 lines/commit: 1.01
1413 lines
2 lines/commit: 0.53
1413 lines
3 lines/commit: 0.35
1413 lines
4 lines/commit: 0.27
1413 lines
5 lines/commit: 0.21
1413 lines
6 lines/commit: 0.19
1413 lines
7 lines/commit: 0.17
1413 lines
8 lines/commit: 0.14
1413 lines
9 lines/commit: 0.13
1413 lines
10 lines/commit: 0.11
So, halving the commit
s nearly halves elapsed time for the operation, and so on -- it's not quite linear but almost so.
For completeness: 100 lines per commit reduce runtime to 0.02.
Riffing on this, you can easily experiment, and measure times, with DB tables closer to what you actually require.

Alex Martelli
- 854,459
- 170
- 1,222
- 1,395
-
as you wish tough one part of the "multiple" questions could be closed because of this reason: `Questions asking us to recommend or find a book, tool, software library, tutorial or other off-site resource` and the other one was related to sqlite. – noob Feb 09 '15 at 01:16
-
1@mash, I see your point, and, weirdly, the software recs stack exchange site appears to NOT want Qs asking for SW recs (?!). The solution in this case is to frame the other Q around specific performance measurements in a sample of log-parsing tasks and ask about how to speed up those tasks -- whether it be by fine-tuning or differently using that one library, coding Python directly, using other libraries, whatever, this is very much a SO question, not asking for books, tools, software libraries, tutorials, &c. – Alex Martelli Feb 09 '15 at 01:20