
I have lists of about 20,000 items that I want to insert into a table (with about 50,000 rows in it). Most of these items update certain fields in existing rows and a minority will insert entirely new rows.

I am accessing the database twice for each item. First is a select query that checks whether the row exists. Next I insert or update a row depending on the result of the select query. I commit each transaction right after the update/insert.
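
To make the loop concrete, here's roughly what each iteration looks like (Python's sqlite3; the table and column names below are placeholders, not my real schema):

```python
import sqlite3

# Placeholder schema: the real table has 4 TEXT and 16 REAL columns.
items = [("some-key", 1.0)]  # stand-in for the ~20,000 parsed (key, value) pairs

conn = sqlite3.connect("data.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS mytable (key TEXT, field1 REAL)")

for key, value in items:
    # 1) check whether the row already exists
    cur.execute("SELECT 1 FROM mytable WHERE key = ?", (key,))
    if cur.fetchone() is None:
        # 2a) not there yet: insert a new row
        cur.execute("INSERT INTO mytable (key, field1) VALUES (?, ?)", (key, value))
    else:
        # 2b) already there: update the relevant field
        cur.execute("UPDATE mytable SET field1 = ? WHERE key = ?", (value, key))
    conn.commit()  # one commit per item
```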

For the first few thousand entries, I am getting through about 3 or 4 items per second, then it starts to slow down. By the end it takes more than 1/2 second for each iteration. Why might it be slowing down?

My average times are 0.5 seconds per iteration, split as 0.18 s for the select query and 0.31 s for the insert/update. The remaining 0.01 s goes to a couple of unmeasured steps that parse the data before it enters the database.

Update

I've commented out all the commits as a test and got no change, so that's not it (any more thoughts on optimal committing would be welcome, though).

As to table structure: each row has twenty columns. The first four are TEXT fields (all set with the first insert) and the remaining sixteen are REAL fields, one of which is set by the initial insert statement.

Over time the 'outstanding' REAL fields will be populated by the process I'm trying to optimize here.

I don't have an explicit index, though one of the fields is a unique key for each row.

I should note that as the database has gotten larger both the SELECT and UPDATE queries have taken more and more time, with a particularly remarkable deterioration in performance in the SELECT operation.

I initially thought this might be some kind of structural problem with SQLite (whatever that means), but I haven't been able to find any documentation suggesting there are natural limits to the program.

The database is about 60ish megs now.

tapertaper
  • You can tweak SQLite using `PRAGMAs`, journaling etc. – ba__friend Jul 15 '11 at 18:12
  • Those times are "a priori" horrible, but we'd need more context to really have an idea. Can you post the structure of the table, and a sample of some typical records? –  Jul 15 '11 at 19:06
  • Did you give WAL journaling a try? –  Aug 04 '11 at 19:52
  • You know I actually haven't. Your answer made me realize that I didn't have an explicit Primary Key, so I rebuilt the table with a primary key. This helped the speed a lot, but I really won when I grouped the inserts and updates to use 'executemany' instead of 'execute'. Now the whole thing takes about 30 seconds instead of 2-3 hours to run. – tapertaper Aug 07 '11 at 01:49
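
(For reference, a rough sketch of the fix described in the last comment, with placeholder names: an explicit PRIMARY KEY on the lookup column plus `executemany` batching.)

```python
import sqlite3

conn = sqlite3.connect("data.db")
cur = conn.cursor()

# Rebuilt table with an explicit PRIMARY KEY (placeholder schema).
cur.execute("CREATE TABLE IF NOT EXISTS mytable (key TEXT PRIMARY KEY, field1 REAL)")

# Split the parsed items into updates and inserts beforehand, then batch them.
update_rows = [(3.14, "existing-key")]  # (new_value, key) for rows already present
insert_rows = [("new-key", 2.71)]       # (key, value) for rows not seen before

cur.executemany("UPDATE mytable SET field1 = ? WHERE key = ?", update_rows)
cur.executemany("INSERT INTO mytable (key, field1) VALUES (?, ?)", insert_rows)
conn.commit()  # a single commit for the whole batch
```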

2 Answers


I think your bottleneck is that you commit with each insert/update:

> I commit each transaction right after the update/insert.

Either stop doing that, or at least switch to WAL journaling; see this answer of mine for why: SQL Server CE 4.0 performance comparison
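
A sketch of both options with Python's sqlite3 (placeholder names, not the OP's actual code): turn on WAL once, and/or wrap the whole batch in a single transaction:

```python
import sqlite3

conn = sqlite3.connect("data.db")
conn.execute("PRAGMA journal_mode=WAL")  # WAL needs SQLite >= 3.7.0; the setting persists in the file
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS mytable (key TEXT PRIMARY KEY, field1 REAL)")

items = [("some-key", 1.0)]  # stand-in for the parsed (key, value) pairs

for key, value in items:
    cur.execute("SELECT 1 FROM mytable WHERE key = ?", (key,))
    if cur.fetchone() is None:
        cur.execute("INSERT INTO mytable (key, field1) VALUES (?, ?)", (key, value))
    else:
        cur.execute("UPDATE mytable SET field1 = ? WHERE key = ?", (value, key))

conn.commit()  # one commit for the whole batch instead of one per item
```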

If you have a primary key you can optimize out the select by using the ON CONFLICT clause with INSERT INTO:

http://www.sqlite.org/lang_conflict.html
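
For example, one way to use it is INSERT OR REPLACE (a sketch with placeholder names; note the caveat in the comments that OR REPLACE rewrites the whole conflicting row, so supply every column you want to keep):

```python
import sqlite3

conn = sqlite3.connect("data.db")
cur = conn.cursor()

# The conflict clause needs a PRIMARY KEY (or UNIQUE constraint) on 'key' to fire.
cur.execute("CREATE TABLE IF NOT EXISTS mytable (key TEXT PRIMARY KEY, field1 REAL)")

# Caveat: OR REPLACE deletes the conflicting row and inserts a fresh one, so any
# column you don't supply here falls back to its default value.
cur.execute("INSERT OR REPLACE INTO mytable (key, field1) VALUES (?, ?)", ("some-key", 1.0))
conn.commit()
```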

EDIT: Earlier I meant to write "if you have a primary key" rather than "foreign key"; I fixed it.


Edit: shame on me. I misread the question and somehow understood this was for MySQL rather than SQLite... Oops. Please disregard this response, other than for generic ideas about updating DBMSes. The likely solution to the OP's problem is the overly frequent commits, as pointed out in sixfeetsix's response.


A plausible explanation is that the table gets fragmented.
You can verify this by defragmenting the table every so often and checking whether performance returns to the 3 or 4 items per second rate. (Which, BTW, is a priori relatively slow, but may depend on hardware, data schema and other specifics.) Of course, you'll need to consider the amount of time defragmentation takes and balance this against the time lost to the slow update rate to find an optimal frequency for the defragmentation.
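
In SQLite, the closest thing to defragmenting a table is VACUUM (as a commenter notes below); a sketch of running it every few thousand items, with placeholder names:

```python
import sqlite3

conn = sqlite3.connect("data.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS mytable (key TEXT, field1 REAL)")

items = [("k1", 1.0), ("k2", 2.0)]  # stand-in for the parsed (key, value) pairs
BATCH = 5000  # arbitrary; balance VACUUM time against the slowdown it prevents

for i, (key, value) in enumerate(items):
    cur.execute("UPDATE mytable SET field1 = ? WHERE key = ?", (value, key))
    if i % BATCH == BATCH - 1:
        conn.commit()            # VACUUM cannot run inside an open transaction
        conn.execute("VACUUM")   # rewrites the database file, compacting free pages

conn.commit()
```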

If the slowdown is effectively caused, at least in part, by fragmentation, you may also look into performing the updates in a particular order. It is hard to be more specific without knowing details of the overall schema and the statistical profile of the data, but fragmentation is indeed sensitive to the order in which various changes to the database take place.

A final suggestion, to boost the overall update performance, is (if this is possible) to drop a few indexes on the table, perform the updates, and recreate the indexes anew. This counter-intuitive approach works for relatively big updates because the cost of re-creating the indexes is often less than the cumulative cost of maintaining them as the update progresses.
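
A sketch of that pattern (placeholder names; this only applies to explicit secondary indexes, since the implicit index behind a PRIMARY KEY can't be dropped):

```python
import sqlite3

conn = sqlite3.connect("data.db")
cur = conn.cursor()
cur.execute("CREATE TABLE IF NOT EXISTS mytable (key TEXT PRIMARY KEY, field1 REAL)")

# Drop the secondary index before the bulk work so it isn't maintained row by row.
cur.execute("DROP INDEX IF EXISTS idx_mytable_field1")

# ... perform the bulk of the 20,000 updates here ...
cur.executemany("UPDATE mytable SET field1 = ? WHERE key = ?", [(1.0, "k1"), (2.0, "k2")])

# Rebuild the index once at the end.
cur.execute("CREATE INDEX idx_mytable_field1 ON mytable(field1)")
conn.commit()
```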

mjv
  • Why does this get upvoted? OP asks about sqlite, and you throw a mysql feature at him. –  Jul 15 '11 at 19:38
  • @sixfeetsix: Oh my! You are right... I must have misread the tag/question. Also SQLite explains the 3-4 recs per second. Your response is much more plausible, I'll + it at once. Take heart, your resp. will soon perk up to the top I hope. – mjv Jul 15 '11 at 19:56
  • The upvotes are a mystery, though :) but this answer is still useful if you remember [VACUUM](http://www.sqlite.org/lang_vacuum.html). – brandizzi Jul 15 '11 at 20:42