I have lists of about 20,000 items that I want to insert into a table (with about 50,000 rows in it). Most of these items update certain fields in existing rows and a minority will insert entirely new rows.
I am accessing the database twice for each item. First is a select query that checks whether the row exists. Next I insert or update a row depending on the result of the select query. I commit each transaction right after the update/insert.
For the first few thousand entries, I am getting through about 3 or 4 items per second, then it starts to slow down. By the end it takes more than 1/2 second for each iteration. Why might it be slowing down?
My average times are: 0.5 seconds for an entire run divided up as .18s per select query and .31s per insert/update. The last 0.01 is due to a couple of unmeasured processes to do with parsing the data before entering into the database.
Update
I've commented out all the commits as a test and got no change, so that's not it (any more thoughts on optimal committing would be welcome, though).
As to table structure: Each row has twenty columns. The first four are TEXT fields (all set with the first insert) and the 16 are REAL fields, one of which is inputted with the initial insert statement.
Over time the 'outstanding' REAL fields will be populated with the process I'm trying to optimize here.
I don't have an explicit index, though one of the fields is unique key to each row.
I should note that as the database has gotten larger both the SELECT and UPDATE queries have taken more and more time, with a particularly remarkable deterioration in performance in the SELECT operation.
I initially thought this might be some kind of structural problem with SQLITE (whatever that means), but haven't been able to find any documentation anywhere that suggests there are natural limits to the program.
The database is about 60ish megs, now.