4

Execute many seems to be very slow with deletion (Insertion is fine) and I was wondering if anyone knows why it takes so long.

Consider the code below:

import sqlite3

db = sqlite3.connect("mydb")
c = db.cursor()
c.execute("DROP TABLE IF EXISTS testing ")
c.execute("CREATE TABLE testing (val INTEGER);")
my_vals2 = [[x] for x in range(1,10000)]

def insertmany(vals):
    c.executemany("INSERT INTO testing (val) VALUES (?)",vals)
    db.commit()

def deletemany1(vals):
    c.executemany("DELETE FROM testing WHERE val=?",vals)
    db.commit()

def deletemany2(vals): #this is fastest even though im looping over to convert to strings and again to join ...
    vals = ["'%s'"%v[0] for v in vals] 
    c.execute("DELETE FROM testing WHERE val IN (%s)"%",".join(vals))
    #DELETE FROM TABLE WHERE x in (1,2,3...)

And the following time results (timeit was giving funny data so :/) from IPython:

%time insertmany(my_vals2) 
#CPU times: user 0.60 s, sys: 0.00 s, total: 0.60 s Wall time: 0.60 s

%time deletemany1(my_vals2)
#CPU times: user 3.58 s, sys: 0.00 s, total: 3.58 s Wall time: 3.58 s

%time deletemany2(my_vals2)
#CPU times: user 0.02 s, sys: 0.00 s, total: 0.02 s Wall time: 0.02 s

And just for the sake of completeness here are the timeit results (but I think timeit is broken on the second test(that or the ms is a different unit then the first test))

%timeit insertmany(my_vals2) 
#1 loops, best of 3: 358 ms per loop

%timeit deletemany1(my_vals2)
#1 loops, best of 3: 8.34 ms per loop  <- this is not faster than the above!!!! (timeit lies?)

%timeit deletemany2(my_vals2)
#100 loops, best of 3: 2.3 ms per loop  

So why is executemany soooooo slow with deletes?

Patrick Yoder
  • 1,065
  • 4
  • 14
  • 19
Joran Beasley
  • 110,522
  • 12
  • 160
  • 179
  • it is quite interesting that the code-generated single delete statement in `deletemany1` is faster than `deletemany2`, which uses `executemany()` provided by python itself. – HCSF Aug 17 '20 at 13:47

2 Answers2

2

I'm just taking a punt: Because it has to search exhaustively for the ones to delete. Try it with an index and report back.

CREATE INDEX foo ON testing (val)

http://sqlite.org/lang_createindex.html

John Mee
  • 50,179
  • 34
  • 152
  • 186
  • well thats much faster on deletion but much slower on insertion now ...(+1 anyway I didnt know sqlite supported indexing) – Joran Beasley Nov 21 '12 at 01:09
2

SQLites stores table records in a B+ tree, sorted by rowid.

When you are inserting with an automatically generated rowid, all records are just appended at the end of the table. However, when deleting, SQLite has to search for the record first. This is slow if the id column is not indexed; either create an explicit index (as proposed by John), or declare the column as INTEGER PRIMARY KEY to make it the rowid.

Inserting with an index becomes faster if you don't use the index, i.e., if you create the index only after bulk inserts.

Your last delete command deletes all records at once. If you know that you're deleting all records in the table, you could speed it up even further by using just DELETE FROM testing, which doesn't need to look at any records at all.

CL.
  • 173,858
  • 17
  • 217
  • 259
  • in this case it is deleting all ... but in the real world it is user selected ... anyway between the two of you I think I have a slightly better idea of this behavior ... – Joran Beasley Nov 21 '12 at 15:14