
I'm facing an issue where deleting thousands of rows from a single table takes a long time (over 2 minutes for 14k records), while inserting the same records is nearly instant (200 ms). Both the insert and delete statements are handled identically: a loop generates the statements and appends them to a list, then the list is passed to a separate function that opens a transaction, executes all the statements and finishes with a commit. At least that was my impression before I started testing this with pseudocode - it looks like I misunderstood the need for opening the transaction manually.

I've read about transactions (https://www.sqlite.org/faq.html#q19), but since the inserts are pretty much instant, I'm not sure that is the problem here.

My understanding is that one transaction corresponds to one commit, and if that is correct then all the delete statements do appear to run in a single transaction - mid-processing I can still see the rows being deleted, and they only actually disappear after the final commit. So the situation described in the FAQ link above should not apply, since no per-statement commit takes place. But the slow speed suggests something else is going on, as if each delete statement were its own transaction.

After running the pseudocode it appears that the changes are indeed not committed until an explicit commit is sent (via conn.commit()), but issuing a "begin" or "begin transaction" before the loop has no effect. I think this is because the sqlite3 module sends the "begin" automatically in the background (see: Merge SQLite files into one db file, and 'begin/commit' question).
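
For reference, a minimal sketch of what taking transaction control away from the sqlite3 module looks like (using an in-memory database and a throwaway demo table rather than the real schema) - with isolation_level set to None the module stops issuing its implicit "begin", so the explicit BEGIN/COMMIT below are what actually delimit the transaction:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.isolation_level = None  # disable the module's implicit transaction handling
conn.execute('CREATE TABLE demo (value TEXT)')  # throwaway table just for this sketch

conn.execute('BEGIN')  # explicit transaction start
for i in range(1000):
    conn.execute('INSERT INTO demo (value) VALUES (?)', (str(i),))
conn.execute('COMMIT')  # nothing is committed before this point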

Pseudocode to test this out:

import sqlite3
from datetime import datetime
insert_queries = []
delete_queries = []
rows = 30000
for i in range(rows):
    insert_queries.append(f'''INSERT INTO test_table ("column1") VALUES ("{i}");''')
for i in range(rows):
    delete_queries.append(f'''DELETE from test_table where column1 ="{i}";''')

# assumes test_table already exists, e.g. something like: CREATE TABLE test_table (column1 TEXT)
conn = sqlite3.connect('/data/test.db', check_same_thread=False)
timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
print('*'*50)
print(f'Starting inserts: {timestamp}')
# conn.execute('BEGIN TRANSACTION')
for query in insert_queries:
    conn.execute(query)
conn.commit()
timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
print(f'Finished inserts: {timestamp}')

print('*'*50)
timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
print(f'Starting deletes: {timestamp}')
# conn.isolation_level = None
# conn.execute('BEGIN;')
# conn.execute('BEGIN TRANSACTION;')
for query in delete_queries:
    conn.execute(query)
conn.commit()
timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
print(f'Finished deletes: {timestamp}')

One weird thing is that the delete time grows much faster than linearly with the row count (2 s to delete 10k rows, 7 s for 20k rows, 43 s for 50k rows), while the insert time stays near-instant regardless of row count.

EDIT: The original question was: why does the delete statement take so much more time than the insert statement, and how can it be sped up so that inserting and deleting rows run at similar speed?

As per snakecharmerb's suggestion, one workaround is to do it like this:

rows = 100000
# build a single comma-separated list of quoted values for the IN clause
delete_ids = ','.join(f'"{i}"' for i in range(rows))
delete_str = f'''DELETE from test_table where column1 IN ({delete_ids});'''
conn.execute(delete_str)
conn.commit()

While this is most likely against all best practices, it does seem to work - it takes around 2 seconds to delete 1 million rows.
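
A parameterized variant of the same idea would look roughly like the sketch below (untested here); note that SQLite caps the number of ? placeholders per statement, so unlike the string-building version above this only works for batches up to that limit, not for all 100k values at once:

batch = [str(i) for i in range(500)]  # small enough to stay under the placeholder limit
placeholders = ','.join(['?'] * len(batch))  # "?,?,?,..." - one placeholder per value
conn.execute(f'DELETE FROM test_table WHERE column1 IN ({placeholders})', batch)
conn.commit()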

Sander
  • 1) Why are the delete statements taking so much more time compared to insert statements? 2) How to speed up the delete process? – Sander Jun 30 '21 at 10:21
  • I don't know why they take longer (on my machine your delete script took 6s, the insert 0s). To speed them up you might try: (1) `DELETE FROM tbl WHERE id IN (1,2,3...)` There's probably an optimum number of ids per statement (for Postgres it's about 80 I think; don't know for sqlite) (2) `cursor.executemany` (3) intermediate commits (say after each 1000 rows deleted) – snakecharmerb Jun 30 '21 at 11:15
  • Yeah, as I've mentioned in the original post it goes up exponentially if you increase the row count on line 5. While still on the slow side I would even live with the sub 10s time to delete 30k records, but on the live system it takes around 2 minutes to delete 14k rows which is not acceptable. I will test out the IN(1,2,3..) syntax along with intermediate commits, will update this post if they help. – Sander Jun 30 '21 at 11:22

1 Answer


I tried batching the deletes in sets of 50:

...
batches = []
batch = []
for i in range(rows):
    batch.append(str(i))
    if len(batch) == 50:
        batches.append(batch)
        batch = []
if batch:
    batches.append(batch)
...

base = 'DELETE FROM test_table WHERE column1 IN ({})'
for batch in batches:
    placeholders = ','.join(['?'] * len(batch))
    sql = base.format(placeholders)
    conn.execute(sql, batch)
conn.commit()
...

and this reduced the duration to 1 - 2 seconds (from 6 - 8 originally).

Combining this approach with executemany resulted in a 1 second duration.
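
A sketch of how that combination might look (assuming the batches list built above; the final short batch is run separately because executemany needs a fixed number of placeholders):

full_batches = [b for b in batches if len(b) == 50]
sql_full = base.format(','.join(['?'] * 50))
conn.executemany(sql_full, full_batches)  # one prepared statement reused for every full batch
if batches and len(batches[-1]) < 50:  # handle the remainder batch, if any
    sql_rest = base.format(','.join(['?'] * len(batches[-1])))
    conn.execute(sql_rest, batches[-1])
conn.commit()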

Using a query to define the rows to delete was almost instant

DELETE FROM test_table WHERE column1 IN (SELECT column1 FROM test_table)

but it's possible SQLite recognises that this query is equivalent to a bare DELETE FROM test_table and optimises accordingly.

Switching off the secure_delete PRAGMA seemed to make performance even worse.
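
For reference, something along the lines of the following would switch it off (the exact call isn't shown above):

conn.execute('PRAGMA secure_delete = OFF')  # when ON, SQLite overwrites deleted content with zeros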

snakecharmerb
  • It seems the `IN ()` syntax is the right way to go, at least as far as workarounds go. Though it is still bugging me why the "identical" approach of inserting the rows works without any problems. Sidenote: I haven't tested `executemany`, but I did test `executescript` and passed it the string of all the delete statements separated by `;` - this did not have any effect on the speed. If it were a case of deleting all rows then the query-defined delete would work, but unfortunately in this case it does not suit my needs. – Sander Jun 30 '21 at 19:29