I'm facing an issue where deleting thousands of rows from a single table takes a long time (over 2 minutes for 14k records), while inserting the same records is nearly instant (200 ms). Both the insert and delete statements are handled identically: a loop generates the statements and appends them to a list, then the list is passed to a separate function that opens a transaction, executes all the statements and finishes with a commit. At least that was my impression before I started testing this with the pseudocode below - it turns out I had misunderstood the need to open the transaction manually.
I've read about transactions (https://www.sqlite.org/faq.html#q19), but since the inserts are pretty much instant, I'm not sure that is the problem here.
My understanding is that a transaction only ends with a commit, and if that is correct then all the delete statements do seem to run in a single transaction: mid-processing I can still see all the to-be-deleted rows, and only after the final commit are they actually gone. In other words, the situation described in the FAQ linked above should not apply, since no per-statement commit takes place. Yet the slow speed suggests something else is dragging things down, as if each delete statement were a separate transaction after all.
After running the pseudocode it appears that the changes are indeed not committed until an explicit commit is sent (via conn.commit()), but a "BEGIN" or "BEGIN TRANSACTION" in front of the loop has no effect. I think this is because the sqlite3 module sends the BEGIN automatically in the background (see: Merge SQLite files into one db file, and 'begin/commit' question).
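A quick way to confirm the implicit BEGIN is Connection.in_transaction, which reports whether the sqlite3 module currently has a transaction open. A minimal sketch against a throwaway in-memory database:

import sqlite3

conn = sqlite3.connect(':memory:')
conn.execute('CREATE TABLE test_table ("column1")')
conn.commit()

print(conn.in_transaction)  # False: autocommit mode, nothing pending

conn.execute('INSERT INTO test_table ("column1") VALUES ("0")')
print(conn.in_transaction)  # True: sqlite3 opened a transaction
                            # implicitly before the first INSERT

conn.commit()
print(conn.in_transaction)  # False again once the commit lands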
Pseudocode to test this out:
import sqlite3
from datetime import datetime

insert_queries = []
delete_queries = []
rows = 30000
for i in range(rows):
    insert_queries.append(f'''INSERT INTO test_table ("column1") VALUES ("{i}");''')
for i in range(rows):
    delete_queries.append(f'''DELETE from test_table where column1 ="{i}";''')

conn = sqlite3.connect('/data/test.db', check_same_thread=False)
# Assumed single-column schema, so the script runs on an empty database
conn.execute('CREATE TABLE IF NOT EXISTS test_table ("column1");')

timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
print('*' * 50)
print(f'Starting inserts: {timestamp}')
# conn.execute('BEGIN TRANSACTION')
for query in insert_queries:
    conn.execute(query)
conn.commit()
timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
print(f'Finished inserts: {timestamp}')
print('*' * 50)

timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
print(f'Starting deletes: {timestamp}')
# conn.isolation_level = None
# conn.execute('BEGIN;')
# conn.execute('BEGIN TRANSACTION;')
for query in delete_queries:
    conn.execute(query)
conn.commit()
timestamp = datetime.utcnow().strftime("%Y-%m-%dT%H:%M:%S")
print(f'Finished deletes: {timestamp}')
One weird thing is that the delete time grows much faster than linearly with the row count, roughly quadratically (2 s to delete 10k rows, 7 s for 20k, 43 s for 50k), while the insert time stays near-instant regardless of row count.
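That growth pattern is a strong hint that each DELETE scans the whole table to find its row: inserts just append, but without an index on column1, deleting n rows one at a time costs on the order of n² row visits. If that is the cause, an index should make the per-statement deletes scale like the inserts. A sketch, assuming test_table has no index on column1 yet (the index name is made up):

# One index lookup per DELETE instead of a full table scan
conn.execute('CREATE INDEX IF NOT EXISTS idx_test_column1 ON test_table ("column1")')
conn.commit()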
EDIT: The original question was - why does the delete statement take so much more time than the insert statement, and how can the deletes be sped up so that both operations run at similar speed?
As per snakecharmerb's suggestion, one workaround is to collapse all the deletes into a single statement:
rows = 100000

# Build one comma-separated list of values and delete them all
# with a single IN (...) statement
delete_ids = ','.join(f'"{i}"' for i in range(rows))
delete_str = f'''DELETE from test_table where column1 IN ({delete_ids});'''
conn.execute(delete_str)
conn.commit()
While this is most likely against all best practices, it does seem to work - it takes around 2 seconds to delete 1 million rows.
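For reference, the same single-statement idea can also be written with bound parameters instead of interpolating values into the SQL string. A sketch that chunks the IDs to stay under SQLite's per-statement limit on host parameters (999 in older builds; the 500-row chunk size is an arbitrary choice):

ids = [str(i) for i in range(rows)]

# Delete in chunks of bound parameters rather than one giant
# interpolated string; each chunk stays below the parameter limit
chunk_size = 500
for start in range(0, len(ids), chunk_size):
    chunk = ids[start:start + chunk_size]
    placeholders = ','.join('?' * len(chunk))
    conn.execute(f'DELETE from test_table where column1 IN ({placeholders});', chunk)
conn.commit()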