I'm using SQLAlchemy 1.0.0 and want to run some UPDATE ONLY queries (update if the primary key matches, otherwise do nothing) in batch.
I ran an experiment and found that bulk update is much slower than bulk insert or bulk upsert.
Could you help me understand why it is so slow, or suggest an alternative way to do a BULK UPDATE (not BULK UPSERT) with SQLAlchemy?
Below is the table in MySQL:
CREATE TABLE `test` (
`id` int(11) unsigned NOT NULL,
`value` int(11) DEFAULT NULL,
PRIMARY KEY (`id`)
) ENGINE=InnoDB DEFAULT CHARSET=utf8mb4 COLLATE=utf8mb4_unicode_ci;
And the test code:
from sqlalchemy import create_engine, text
import time
driver = 'mysql'
host = 'host'
user = 'user'
password = 'password'
database = 'database'
url = "{}://{}:{}@{}/{}?charset=utf8".format(driver, user, password, host, database)
engine = create_engine(url)
engine.connect()
engine.execute('TRUNCATE TABLE test')
num_of_rows = 1000
rows = []
for i in xrange(0, num_of_rows):
    rows.append({'id': i, 'value': i})
print '--------- test insert --------------'
sql = '''
INSERT INTO test (id, value)
VALUES (:id, :value)
'''
start = time.time()
engine.execute(text(sql), rows)
end = time.time()
print 'Cost {} seconds'.format(end - start)
print '--------- test upsert --------------'
for r in rows:
    r['value'] = r['id'] + 1
sql = '''
INSERT INTO test (id, value)
VALUES (:id, :value)
ON DUPLICATE KEY UPDATE value = VALUES(value)
'''
start = time.time()
engine.execute(text(sql), rows)
end = time.time()
print 'Cost {} seconds'.format(end - start)
print '--------- test update --------------'
for r in rows:
    r['value'] = r['id'] * 10
sql = '''
UPDATE test
SET value = :value
WHERE id = :id
'''
start = time.time()
engine.execute(text(sql), rows)
end = time.time()
print 'Cost {} seconds'.format(end - start)
The output when num_of_rows = 100:
--------- test insert --------------
Cost 0.568960905075 seconds
--------- test upsert --------------
Cost 0.569655895233 seconds
--------- test update --------------
Cost 20.0891299248 seconds
The output when num_of_rows = 1000:
--------- test insert --------------
Cost 0.807548999786 seconds
--------- test upsert --------------
Cost 0.584554195404 seconds
--------- test update --------------
Cost 206.199367046 seconds
The network latency to the database server is around 500 ms.
It looks like the bulk update sends and executes each query one by one rather than in a batch. Is that the case?
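To put rough numbers on that suspicion, here is a small sketch that divides the timings reported above by the row count. The helper name `seconds_per_row` and the two lists are only illustrative; the figures themselves are copied from the output above.

```python
# Timings copied from the output above: (number of rows, elapsed seconds).
update_timings = [(100, 20.0891299248), (1000, 206.199367046)]
insert_timings = [(100, 0.568960905075), (1000, 0.807548999786)]


def seconds_per_row(timings):
    """Return the average elapsed time per row for each (rows, seconds) pair."""
    return [seconds / float(row_count) for row_count, seconds in timings]


update_per_row = seconds_per_row(update_timings)
insert_per_row = seconds_per_row(insert_timings)

for (row_count, _), cost in zip(update_timings, update_per_row):
    print("UPDATE, {} rows: {:.4f} s per row".format(row_count, cost))
for (row_count, _), cost in zip(insert_timings, insert_per_row):
    print("INSERT, {} rows: {:.4f} s per row".format(row_count, cost))
```

The UPDATE cost per row stays roughly constant (about 0.2 s) when going from 100 to 1000 rows, so the total grows linearly, while the INSERT total barely changes. That pattern would be consistent with one round trip per UPDATE row, although it does not prove it.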
Thanks in advance.