Some time ago I was struggling with this problem at work, so we built a library with functions for bulk insert and update. We hope we've taken all the performance and security concerns into account. The library is open source and available on PyPI under the name bulky.
Let me show you a few usage examples:
insert:
from random import random

import bulky

from your.sqlalchemy.models import Model
from your.sqlalchemy.session import Session

# Rows to insert: one random float per row
data = [
    {Model.column_float: random()}
    for _ in range(100_000_000)
]

rows_inserted = bulky.insert(
    session=Session,
    table_or_model=Model,
    values_series=data,
    returning=[Model.id, Model.column_float],
)
new_items = {row.id: row.column_float for row in rows_inserted}
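A note on practicality: building a hundred-million-element list in memory at once is probably not what you want. Here is a minimal sketch of feeding the rows in batches instead; it relies only on the bulky.insert call shown above, while the chunk size and the chunked helper are hypothetical choices of mine, not part of the library:

from itertools import islice
from random import random

import bulky

from your.sqlalchemy.models import Model
from your.sqlalchemy.session import Session

def chunked(iterable, size):
    # Hypothetical helper: yield lists of at most `size` items from any iterable.
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk

# A generator avoids materializing all rows in memory at once
values = ({Model.column_float: random()} for _ in range(100_000_000))

new_items = {}
for chunk in chunked(values, 10_000):
    # Each call sends one batch; returned rows are collected as before
    rows = bulky.insert(
        session=Session,
        table_or_model=Model,
        values_series=chunk,
        returning=[Model.id, Model.column_float],
    )
    new_items.update({row.id: row.column_float for row in rows})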
update:
import bulky

from your.sqlalchemy.models import ManyToManyTable
from your.sqlalchemy.session import Session

# Rows to update: (fk1, fk2) identifies each row, value holds the new data
data = [
    {
        ManyToManyTable.fk1: i,
        ManyToManyTable.fk2: j,
        ManyToManyTable.value: i + j,
    }
    for i in range(100_000_000)
    for j in range(100_000_000)
]

rows_updated = bulky.update(
    session=Session,
    table_or_model=ManyToManyTable,
    values_series=data,
    returning=[
        ManyToManyTable.fk1,
        ManyToManyTable.fk2,
        ManyToManyTable.value,
    ],
    reference=[
        ManyToManyTable.fk1,
        ManyToManyTable.fk2,
    ],
)
updated_items = {(row.fk1, row.fk2): row.value for row in rows_updated}
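One thing the snippets above don't show is transaction handling. I'm not certain whether bulky commits on its own, so I'd assume the usual SQLAlchemy pattern and commit explicitly once the call returns; a minimal sketch, assuming Session behaves like a regular SQLAlchemy session or scoped_session:

# Persist the changes; this assumes bulky does not commit by itself
try:
    Session.commit()
except Exception:
    Session.rollback()
    raise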
Not sure if links are allowed, so I'll put them under a spoiler:
Readme and PyPI