I am facing performance issues when writing to a SQL database with SQLAlchemy. We have thousands of records to write, each with many related records. While investigating, we realized that every record is inserted with its own "INSERT INTO" statement. Here is a small data model as an example:
Model declaration
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Table, Column, Integer, String, MetaData, ForeignKey
from sqlalchemy.orm import relationship, backref, sessionmaker
from sqlalchemy import create_engine

engine = create_engine('sqlite:///model.db', echo=True)
Session = sessionmaker(bind=engine)
session = Session()

Base = declarative_base()
metadata = Base.metadata

class Parent(Base):
    __tablename__ = 'PARENT'
    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))
    relation1 = relationship('Child', cascade="all, delete-orphan",
                             single_parent=True, back_populates="relation2")

    def __init__(self, attribute=None):
        self.attribute = attribute

class Child(Base):
    __tablename__ = 'CHILD'
    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))
    relation2_ID = Column(Integer, ForeignKey('PARENT.PK_ID'))
    relation2 = relationship('Parent', cascade="all, delete-orphan",
                             single_parent=True, back_populates="relation1")

    def __init__(self, attribute=None):
        self.attribute = attribute

Base.metadata.create_all(engine)
Running snippet
obj1 = Parent('foo')
for attribute in range(10):
    obj2 = Child(str(attribute))
    obj1.relation1.append(obj2)

session.add(obj1)
session.commit()
Generated SQL
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
PRAGMA table_info("PARENT")
()
PRAGMA table_info("CHILD")
()
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
BEGIN (implicit)
INSERT INTO "PARENT" (attribute) VALUES (?)
('foo',)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('0', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('1', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('2', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('3', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('4', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('5', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('6', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('7', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('8', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('9', 3)
COMMIT
We were expecting something like:
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
PRAGMA table_info("PARENT")
()
PRAGMA table_info("CHILD")
()
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
BEGIN (implicit)
INSERT INTO "PARENT" (attribute) VALUES (?)
('foo',)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES
('0', 3),
('1', 3),
...
('9', 3)
COMMIT
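For what it's worth, SQLAlchemy Core can render a single multi-row INSERT like the one above when a list of dictionaries is passed to `Insert.values()`. This is only a minimal sketch against a standalone in-memory SQLite table (not our real model), to illustrate the kind of statement we expected:

```python
from sqlalchemy import create_engine, MetaData, Table, Column, Integer, String, select, func

engine = create_engine('sqlite:///:memory:', echo=True)
metadata = MetaData()

child = Table(
    'CHILD', metadata,
    Column('PK_ID', Integer, primary_key=True),
    Column('attribute', String(10)),
    Column('relation2_ID', Integer),
)
metadata.create_all(engine)

# Passing a list to .values() renders ONE statement with a
# multi-row VALUES clause, instead of one INSERT per row.
stmt = child.insert().values(
    [{'attribute': str(i), 'relation2_ID': 3} for i in range(10)]
)
with engine.begin() as conn:
    conn.execute(stmt)
```

With `echo=True` the log shows a single `INSERT INTO "CHILD" ... VALUES (?, ?), (?, ?), ...` statement.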
We initially used Session.add() for each child, then Session.bulk_save_objects(), and now the "save-update" cascade, without seeing any performance benefit. Is there a way to insert all related records in a single query?
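For reference, our Session.bulk_save_objects() attempt looked roughly like the sketch below (simplified, with the relationships stripped out; note that bulk operations bypass relationship cascades, so the foreign key has to be assigned by hand, which is what we did):

```python
from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import sessionmaker

Base = declarative_base()

class Parent(Base):
    __tablename__ = 'PARENT'
    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))

class Child(Base):
    __tablename__ = 'CHILD'
    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))
    relation2_ID = Column(Integer, ForeignKey('PARENT.PK_ID'))

engine = create_engine('sqlite:///:memory:', echo=True)
Base.metadata.create_all(engine)
Session = sessionmaker(bind=engine)
session = Session()

# The parent is flushed first so its primary key is available.
parent = Parent(attribute='foo')
session.add(parent)
session.flush()

# bulk_save_objects() skips relationship cascades entirely,
# so relation2_ID is set manually on each child.
children = [Child(attribute=str(i), relation2_ID=parent.PK_ID)
            for i in range(10)]
session.bulk_save_objects(children)
session.commit()
```

Even with this, the log still shows one INSERT per child rather than a single multi-row statement.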
If it helps, the target database will be SQL Server 2012. Our first attempt took more than an hour to save a single record:
- size of data: around 800 kB
- number of tables involved: 15
- number of related records: around 20,000
Thanks in advance,
BLH