
I am facing performance issues when writing to a SQL database with SQLAlchemy. We have thousands of records to write, each with many relationships. While investigating, we realized that every record is inserted with its own "INSERT INTO" statement. Here is a small data model as an example:

Model declaration

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

engine = create_engine('sqlite:///model.db', echo=True)

Session = sessionmaker(bind=engine)
session = Session()

Base = declarative_base()


class Parent(Base):
    __tablename__ = 'PARENT'

    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))

    # one-to-many collection of Child objects
    relation1 = relationship('Child', cascade="all, delete-orphan",
                             single_parent=True, back_populates="relation2")

    def __init__(self, attribute=None):
        self.attribute = attribute


class Child(Base):
    __tablename__ = 'CHILD'

    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))

    # many-to-one link back to Parent via the relation2_ID foreign key
    relation2_ID = Column(Integer, ForeignKey('PARENT.PK_ID'))
    relation2 = relationship("Parent", cascade="all, delete-orphan",
                             single_parent=True, back_populates="relation1")

    def __init__(self, attribute=None):
        self.attribute = attribute


Base.metadata.create_all(engine)

Running snippet

obj1 = Parent('foo')

for attribute in range(10):
    obj2 = Child(str(attribute))
    obj1.relation1.append(obj2)

session.add(obj1)
session.commit()

Generated SQL

SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
PRAGMA table_info("PARENT")
()
PRAGMA table_info("CHILD")
()
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
BEGIN (implicit)
INSERT INTO "PARENT" (attribute) VALUES (?)
('foo',)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('0', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('1', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('2', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('3', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('4', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('5', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('6', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('7', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('8', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('9', 3)
COMMIT

We were expecting something like:

SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
PRAGMA table_info("PARENT")
()
PRAGMA table_info("CHILD")
()
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
BEGIN (implicit)
INSERT INTO "PARENT" (attribute) VALUES (?)
('foo',)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES 
('0', 3),
('1', 3),
...
('9', 3)
COMMIT

We initially used Session.add() for each child, then Session.bulk_save_objects(), and now the "save-update" cascade, without seeing any performance benefit. Is there a way to insert all the related records in a single query? A rough sketch of the bulk_save_objects() attempt is shown below.
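For context, the bulk_save_objects() attempt looked roughly like the following (a simplified sketch against the example model above, not the exact production code; bulk operations do not cascade relationships, so the foreign key is set by hand):

obj1 = Parent('foo')
session.add(obj1)
session.flush()  # assigns obj1.PK_ID so it can be used as the foreign key

children = [Child(str(attribute)) for attribute in range(10)]
for child in children:
    child.relation2_ID = obj1.PK_ID  # bulk_save_objects() does not handle relationship cascades
session.bulk_save_objects(children)
session.commit()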

If it can help, the final database will be SQL Server 2012. Our first try took more than an hour to save one record:

  • size of data: around 800 kB
  • number of tables involved: 15
  • number of related records: around 20,000

Thanks in advance,

BLH


1 Answer


According to the SQLAlchemy documentation, the pyodbc driver supports a fast executemany mode for MSSQL, so you may find you get better performance by testing against an MSSQL server with this mode enabled:

engine = create_engine(
    "mssql+pyodbc://scott:tiger@mssql2017:1433/test?driver=ODBC+Driver+13+for+SQL+Server",
    fast_executemany=True)

The docs warn that this feature does not work well with "very large batches" of data, because the rows are buffered in memory, but 800 kB should be OK.

snakecharmerb
  • Hi snakecharmerb, it did not work as is, but it led me to [this post](https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc) for the fast_executemany method. It improved computing time by around 20 %, but it still takes around 4400 s, and the echo output still shows a lot of individual INSERT INTO statements, so the SQL itself did not change. We will probably store these data in another way. Thank you for the tip. – BLH Aug 31 '20 at 08:06
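
For reference, the linked post enables the pyodbc flag from a cursor-execute event listener rather than through create_engine(); a minimal sketch, assuming a pyodbc-backed engine object named engine:

from sqlalchemy import event

@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    # cursor is the raw pyodbc cursor; only executemany batches benefit from the flag
    if executemany:
        cursor.fast_executemany = True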