
I am facing performance issues when writing to a SQL database with SQLAlchemy. We have thousands of records to write, each with many relationships. While investigating, we realized that every record is inserted with its own "INSERT INTO" statement. Here is a small data model as an example:

Model declaration

from sqlalchemy import create_engine, Column, Integer, String, ForeignKey
from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy.orm import relationship, sessionmaker

engine = create_engine('sqlite:///model.db', echo=True)

Session = sessionmaker(bind=engine)
session = Session()

Base = declarative_base()


class Parent(Base):
    __tablename__ = 'PARENT'

    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))

    # one-to-many collection of Child objects
    relation1 = relationship('Child', cascade="all, delete-orphan",
                             single_parent=True, back_populates="relation2")

    def __init__(self, attribute=None):
        self.attribute = attribute


class Child(Base):
    __tablename__ = 'CHILD'

    PK_ID = Column(Integer, primary_key=True)
    attribute = Column(String(10))

    # many-to-one link back to Parent via the relation2_ID foreign key
    relation2_ID = Column(Integer, ForeignKey('PARENT.PK_ID'))
    relation2 = relationship("Parent", cascade="all, delete-orphan",
                             single_parent=True, back_populates="relation1")

    def __init__(self, attribute=None):
        self.attribute = attribute


Base.metadata.create_all(engine)

Running snippet

obj1 = Parent('foo')

for attribute in range(10):
    obj2 = Child(str(attribute))
    obj1.relation1.append(obj2)

session.add(obj1)
session.commit()

Generated SQL

SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
PRAGMA table_info("PARENT")
()
PRAGMA table_info("CHILD")
()
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
BEGIN (implicit)
INSERT INTO "PARENT" (attribute) VALUES (?)
('foo',)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('0', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('1', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('2', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('3', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('4', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('5', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('6', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('7', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('8', 3)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES (?, ?)
('9', 3)
COMMIT

We were expecting something like:

SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
PRAGMA table_info("PARENT")
()
PRAGMA table_info("CHILD")
()
SELECT CAST('test plain returns' AS VARCHAR(60)) AS anon_1
()
SELECT CAST('test unicode returns' AS VARCHAR(60)) AS anon_1
()
BEGIN (implicit)
INSERT INTO "PARENT" (attribute) VALUES (?)
('foo',)
INSERT INTO "CHILD" (attribute, "relation2_ID") VALUES 
('0', 3),
('1', 3),
...
('9', 3)
COMMIT

We initially used Session.add() for each child, then Session.bulk_save_objects(), and now the "save-update" cascade, without seeing any performance benefit. Is there a way to insert all the related records in a single query? A rough sketch of the bulk_save_objects() attempt is shown below.
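For context, the bulk_save_objects() attempt looked roughly like the following (a simplified sketch against the example model above, not the exact production code; bulk operations do not cascade relationships, so the foreign key is set by hand):

obj1 = Parent('foo')
session.add(obj1)
session.flush()  # assigns obj1.PK_ID so it can be used as the foreign key

children = [Child(str(attribute)) for attribute in range(10)]
for child in children:
    child.relation2_ID = obj1.PK_ID  # bulk_save_objects() does not handle relationship cascades
session.bulk_save_objects(children)
session.commit()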

If it can help, the final database will be SQL Server 2012. Our first try took more than an hour to save one record:

  • size of data: around 800 kB
  • number of tables involved: 15
  • number of related records: around 20,000

Thanks in advance,

BLH


1 Answer


According to the SQLAlchemy documentation, the pyodbc driver supports a fast executemany mode for MSSQL, so you may find you get better performance by testing against an MSSQL server with this mode enabled:

engine = create_engine(
    "mssql+pyodbc://scott:tiger@mssql2017:1433/test?driver=ODBC+Driver+13+for+SQL+Server",
    fast_executemany=True)

The docs warn that this feature does not work well with "very large batches" of data, because the rows are buffered in memory, but 800 kB should be OK.

snakecharmerb
  • Hi snakecharmerb, it did not work as is, but it led me to [this post](https://stackoverflow.com/questions/48006551/speeding-up-pandas-dataframe-to-sql-with-fast-executemany-of-pyodbc) for the fast_executemany method. It improved computing time by around 20 %, but it still takes around 4400 s, and the echo output still shows a lot of individual INSERT INTO statements, so the SQL itself did not change. We will probably store these data in another way. Thank you for the tip. – BLH Aug 31 '20 at 08:06
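
For reference, the linked post enables the pyodbc flag from a cursor-execute event listener rather than through create_engine(); a minimal sketch, assuming a pyodbc-backed engine object named engine:

from sqlalchemy import event

@event.listens_for(engine, "before_cursor_execute")
def receive_before_cursor_execute(conn, cursor, statement, params, context, executemany):
    # cursor is the raw pyodbc cursor; only executemany batches benefit from the flag
    if executemany:
        cursor.fast_executemany = True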