
When I use the bulk insertion code from the SQLAlchemy performance FAQ (http://docs.sqlalchemy.org/en/latest/faq/performance.html), SQLite works fine and takes roughly the times described there. But when I run the same code against a PostgreSQL connection string, the total time is many times longer.

Is there any way to make it faster in PostgreSQL? What am I doing wrong here?

I am especially interested in bulk_insert_mappings and bulk_save_objects, which are my only options for inserting 370,000 rows.

PostgreSQL connection string:

connection_string = 'postgresql://' + conf.DB_USER + ':' + conf.DB_PASSWORD + '@' + \
                    conf.DB_HOST + ':' + conf.DB_PORT + '/' + conf.DB_NAME
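
As an aside, SQLAlchemy can assemble this URL itself and will quote special characters in the credentials; a minimal sketch, assuming the same conf module (on SQLAlchemy 1.4+ this is URL.create; older versions construct URL directly with the same arguments):

from sqlalchemy.engine.url import URL

# Equivalent to the concatenated string above; conf is assumed to be
# the same settings module used in the question.
connection_string = URL.create(
    drivername='postgresql',
    username=conf.DB_USER,
    password=conf.DB_PASSWORD,
    host=conf.DB_HOST,
    port=conf.DB_PORT,
    database=conf.DB_NAME,
)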

Code used for checking performance:

import time
import sqlite3

from sqlalchemy.ext.declarative import declarative_base
from sqlalchemy import Column, Integer, String,  create_engine
from sqlalchemy.orm import scoped_session, sessionmaker


Base = declarative_base()
DBSession = scoped_session(sessionmaker())
engine = None


class Customer(Base):
    __tablename__ = "customer"
    id = Column(Integer, primary_key=True)
    name = Column(String(255))


def init_sqlalchemy(dbname='sqlite:///sqlalchemy.db'):
    global engine
    # dbname is ignored here: the SQLite URL from the original FAQ code
    # is replaced with the PostgreSQL connection string under test.
    connection_string = 'postgresql://' + 'scott' + ':' + 'tiger' + '@' + \
                        'localhost' + ':' + '5432' + '/' + 'test_db'
    engine = create_engine(connection_string, echo=False)
    DBSession.remove()
    DBSession.configure(bind=engine, autoflush=False, expire_on_commit=False)
    Base.metadata.drop_all(engine)
    Base.metadata.create_all(engine)


def test_sqlalchemy_orm(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    for i in xrange(n):
        customer = Customer()
        customer.name = 'NAME ' + str(i)
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print(
        "SQLAlchemy ORM: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def test_sqlalchemy_orm_pk_given(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    for i in xrange(n):
        customer = Customer(id=i+1, name="NAME " + str(i))
        DBSession.add(customer)
        if i % 1000 == 0:
            DBSession.flush()
    DBSession.commit()
    print(
        "SQLAlchemy ORM pk given: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def test_sqlalchemy_orm_bulk_save_objects(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    n1 = n
    while n1 > 0:
        # Insert in chunks of at most 10,000 objects per bulk call.
        chunk = min(10000, n1)
        DBSession.bulk_save_objects(
            [
                Customer(name="NAME " + str(i))
                for i in xrange(chunk)
            ]
        )
        n1 = n1 - chunk
    DBSession.commit()
    print(
        "SQLAlchemy ORM bulk_save_objects(): Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")

def test_sqlalchemy_orm_bulk_insert(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    n1 = n
    while n1 > 0:
        # Insert in chunks of at most 10,000 mappings per bulk call.
        chunk = min(10000, n1)
        DBSession.bulk_insert_mappings(
            Customer,
            [
                dict(name="NAME " + str(i))
                for i in xrange(chunk)
            ]
        )
        n1 = n1 - chunk
    DBSession.commit()
    print(
        "SQLAlchemy ORM bulk_insert_mappings(): Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")

def test_sqlalchemy_core(n=100000):
    init_sqlalchemy()
    t0 = time.time()
    engine.execute(
        Customer.__table__.insert(),
        [{"name": 'NAME ' + str(i)} for i in xrange(n)]
    )
    print(
        "SQLAlchemy Core: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " secs")


def init_sqlite3(dbname):
    conn = sqlite3.connect(dbname)
    c = conn.cursor()
    c.execute("DROP TABLE IF EXISTS customer")
    c.execute(
        "CREATE TABLE customer (id INTEGER NOT NULL, "
        "name VARCHAR(255), PRIMARY KEY(id))")
    conn.commit()
    return conn


def test_sqlite3(n=100000, dbname='sqlite3.db'):
    conn = init_sqlite3(dbname)
    c = conn.cursor()
    t0 = time.time()
    for i in xrange(n):
        row = ('NAME ' + str(i),)
        c.execute("INSERT INTO customer (name) VALUES (?)", row)
    conn.commit()
    print(
        "sqlite3: Total time for " + str(n) +
        " records " + str(time.time() - t0) + " sec")

if __name__ == '__main__':
    test_sqlalchemy_orm(100000)
    test_sqlalchemy_orm_pk_given(100000)
    test_sqlalchemy_orm_bulk_save_objects(100000)
    test_sqlalchemy_orm_bulk_insert(100000)
    test_sqlalchemy_core(100000)
    test_sqlite3(100000)

Output:

SQLAlchemy ORM: Total time for 100000 records 40.6781959534 secs
SQLAlchemy ORM pk given: Total time for 100000 records 21.0855250359 secs
SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 14.068707943 secs
SQLAlchemy ORM bulk_insert_mappings(): Total time for 100000 records 11.6551070213 secs
SQLAlchemy Core: Total time for 100000 records 12.5298728943 secs
sqlite3: Total time for 100000 records 0.477468013763 sec

Using the original connection string (i.e. SQLite):

engine = create_engine(dbname, echo=False)

Output:

SQLAlchemy ORM: Total time for 100000 records 16.9145789146 secs
SQLAlchemy ORM pk given: Total time for 100000 records 10.2713520527 secs
SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 3.69206118584 secs
SQLAlchemy ORM bulk_insert_mappings(): Total time for 100000 records 1.00701212883 secs
SQLAlchemy Core: Total time for 100000 records 0.467703104019 secs
sqlite3: Total time for 100000 records 0.566409826279 sec
  • Please don't include code (or output) as images, since they can't be copied and don't support searching. Code is text. – Ilja Everilä May 26 '17 at 19:14
  • You also should include a minimal but verifiable example of what you've tried. Just saying "I followed " and mentioning a few functions does not usually cut it. Now, `bulk_insert_mappings` and `bulk_save_objects` are no silver bullets, and the actual performance may depend on many factors. For example, the mentioned bulk operations collect simple inserts into a single `executemany`, but since you're testing PostgreSQL, you're probably using psycopg2 as the DB-API driver, and its [`executemany()`](http://initd.org/psycopg/docs/cursor.html#cursor.executemany) is no faster than `execute()` in a loop in its current implementation, or so say the docs. You can on the other hand use other features of psycopg2 to speed up large bulk inserts (a sketch follows these comments): https://stackoverflow.com/questions/43317376/how-can-i-use-psycopg2-extras-in-sqlalchemy/43321119#43321119 – Ilja Everilä May 26 '17 at 19:26
  • @IljaEverilä: Images are replaced with the code. The solution to speed up psycopg2 worked, thanks :) cheers. It is taking about 1 sec now. – Ankur Panwar May 28 '17 at 19:57
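
For reference, a minimal sketch of the approach from the linked answer, using psycopg2's `execute_values` helper on a raw DB-API connection borrowed from the engine. It assumes the `customer` table and `engine` defined in the question; the helper function name is made up for illustration:

from psycopg2.extras import execute_values

def insert_names_execute_values(engine, names, page_size=10000):
    # Borrow the underlying psycopg2 connection from the SQLAlchemy pool.
    conn = engine.raw_connection()
    try:
        cur = conn.cursor()
        # execute_values rewrites the statement into multi-row
        # INSERT ... VALUES (...), (...), ... pages, avoiding one
        # network round trip per row.
        execute_values(
            cur,
            "INSERT INTO customer (name) VALUES %s",
            [(name,) for name in names],
            page_size=page_size,
        )
        conn.commit()
    finally:
        conn.close()

insert_names_execute_values(engine, ['NAME ' + str(i) for i in range(100000)])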

1 Answer


The fastest method is to use COPY FROM (see SQLAlchemy, Psycopg2 and Postgresql COPY), but if you do NOT have write permissions, e.g. when deploying to Heroku, then you can leverage Psycopg2 Fast Execution Helpers.
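
For completeness, a minimal sketch of the COPY route from the linked question, assuming psycopg2 as the DB-API driver and the customer table defined in the question (the helper name is made up; note that the naive text format below requires that values contain no tabs, newlines or backslashes):

import io

def insert_names_copy(engine, names):
    # One name per line; COPY's default text format is tab-separated,
    # and with a single column no tabs are needed.
    buf = io.StringIO('\n'.join(names) + '\n')
    conn = engine.raw_connection()
    try:
        cur = conn.cursor()
        # Stream the whole buffer to the server in a single COPY command.
        cur.copy_expert("COPY customer (name) FROM STDIN", buf)
        conn.commit()
    finally:
        conn.close()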

For example, to use the fast execution helpers with the bulk or Core inserts above:

engine = create_engine(
    "postgresql+psycopg2://scott:tiger@host/dbname",
    executemany_mode='values',
    executemany_values_page_size=10000)

brings the timings to:

SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 2.796818971633911 secs
SQLAlchemy ORM bulk_insert_mappings(): Total time for 100000 records 1.3805248737335205 secs
SQLAlchemy Core: Total time for 100000 records 1.1153180599212646 secs

instead of:

SQLAlchemy ORM bulk_save_objects(): Total time for 100000 records 9.02771282196045 secs
SQLAlchemy ORM bulk_insert_mappings(): Total time for 100000 records 7.643821716308594 secs
SQLAlchemy Core: Total time for 100000 records 7.460561275482178 secs
– Oleg