4

In continue to my previous post I'm trying to use the bulk_save_objects for a list of objects (the objects dont have a PK value therefore it should create it for each object). When I use the bulk_save_objects I see an insert per object instead of one insert for all objects.

The code :

class Product(Base):

    __tablename__ = 'products'
    id = Column('id',BIGINT, primary_key=True)
    barcode = Column('barcode' ,BIGINT)
    productName = Column('name', TEXT,nullable=False)
    objectHash=Column('objectHash',TEXT,unique=True,nullable=False)

    def __init__(self, productData,picture=None):
        self.barcode = productData[ProductTagsEnum.barcode.value]
        self.productName = productData[ProductTagsEnum.productName.value]
        self.objectHash = md5((str(self.barcode)+self.produtName).encode('utf-8')).hexdigest()

Another class contains the following save method :

def saveNewProducts(self,products):
    Session = sessionmaker()
    session=Session()
    productsHashes=[ product.objectHash for product in products]
    query = session.query(Product.objectHash).filter(Product.objectHash.in_(productsHashes))
    existedHashes=query.all()
    newProducts = [ product for product in products if product.objectHash not in productsHashes]
    /*also tried : session.bulk_save_objects(newProducts, preserve_order=False)*/

    session.bulk_save_objects(newProducts)

UPDATE 1

I following what @Ilja Everilä recommended in the comments, I added a few parameters to the connection string :

 engine = create_engine('postgresql://postgres:123@localhost:5432/mydb', pool_size=25, max_overflow=0,
                           executemany_mode='values',
                           executemany_values_page_size=10000, executemany_batch_page_size=500,
                           echo=True)

In the console I saw multiple inserts with the following format :

2019-09-16 16:48:46,509 INFO sqlalchemy.engine.base.Engine INSERT INTO products (barcode, productName, objectHash) VALUES (%(barcode)s, %(productName)s, %(objectHash)s, ) RETURNING products.id
2019-09-16 16:48:46,509 INFO sqlalchemy.engine.base.Engine {'barcode': '5008251', 'productName': 'ice ream','object_hash': 'b2752233ec523f2e874dc95b70020ae5'}
JeyJ
  • 3,582
  • 4
  • 35
  • 83
  • 2
    What is the database backend? postgresql, mysql etc? – Anthony Kong Sep 16 '19 at 08:05
  • postgresql version 11 – JeyJ Sep 16 '19 at 11:28
  • psycopg2's `executemany()`, which I think `bulk_save_objects()` ends up using, is essentially a loop of inserts over the records. It is noted in its documentation that it is slow. You could try enabling the fast execution helpers to see if it helps: https://docs.sqlalchemy.org/en/13/dialects/postgresql.html#psycopg2-executemany-mode, but it might not depending on SQLA implementation (haven't checked). `executemany_mode='values'` seems promising. – Ilja Everilä Sep 16 '19 at 12:06
  • 1
    I tried it but it still doesnt look like it combines multiple inserts into one. I'm additing to the main post the log – JeyJ Sep 16 '19 at 13:52
  • I haven't used the bulk features that much, but `RETURNING products.id` looks like it's populating the primary key for inserted objects, which means it is forced to do inserts one by one. I think there was an argument for disabling that, but I thought it'd be on by default. – Ilja Everilä Sep 16 '19 at 14:11
  • Thats sounds like the return_defaults parameter of the bulk_save_objects function, but it is False by default . Did u mean something else ? – JeyJ Sep 16 '19 at 14:22
  • That's the one. It is strange that it'd be using RETURNING, if it's `False`. – Ilja Everilä Sep 16 '19 at 14:25
  • I also checked in the database`s logs, it defenitly looks like it runs the inserts ony by one. Tried also to set it to on, but nothing changed. – JeyJ Sep 16 '19 at 14:31
  • @IljaEverilä any idea what else can I check ? – JeyJ Sep 16 '19 at 18:44

1 Answers1

0

In my case, the solution I used : I deleted the id column and set the objectHash as PK, and afterwards the save_bulk and add_all functions worked and actually did bulk insert. It seems like those functions work only if you already have the pk inside the object.

halfer
  • 19,824
  • 17
  • 99
  • 186
JeyJ
  • 3,582
  • 4
  • 35
  • 83