
I have a table called `products` which holds the columns `id` and `data`. `data` here is JSONB; `id` is a unique ID.

I tried bulk adding 10k products and it took nearly 4 minutes. With fewer products updates work just fine, but for a huge number of products they take a lot of time. How can I optimize this? I am trying to bulk update 200k+ products, and right now it's taking more than 5 minutes.

updated_product_ids = []
for product in products:
    # Build an ORM object carrying the updated data for each product
    # (pass id as a keyword so the builtin id() isn't used by mistake)
    new_product = model.Product(id=product.get('id'), data='updated data')
    updated_product_ids.append(new_product)

import json

def bulk_update(product_ids_arr):
    def update_query(count):
        return f"""
        UPDATE pricing.products
        SET data = :{count}
        WHERE id = :{count + 1}
        """

    queries = []
    params = {}
    count = 1
    for sku in product_ids_arr:
        queries.append(update_query(count))
        params[str(count)] = json.dumps(sku.data)
        params[str(count + 1)] = sku.id
        count += 2

    session.execute(';'.join(queries), params)  # This is what takes so long..


bulk_update(updated_product_ids)

I thought using raw SQL to execute this would be faster, but it's still taking a LOT of time.

Even updating only about 8k products takes nearly 3 minutes or more.
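
Edit: based on the suggestions in the comments, here is a sketch of what I'll try instead: one parameterized UPDATE run executemany-style with a list of parameter dicts, committing after each batch. (This assumes SQLAlchemy on top of psycopg2; the batch size of 1000 is just a first guess to tune.)

import json

from sqlalchemy import text

UPDATE_STMT = text("UPDATE pricing.products SET data = :data WHERE id = :id")

def bulk_update_executemany(session, updated_products, batch_size=1000):
    # Passing a list of parameter dicts makes SQLAlchemy/psycopg2 use
    # executemany instead of parsing one giant semicolon-joined SQL string.
    for start in range(0, len(updated_products), batch_size):
        batch = updated_products[start:start + batch_size]
        session.execute(
            UPDATE_STMT,
            [{"id": p.id, "data": json.dumps(p.data)} for p in batch],
        )
        session.commit()  # commit per batch so the server frees resources sooner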

  • Things that stick out: `bulk_update` is executed before it is defined, `update_query` is never executed, you don't need to care about the mode of `executemany` if you don't use it. – Klaus D. Apr 09 '21 at 06:51
  • @KlausD. Thanks for pointing those out. I updated the thread. – newbieprogrammer095 Apr 09 '21 at 07:16
  • Creating a string for each query is a fair bit of work, and might be avoidable by using SQLAlchemy's core or bulk insert tools. However I would also suggest batching your updates and commit each batch. If you commit after say each 1000 updates the database can allocate fewer resources, and that should improve performance. You can experiment to find the optimal number of records per batch. – snakecharmerb Apr 09 '21 at 07:45
  • related: https://stackoverflow.com/a/62379384/2144390 – Gord Thompson Apr 09 '21 at 11:18
  • What is `skus`? It isn't defined anywhere. – jjanes Apr 09 '21 at 15:04
  • Do you have `autocommit` enabled? You can often improve performance on bulk updates by using `BEGIN TRANSACTION` and `COMMIT` manually. The server will batch up all the changes and apply them in one shot at the end. – Tim Roberts Apr 09 '21 at 18:35
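
In the same spirit as the batching suggestions above, here is a PostgreSQL-specific sketch that turns each batch into a single statement by joining the table against a VALUES list (same `pricing.products` table assumed; the `::jsonb` cast is there because the bound values arrive as text):

import json

from sqlalchemy import text

def bulk_update_values_join(session, updated_products, batch_size=1000):
    # One UPDATE per batch: join the target table against a VALUES list
    # instead of issuing one statement per row.
    for start in range(0, len(updated_products), batch_size):
        batch = updated_products[start:start + batch_size]
        values_sql = ", ".join(f"(:id{i}, :data{i})" for i in range(len(batch)))
        params = {}
        for i, p in enumerate(batch):
            params[f"id{i}"] = p.id
            params[f"data{i}"] = json.dumps(p.data)
        session.execute(
            text(
                f"UPDATE pricing.products AS t "
                f"SET data = v.data::jsonb "
                f"FROM (VALUES {values_sql}) AS v(id, data) "
                f"WHERE t.id = v.id"
            ),
            params,
        )
        session.commit()

With 200k rows this issues 200 statements instead of 200k, which should cut most of the per-statement round-trip and parsing overhead; it is worth benchmarking against the executemany version above.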

0 Answers