  • Using `execute`: 40 inserts per minute
  • Using `executemany`: 41 inserts per minute
  • Using `extras.execute_values`: 42 inserts per minute
import datetime
from typing import Any

import psycopg2
from psycopg2 import extras


def save_return_to_postgres(record_to_insert) -> Any:

    insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
             VALUES %s returning id"""

    records = (record_to_insert[2], record_to_insert[1], record_to_insert[3],
               record_to_insert[4], record_to_insert[0], datetime.datetime.now())

    # df = df[["description","code","unit","price","created_date","updated_date"]]

    
    try:
        
        conn = psycopg2.connect(database = 'xxxx',
                        user = 'xxxx',
                        password = 'xxxxx',
                        host= 'xxxx',
                        port='xxxx',
                        connect_timeout = 10)

        print("Connection Opened with Postgres")

        cursor = conn.cursor()
        extras.execute_values(cursor, insert_query, [records])
        
        conn.commit()

        # print(record_to_insert)

    finally:
        
        if conn:
            cursor.close()
            conn.close()
            print("Connection to postgres was successfully closed")

valores = df.values

for valor in valores:
    save_return_to_postgres(valor)
    print(valor)
snakecharmerb
  • Why are you opening and closing the connection every time in a loop? No wonder it's slow. – Jared Smith Aug 29 '22 at 15:06
  • https://stackoverflow.com/a/30985541/3757232 – Jared Smith Aug 29 '22 at 15:09
  • Besides opening and closing the database connection for each query, you could also wrap multiple INSERTs into a single COMMIT. Using a prepared statement could also improve performance. But first you have to get rid of how you handle the database connection. – Frank Heikens Aug 29 '22 at 15:11
  • Is that really `per minute`? Even with the inefficient way you are doing this I would expect better performance. To your question add 1) Postgres version. 2) psycopg2 version. 3) Information on where your client is relative to the server e.g. same machine, same local network, separated by remote network, etc. – Adrian Klaver Aug 29 '22 at 16:32
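
Taken together, the comments point at the same fix: open one connection, run all the INSERTs over it, and commit once at the end. A minimal sketch of that (connection parameters, `df`, and the column order are assumed from the question):

import datetime

import psycopg2

conn = psycopg2.connect(database='xxxx', user='xxxx', password='xxxxx',
                        host='xxxx', port='xxxx', connect_timeout=10)
cursor = conn.cursor()

insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
             VALUES (%s, %s, %s, %s, %s, %s) returning id"""

try:
    # One connection and one COMMIT for the whole loop,
    # instead of connect/commit/close per row.
    for record_to_insert in df.values:
        cursor.execute(insert_query, (record_to_insert[2], record_to_insert[1],
                                      record_to_insert[3], record_to_insert[4],
                                      record_to_insert[0], datetime.datetime.now()))
    conn.commit()
finally:
    cursor.close()
    conn.close()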

2 Answers


I don't know how many rows per INSERT Postgres can take, but many SQL-based databases can accept multiple rows in a single INSERT statement.

So instead of running

for insert_query in queries:
    sql_execute(insert_query)

try making several inserts at once in a single command (test it in plain SQL first to see that it works):

insert_list = []
for insert_query in queries:
    insert_list.append(insert_query)
sql_execute(insert_list)

I had a similar issue and this link helped me: https://www.sqlservertutorial.net/sql-server-basics/sql-server-insert-multiple-rows/ (mine was SQL Server rather than Postgres, but the idea is the same: cut down network time by sending multiple rows in one command).
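
In Postgres with psycopg2, a rough sketch of the same idea is to bind each row with `cursor.mogrify` and send a single multi-row INSERT (the `conn` object and the `rows` list of 6-tuples here are placeholders, not taken from the question):

cursor = conn.cursor()

# Bind each row's values safely, then join them into one VALUES list.
values = ",".join(
    cursor.mogrify("(%s, %s, %s, %s, %s, %s)", row).decode("utf-8")
    for row in rows
)

# One statement carrying every row instead of one statement per row.
cursor.execute(
    "INSERT INTO pricing.xxxx "
    "(description, code, unit, price, created_date, updated_date) "
    "VALUES " + values
)
conn.commit()

psycopg2's `execute_values` helper wraps this same pattern, as the comment below points out.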

We're in this together!

daniel_dutra
  • OP is using postgres not sql-server, but of course postgres [can do the same thing](https://www.psycopg.org/docs/extras.html). – Jared Smith Aug 29 '22 at 15:13

Use `execute_batch` or `execute_values` over the entire record set. Right now you are not using the batch capabilities of `execute_values` because you are inserting a single record at a time. You are further slowing things down by opening and closing a connection for each record, which is a time/resource-expensive operation. The code below is untested, since I don't have the actual data and am assuming what `df.values` contains.

insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
             VALUES %s returning id"""
#execute_batch query
#insert_query = """INSERT INTO pricing.xxxx #(description,code,unit,price,created_date,updated_date)
#             VALUES (%s, %s, %s, %s, %s, %s) returning id"""

valores = df.values
#Create a list of lists to pass to query as a batch instead of singly.
records = [[record_to_insert[2],record_to_insert[1],record_to_insert[3],
            record_to_insert[4],record_to_insert[0],datetime.datetime.now()] 
           for record_to_insert in valores]

try:
        
        conn = psycopg2.connect(database = 'xxxx',
                        user = 'xxxx',
                        password = 'xxxxx',
                        host= 'xxxx',
                        port='xxxx',
                        connect_timeout = 10)

        print("Connection Opened with Postgres")

        cursor = conn.cursor()
        extras.execute_values(cursor, insert_query, [records])
        #execute_batch
        #extras.execute_batch(cursor, insert_query, [records])

        conn.commit()

        # print(record_to_insert)

    finally:
        
        if conn:
            cursor.close()
            conn.close()
            print("Connection to postgres was successfully closed")

For more information see [Fast execution helpers](https://www.psycopg.org/docs/extras.html). Note that both `execute_values` and `execute_batch` have a `page_size` argument with a default value of 100; this is the batch size for the operation. For large data sets you can reduce the time further by increasing `page_size` to make bigger batches and cut down the number of server round trips.
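
As an illustration only (1000 is not a tuned value), the larger batch size would be passed like this:

# Fewer round trips for large record sets; adjust to taste.
extras.execute_values(cursor, insert_query, records, page_size=1000)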

Adrian Klaver