Use execute_batch or execute_values and apply them to the entire record set. Right now you are not getting the batch capabilities of execute_values because you are inserting a single record at a time. You are further slowing things down by opening and closing a connection for each record, as that is a time/resource expensive operation. The code below is untested, as I don't have the actual data and am assuming what df.values is.
insert_query = """INSERT INTO pricing.xxxx (description,code,unit,price,created_date,updated_date)
VALUES %s returning id"""
#execute_batch query
#insert_query = """INSERT INTO pricing.xxxx #(description,code,unit,price,created_date,updated_date)
# VALUES (%s, %s, %s, %s, %s, %s) returning id"""
valores = df.values
#Create a list of lists to pass to query as a batch instead of singly.
records = [[record_to_insert[2],record_to_insert[1],record_to_insert[3],
record_to_insert[4],record_to_insert[0],datetime.datetime.now()]
for record_to_insert in valores]
try:
conn = psycopg2.connect(database = 'xxxx',
user = 'xxxx',
password = 'xxxxx',
host= 'xxxx',
port='xxxx',
connect_timeout = 10)
print("Connection Opened with Postgres")
cursor = conn.cursor()
extras.execute_values(cursor, insert_query, [records])
#execute_batch
#extras.execute_batch(cursor, insert_query, [records])
conn.commit()
# print(record_to_insert)
finally:
if conn:
cursor.close()
conn.close()
print("Connection to postgres was successfully closed")
For more information see Fast execution helpers. Note that both the execute_values and execute_batch functions have a page_size argument with a default value of 100. This is the batch size for the operations. For large data sets you can reduce the time further by increasing the page_size to make bigger batches and reduce the number of server round trips.
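For example, passing a larger page_size groups more rows into each statement sent to the server. The value 1000 below is just an illustration; tune it to your data volume and memory:

    # Send rows in batches of 1000 instead of the default 100,
    # cutting down the number of round trips for large record sets.
    extras.execute_values(cursor, insert_query, records, page_size=1000)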