Creating a string using for loops in python with optimized time performance

Question

I have a pandas data frame called customer_df of length 11k. I am creating a SQL query to insert values of the df to a table customer_details on postgres.

my code right now is

insert_string = 'insert into  customer_details (col1,col2,col3,col4) values '
for i in range(len(customer_df)):
    insert_string = insert_string  + ('''('{0}','{1}','{2}','{3}'),'''.format(customer_df.iloc[i]['customer'],customer_df.iloc[i]['customerid'],customer_df.iloc[i]['loc'],customer_df.iloc[i]['pin']))

upload_to_postgres(insert_string)

this string finally gives me something like

insert into customer_details (col1,col2,col3,col4) values ('John',23,'KA',560021),('Ram',67,'AP',642918),(values of cust 3) .... (values of cust 11k) which is sent to postgres using the upload_to_postgres.

This process of creating the string takes around 30secs to 1min. Is there any better optimized way to reduce the timings?

Is there any restriction in using pandas to write it to postgres for you? https://stackoverflow.com/a/23104436/1212401 — Chris Doyle, Jan 13 '21 at 17:59
Then you dont need to construct all the SQL yourself, have a look in my link you can just call `df.to_sql` passing it the relevant params for postgres. — Chris Doyle, Jan 14 '21 at 10:12
tried doing df.to_sql the insert query got stuck idle in transaction state for a very long time — Rakesh SK, Jan 14 '21 at 11:12

score 0 · Answer 1 · answered Jan 13 '21 at 18:00

0

You can use pandas to create and read that df faster here's a pandas cheat sheet and the library download link https://pandas.pydata.org/Pandas_Cheat_Sheet.pdf https://pandas.pydata.org/pandas-docs/stable/getting_started/install.html

answered Jan 13 '21 at 18:00

Caio Navarro

48
1
13

OP is using pandas: "I have a pandas data frame". This is not an answer. – BigBen Jan 13 '21 at 18:25
1

I didn't see where he used that sorry. But if he's already using pandas just use this guide https://pandas.pydata.org/pandas-docs/stable/user_guide/enhancingperf.html – Caio Navarro Jan 13 '21 at 18:29
The answer space is for *answers* to the question, not for providing links. Please see [how to answer](https://stackoverflow.com/help/how-to-answer). Especially since you've had some lower-quality answers recently. – BigBen Jan 13 '21 at 18:29

Glauco · Answer 2 · 2021-01-17T22:30:28.773

0

The multiple insert is not the right way. At each row you create a transaction, so the best solution is to do a bulk insert. If you can create a file like a CSV and use the bulk action copy, your performance increases of order of magnitude. I've no example here, but i've found this article that exlain well the methodology.

edited Jan 17 '21 at 22:30

answered Jan 17 '21 at 18:21

Glauco

1,385
2
10
20

Creating a string using for loops in python with optimized time performance

2 Answers2