I feel like this is a very basic question, but I cannot find a comprehensive answer to it. I am wondering what the best practice is for copying Python data structures, like lists or dictionaries, to a PostgreSQL database (using psycopg2).

Assuming I create or have a table that I want to fill up with data from a previous calculation, like

import random

data = [(n, random.randint(0, 100)) for n in range(0, 100)]

As far as I know, the standard way to copy the data would be something like

curs.executemany("""
  INSERT INTO my_table (id, rand_int)
  VALUES (%s, %s)""", data)

I am guessing this will loop through the list, issuing one INSERT per row, and be rather slow. Is there a smarter or more efficient way?

UPDATE:

In the meantime I found this answer, which suggests using something like:

args_str = ','.join(cur.mogrify("(%s,%s)", x) for x in data)
cur.execute("INSERT INTO table VALUES " + args_str) 

And in his answer here, Craig suggests using `copy_from`. So my question has changed a little:

What is the most efficient method and how could it be implemented?

– n1000
  • I'm voting to close this question as off-topic because this is a Python question and, as such, belongs on Stack Overflow - recommend migration. – Vérace Jun 21 '15 at 21:29
  • @Vérace While I agree with it being OT here, questions about improving a working solution are not overly welcome on SO. – bummi Jun 21 '15 at 21:56
  • I've only posted a few times on SO (mostly to give a bounty for a problem I was having). I'm surprised though - surely optimisation is a significant part of programming? AFAIC, questions about optimising poorly performing SQL are on-topic here. – Vérace Jun 21 '15 at 21:59
  • Oh, I was under the impression that all database-related questions should go here (after seeing people complaining on SO). I will move the question to SO once I am back at my computer. Thanks anyway. – n1000 Jun 21 '15 at 22:53

1 Answer

For big data sets, use COPY, via psycopg2's `copy_from` function.

http://initd.org/psycopg/docs/cursor.html#cursor.copy_from

See also how to speed up insertion performance in PostgreSQL.
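
For the dummy data in the question, a minimal sketch might look like this (untested; it assumes Python 3 and the open cursor `curs` from the question, and builds the tab-separated text format that `copy_from` expects by default):

import io

# copy_from reads a file-like object in PostgreSQL's text format:
# one line per row, columns separated by tabs
buf = io.StringIO()
for row in data:
    buf.write('\t'.join(str(v) for v in row) + '\n')
buf.seek(0)

curs.copy_from(buf, 'my_table', columns=('id', 'rand_int'))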

– Craig Ringer
  • Thank you! `copy_from` is actually the function that got me thinking in the first place. However, I am not sure how to make it work with Python data types. Would I need to pickle them first or something? May I ask you to give an example of how this could look? For instance, based on my dummy case above. – n1000 Jun 22 '15 at 07:12
  • You need to form a CSV row per tuple, where each field is a PostgreSQL-input-compatible representation of a value of that column's type. The Python `csv` module is very useful for creating the stream of rows. Most values are trivial to store - integers, floating-point values, text, etc. you just store as-is. Dates you just format as YYYY-MM-DD. Only a few things, like timestamps, take any real thought about how to represent them in the CSV input. – Craig Ringer Jun 22 '15 at 07:16
  • so something like `data_csv = [str(i) for i in data]` and then `cur.copy_from(data_csv, 'my_table', columns=('id', 'rand_int'))`? Would you mind if I edited your answer? – n1000 Jun 22 '15 at 07:25
  • No, because `data_csv` is a list. You need a *file-like object* where each *data row* is a comma-separated *string*, terminated with a newline. Use the Python `csv` module. I don't have time to write a canned example right now, but I have the feeling I might've written one in the past; search around for examples using psycopg2's `copy_from`. – Craig Ringer Jun 22 '15 at 07:26
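
Putting these comments together, a minimal sketch of the CSV approach (an assumption-laden example: Python 3, the standard-library `csv` and `io` modules, and `copy_expert` so the CSV format can be stated explicitly):

import csv
import io

# write each tuple as one comma-separated row into an in-memory, file-like buffer
buf = io.StringIO()
csv.writer(buf, lineterminator='\n').writerows(data)
buf.seek(0)

# copy_expert runs COPY ... FROM STDIN with an explicit CSV format;
# plain copy_from defaults to PostgreSQL's tab-separated text format
curs.copy_expert("COPY my_table (id, rand_int) FROM STDIN WITH (FORMAT csv)", buf)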