
I wonder what the fastest way is to write data from a pandas DataFrame to a table in a Postgres DB.

1) I've tried pandas.to_sql, but for some reason it takes an eternity to copy the data,

2) besides that, I've tried the following:

import io
import pandas as pd
# conn is an open psycopg2 connection

f = io.StringIO()
pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).to_csv(f)
cursor = conn.cursor()
cursor.execute('create table bbbb (a int, b int);COMMIT; ')
cursor.copy_from(f, 'bbbb', columns=('a', 'b'), sep=',')
cursor.execute("select * from bbbb;")
a = cursor.fetchall()
print(a)
cursor.close()

but it returns an empty list [].

So I have two questions: what is the fastest way to copy data from Python code (a DataFrame) to a Postgres DB, and what was incorrect in the second approach that I tried?

Marcel Mars

1 Answer


Your second approach should be very fast.

There are two problems with your code:

  1. After writing the csv to f, the stream position is at the end of the buffer. You need to move the position back to the beginning before starting to read.
  2. When writing the csv, you need to omit the header and index, so that only the data columns are sent.
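
The first point is easy to see in isolation, with no database involved at all:

```python
import io

buf = io.StringIO()
buf.write("1,3\n2,4\n")

print(repr(buf.read()))  # '' -- the position is at the end after writing
buf.seek(0)              # rewind to the beginning
print(repr(buf.read()))  # '1,3\n2,4\n' -- now the data is readable
```

This is exactly why copy_from saw an empty stream: it read from the current position, which was already at the end.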

Here is what your final code should look like:

import io
import pandas as pd
# conn is an open psycopg2 connection

f = io.StringIO()
pd.DataFrame({'a': [1, 2], 'b': [3, 4]}).to_csv(f, index=False, header=False)  # omit header and index
f.seek(0)  # move position back to the beginning of the buffer before reading
cursor = conn.cursor()
cursor.execute('create table bbbb (a int, b int);COMMIT; ')
cursor.copy_from(f, 'bbbb', columns=('a', 'b'), sep=',')
cursor.execute("select * from bbbb;")
a = cursor.fetchall()
print(a)
cursor.close()
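
As an aside, psycopg2's copy_expert lets you spell out the COPY statement yourself, which handles quoting and escaping more robustly than copy_from's sep parameter. A minimal sketch of the same idea as a reusable helper (copy_df is a hypothetical name; conn is assumed to be an open psycopg2 connection and the table to already exist):

```python
import io
import pandas as pd

def copy_df(conn, df, table, columns):
    """Stream a DataFrame into an existing Postgres table via COPY (psycopg2)."""
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)  # data rows only, no header/index
    buf.seek(0)                                # rewind before reading
    with conn.cursor() as cursor:
        cols = ', '.join(columns)
        cursor.copy_expert(
            f"COPY {table} ({cols}) FROM STDIN WITH (FORMAT csv)", buf)
    conn.commit()
```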
Michael
  • I receive the following error: InternalError: current transaction is aborted, commands ignored until end of transaction block - looks like some SQL error?) – Marcel Mars Jan 26 '17 at 15:37
  • You are in the middle of a [transaction](https://www.postgresql.org/docs/current/static/tutorial-transactions.html) that had an error in it. You need to do `conn.rollback()` and then `conn.begin()` again. – Michael Jan 26 '17 at 15:41
  • Thanks a lot, but still. DataError: extra data after last expected column CONTEXT: COPY bbbb, line 1: "0,1,3" – Marcel Mars Jan 26 '17 at 16:50
  • Ok it looks like it is trying to insert three values but your table only has two columns. I think the extra value is from the dataframe index. So when you write the csv, you should also omit the index. I edited the code above to do that. – Michael Jan 26 '17 at 17:38
  • Finally, I got it! Thank you for your help! Could you also give a hint why the pandas.to_sql implementation works so slowly? – Marcel Mars Jan 26 '17 at 18:29
  • Hi, I asked a similar question having done much research and similarly concluded that a copy via the filesystem (csv dump) was the current fastest method. There is a lot more information, different methods, and possible extra optimisations in the answer to that [question](https://stackoverflow.com/questions/55470614/dataframe-writing-to-postgresql-poor-performance?noredirect=1#comment108183083_55470614) in case anyone needs it. – Giles Apr 12 '20 at 11:46
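
Regarding the comment on to_sql's speed: by default it issues one INSERT statement per row, which is why it feels slow on large frames; passing method='multi' packs many rows into each INSERT and narrows (though rarely closes) the gap with COPY. A hedged sketch, using an in-memory SQLite database as a stand-in for Postgres so it runs anywhere (with Postgres you would pass an SQLAlchemy engine instead):

```python
import sqlite3
import pandas as pd

# In-memory SQLite stands in for Postgres here so the sketch is runnable;
# the to_sql parameters are the same either way.
conn = sqlite3.connect(':memory:')
df = pd.DataFrame({'a': [1, 2], 'b': [3, 4]})

# method='multi' batches many rows per INSERT instead of one row per
# statement; chunksize caps how many rows go into a single statement.
df.to_sql('bbbb', conn, if_exists='append', index=False,
          method='multi', chunksize=1000)

print(pd.read_sql('select * from bbbb', conn))
```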