0

I'm testing the to_sql method of pandas DataFrame.

to_sql is using INSERT SQL method and this is slower than COPY FROM SQL method.

Can I select the COPY FROM SQL method to be used by to_sql, using SQLAlchemy and Postgresql (psycopg2) engine?

Ilja Everilä
  • 50,538
  • 7
  • 126
  • 127
xmn
  • 21
  • 3
  • Possible duplicate of [How to write DataFrame to postgres table?](https://stackoverflow.com/questions/23103962/how-to-write-dataframe-to-postgres-table) – Ilja Everilä Jul 15 '19 at 20:26

2 Answers2

2

This isn't quite true. You can implement this to use COPY FROM by providing your own function for the kwarg method that to_sql accepts. The key bit in the Pandas documentation for DataFrame.to_sql is where method can be not only just a string, but a callable. That's what you'll provide

For example:

from csv import (writer as csv_writer, QUOTE_MINIMAL)
from io import StringIO

def some_function(df, connection):        
    try:
        frame.to_sql(
            'my_table',
            connection,
            schema='public',
            if_exists='append',
            index=False,
            method=copy_from_method)
    except Exception as err:
        print('Got an error ({})'.format(str(err))

Then, you could implement copy_from_method as something like this:

def copy_from_method(table, conn, keys, data_iter, pre_truncate=False, fatal_failure=False):
    "Custom method for pandas.DataFrame.to_sql that will use COPY FROM"""

    dbapi_conn = conn.connection
    cur = dbapi_conn.cursor()

    s_buf = StringIO()
    writer = csv_writer(s_buf, quoting=QUOTE_MINIMAL)
    writer.writerows(data_iter)
    s_buf.seek(0)

    columns = ', '.join('"{}"'.format(k) for k in keys)
    table_name = '{}.{}'.format(
        table.schema, table.name) if table.schema else table.name

    sql_query = 'COPY %s (%s) FROM STDIN WITH CSV' % (table_name, columns)
    cur.copy_expert(sql=sql_query, file=s_buf)
    return cur.rowcount

The example copy_from_method could use some work obviously- and you'll need to be careful with quoting and escaping with the CSV writer in some cases. But this rough version should work relatively well, barring any typos/copy paste errors (this is a stripped down version I use in a project)

EDIT: For big datasets, don’t use format(), f-strings are significantly faster

adam
  • 384
  • 2
  • 9
0

You are correct to_sql is using INSERT INTO via sqlalchemy (code where this happens is here), and so naturally you cannot use COPY FROM using to_sql.

But if you need this, you can of course make an own implementation of to_sql using that. And if you think this has much better performance and the same functionality as the current implementation, you can always send a contribution to pandas (https://github.com/pydata/pandas/blob/master/CONTRIBUTING.md).

But, as far as I understand, if you want to execute an SQL statement as "COPY table FROM file", you can always just use the execute command of the postgres engine.

joris
  • 133,120
  • 36
  • 247
  • 202