There is a DataFrame.to_sql method, but it works only for mysql, sqlite and oracle databases. I can't pass a postgres connection or a sqlalchemy engine to this method.
8 Answers
Starting from pandas 0.14 (released end of May 2014), postgresql is supported. The sql module now uses sqlalchemy to support different database flavors. You can pass a sqlalchemy engine for a postgresql database (see docs). E.g.:
from sqlalchemy import create_engine
engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
df.to_sql('table_name', engine)
You are correct that in pandas up to version 0.13.1 postgresql was not supported. If you need to use an older version of pandas, here is a patched version of pandas.io.sql: https://gist.github.com/jorisvandenbossche/10841234.
I wrote this a while ago, so I cannot fully guarantee that it always works, but the basis should be there. If you put that file in your working directory and import it, then you should be able to do (where con is a postgresql connection):
import sql # the patched version (file is named sql.py)
sql.write_frame(df, 'table_name', con, flavor='postgresql')
- Did this make it to 0.14? – Quant Oct 08 '14 at 19:31
- Yes, and also 0.15 is already released (release candidate). I will update the answer, thanks for asking. – joris Oct 08 '14 at 19:33
- This post solved the problem for me: http://stackoverflow.com/questions/24189150/pandas-writing-dataframe-to-other-postgresql-schema – srodriguex Jan 04 '15 at 00:24
- Note: to_sql does not export array types in postgres. – Saurabh Saha Jun 20 '19 at 13:25
- Instead of creating a new `Sqlalchemy engine`, can I use an existing `Postgres` connection created using `psycopg2.connect()`? – Underoos Nov 04 '19 at 07:34
- For writing tables, that is not possible. It needs to be a sqlalchemy engine or connection. – joris Nov 04 '19 at 11:13
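Building on the comment above about writing to another PostgreSQL schema: to_sql also accepts a schema argument. A minimal sketch, assuming a hypothetical schema name my_schema:
from sqlalchemy import create_engine

engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
# Write the DataFrame into a table that lives in a specific schema
df.to_sql('table_name', engine, schema='my_schema', if_exists='replace', index=False)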
Faster option:
The following code will copy your pandas DataFrame to a postgres DB much faster than the df.to_sql method, and you won't need any intermediate csv file to store the df.
Create an engine based on your DB specifications.
Create a table in your postgres DB that has the same number of columns as the DataFrame (df).
The data in the df will get inserted into your postgres table.
from sqlalchemy import create_engine
import psycopg2
import io
If you want to replace the table, first create the empty table with the normal to_sql method using only the headers of the df, and then load the entire (otherwise time-consuming) df into the DB via COPY.
engine = create_engine(
    'postgresql+psycopg2://username:password@host:port/database')

# Drop old table and create new empty table
df.head(0).to_sql('table_name', engine, if_exists='replace', index=False)

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()  # not actually needed; copy_from reads from output directly
cur.copy_from(output, 'table_name', null="")  # null values become ''
conn.commit()
cur.close()
conn.close()

- What does the variable `contents` do? Should this be the one that is written in `copy_from()`? – n1000 Jun 22 '18 at 21:22
- @n1000 Yeah just ignore the `contents` variable, everything else should work just fine – Bobby Jul 12 '18 at 20:34
- Loading the table is failing for me because of new line characters in some fields. How do I handle this? df.to_csv(output, sep='\t', header=False, index=False, encoding='utf-8') cur.copy_from(output, 'messages', null="") # null values become '' – conetfun Mar 06 '19 at 21:11
- What do you do when contents = output.getvalue() throws a memory error? – workin 4weekend Mar 12 '20 at 17:42
- @moshevi - The seek method sets the file's current position, so here it is explicitly moving the position back to the beginning of the file, byte 0 https://kite.com/python/docs/StringIO.StringIO.seek – foakesm Jun 09 '20 at 08:40
- If you want to use a schema, you can add the `schema=your_schema` parameter in the `to_sql` part of the code. – Jonas Palačionis Sep 29 '20 at 09:34
- How do I specify a schema? I got an error in python saying `no schema has been selected to create in` – Osca Jan 13 '21 at 00:43
- For writing the data to a specific schema, since version 2.9 of `psycopg`, you cannot use the `cur.copy_from` method anymore: changed in version 2.9, the table and field names are now quoted. If you need to specify a schema-qualified table please use `copy_expert()`. Here is what it looks like with copy_expert: `cur.copy_expert('COPY schema_name.table_name FROM STDIN', output)` – Alexandre Léonard Sep 01 '21 at 19:26
- Load into the PG table is failing because of new line characters in some fields. – Robin Apr 14 '22 at 18:35
- Can we edit the original response? cur.copy_from fails. @AlexandreLéonard has the updated call. The original response used to work and no longer does with current psycopg versions. – basswaves May 19 '22 at 02:29
- Test result: 27 seconds for `.to_sql` (the same as importing the csv via DataGrip), 8 seconds for this method. shape = (1,000,000, 3) – Shimon S Jun 27 '22 at 19:05
- If the primary key is an autogenerated ID this doesn't work, as it expects the id to be in the df. Any advice on how to circumvent this limitation? I really need the speed – tnfru Feb 22 '23 at 14:32
- If the whole CSV doesn't fit in memory, how would you do this in batches? – Alex Lenail Apr 30 '23 at 01:29
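As the last few comments note, with psycopg2 2.9+ (which quotes the name passed to copy_from) or when targeting a specific schema, copy_expert is the way to go. A minimal sketch of that variant, assuming a hypothetical schema my_schema and the engine defined above:
# Recreate the empty table in the target schema first
df.head(0).to_sql('table_name', engine, schema='my_schema', if_exists='replace', index=False)

conn = engine.raw_connection()
cur = conn.cursor()
output = io.StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
# copy_expert lets you spell out the (optionally schema-qualified) table name yourself;
# NULL '' keeps the behaviour of copy_from(..., null="")
cur.copy_expert("COPY my_schema.table_name FROM STDIN WITH (FORMAT text, NULL '')", output)
conn.commit()
cur.close()
conn.close()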
Pandas 0.24.0+ solution
In Pandas 0.24.0 a new feature was introduced specifically designed for fast writes to Postgres. You can learn more about it here: https://pandas.pydata.org/pandas-docs/stable/user_guide/io.html#io-sql-method
import csv
from io import StringIO

from sqlalchemy import create_engine

def psql_insert_copy(table, conn, keys, data_iter):
    # gets a DBAPI connection that can provide a cursor
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        s_buf = StringIO()
        writer = csv.writer(s_buf)
        writer.writerows(data_iter)
        s_buf.seek(0)

        columns = ', '.join('"{}"'.format(k) for k in keys)
        if table.schema:
            table_name = '{}.{}'.format(table.schema, table.name)
        else:
            table_name = table.name

        sql = 'COPY {} ({}) FROM STDIN WITH CSV'.format(
            table_name, columns)
        cur.copy_expert(sql=sql, file=s_buf)

engine = create_engine('postgresql://myusername:mypassword@myhost:5432/mydatabase')
df.to_sql('table_name', engine, method=psql_insert_copy)

- For most of the time, adding the `method='multi'` option is fast enough. But yes, this `COPY` method is the fastest way right now. – ssword May 09 '19 at 15:26
- Is this for csv's only? Can it be used with .xlsx as well? Some notes on what each part of this is doing would be helpful. The first part after the `with` is writing to an in-memory buffer. The last part of the `with` is using an SQL statement and taking advantage of copy_expert's speed to bulk load the data. What is the middle part that starts with `columns =` doing? – DudeWah Jul 31 '19 at 17:13
- This worked very well for me. And could you explain the `keys` argument in the `psql_insert_copy` function please? How does it get any keys and are the keys just the column names? – Bowen Liu Feb 20 '20 at 03:24
- I've tried using this method, however it throws me an error: `Table 'XYZ' already exists`. As far as I understand, it shouldn't create a table, should it? – E. Epstein Apr 01 '20 at 09:00
- @E.Epstein - you can modify the last line to `df.to_sql('table_name', engine, if_exists='replace', method=psql_insert_copy)` - this does create a table in your database. – mgoldwasser Apr 01 '20 at 16:58
- @mgoldwasser - I tried what you said with `"append"` instead of `"replace"`, and it throws an error similar to the last: `relation 'XYZ' not found`. – E. Epstein Apr 02 '20 at 21:25
- That's indeed very fast. I still had to use a chunk size for my large dataset (~200k rows) to make this work (it took 6.5s with a `chunksize=10000`). Any idea why they haven't added that function in pandas and only provided it as an example? – cglacet Sep 30 '20 at 16:17
- You might want to add `"` around the table name, otherwise it doesn't work with dashes in the table name: `sql = 'COPY "{}" ({}) FROM STDIN WITH CSV'.format(table_name, columns)` – Danferno Oct 13 '21 at 14:17
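Putting the suggestions from these comments together, a typical call might look like the following (a sketch; the chunksize value is just an example):
# Recreate the table and stream the rows in batches via the COPY-based method
df.to_sql('table_name', engine, if_exists='replace', index=False,
          chunksize=10000, method=psql_insert_copy)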
This is how I did it. It may be faster because it is using execute_batch:
import psycopg2
import psycopg2.extras  # execute_batch lives in psycopg2.extras

# df is the dataframe
if len(df) > 0:
    df_columns = list(df)
    # create (col1,col2,...)
    columns = ",".join(df_columns)

    # create VALUES('%s', '%s',...), one '%s' per column
    values = "VALUES({})".format(",".join(["%s" for _ in df_columns]))

    # create INSERT INTO table (columns) VALUES('%s',...)
    insert_stmt = "INSERT INTO {} ({}) {}".format(table, columns, values)

    cur = conn.cursor()
    psycopg2.extras.execute_batch(cur, insert_stmt, df.values)
    conn.commit()
    cur.close()

- I get AttributeError: module 'psycopg2' has no attribute 'extras'. Ah, this needs to be explicitly imported: import psycopg2.extras – GeorgeLPerkins Apr 22 '19 at 15:12
- This doesn't seem to handle np.nan properly. If you use the code above, you will likely see 'NaN' strings instead of NULLs in the database. – Jeong Kim Aug 14 '22 at 22:52
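Regarding the NaN issue in the last comment, one way to get real NULLs (a sketch, not part of the original answer) is to replace NaN with None before passing the values to execute_batch:
# Cast to object dtype and swap NaN for None so psycopg2 sends NULL instead of the string 'NaN'
df_clean = df.astype(object).where(df.notnull(), None)
psycopg2.extras.execute_batch(cur, insert_stmt, df_clean.values)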
Faster way to write a df to a table in a custom schema with/without index:
"""
Faster way to write df to table.
Slower way is to use df.to_sql()
"""
from io import StringIO
from pandas import DataFrame
from sqlalchemy.engine.base import Engine
class WriteDfToTableWithIndexMixin:
@classmethod
def write_df_to_table_with_index(
cls,
df: DataFrame,
table_name: str,
schema_name: str,
engine: Engine
):
"""
Truncate existing table and load df into table.
Keep each column as string to avoid datatype conflicts.
"""
df.head(0).to_sql(table_name, engine, if_exists='replace',
schema=schema_name, index=True, index_label='id')
conn = engine.raw_connection()
cur = conn.cursor()
output = StringIO()
df.to_csv(output, sep='\t', header=False,
index=True, index_label='id')
output.seek(0)
contents = output.getvalue()
cur.copy_expert(f"COPY {schema_name}.{table_name} FROM STDIN", output)
conn.commit()
class WriteDfToTableWithoutIndexMixin:
@classmethod
def write_df_to_table_without_index(
cls,
df: DataFrame,
table_name: str,
schema_name: str,
engine: Engine
):
"""
Truncate existing table and load df into table.
Keep each column as string to avoid datatype conflicts.
"""
df.head(0).to_sql(table_name, engine, if_exists='replace',
schema=schema_name, index=False)
conn = engine.raw_connection()
cur = conn.cursor()
output = StringIO()
df.to_csv(output, sep='\t', header=False, index=False)
output.seek(0)
contents = output.getvalue()
cur.copy_expert(f"COPY {schema_name}.{table_name} FROM STDIN", output)
conn.commit()
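A usage sketch for the second mixin (the engine URL and schema name are illustrative; the classmethod can also be mixed into your own loader class):
from sqlalchemy import create_engine

engine = create_engine('postgresql://username:password@localhost:5432/mydatabase')
# Recreates my_schema.table_name and bulk-loads df into it
WriteDfToTableWithoutIndexMixin.write_df_to_table_without_index(
    df, 'table_name', 'my_schema', engine)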
If you have JSON values in a column of your df, the above method will still load all the data correctly, but the json column will end up in an odd format, so converting that column to ::json may raise an error. In that case you have to use to_sql(). Add method='multi' to speed things up and chunksize to prevent your machine from freezing:
df.to_sql(table_name, engine, if_exists='replace', schema=schema_name, index=False, method='multi', chunksize=1000)

Using psycopg2, you can use native SQL commands to write data into a postgres table.
import psycopg2
import pandas as pd

conn = psycopg2.connect("dbname='{db}' user='{user}' host='{host}' port='{port}' password='{passwd}'".format(
    user=pg_user,
    passwd=pg_pass,
    host=pg_host,
    port=pg_port,
    db=pg_db))
cur = conn.cursor()

def insertIntoTable(df, table):
    """
    Using cursor.executemany() to insert the dataframe
    """
    # Create a list of tuples from the dataframe values
    tuples = list(set([tuple(x) for x in df.to_numpy()]))

    # Comma-separated dataframe columns
    cols = ','.join(list(df.columns))

    # One '%s' placeholder per dataframe column
    placeholders = ','.join(['%s'] * len(df.columns))

    # SQL query to execute: INSERT INTO table (columns) VALUES(%s,...)
    query = "INSERT INTO %s(%s) VALUES(%s)" % (table, cols, placeholders)

    try:
        cur.executemany(query, tuples)
        conn.commit()
    except (Exception, psycopg2.DatabaseError) as error:
        print("Error: %s" % error)
        conn.rollback()
        return 1
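A usage sketch (the table name is just an example; remember to close the cursor and connection when done):
insertIntoTable(df, 'table_name')
cur.close()
conn.close()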

- A good answer will always include an explanation why this would solve the issue, so that the OP and any future readers can learn from it. – Tyler2P Dec 19 '21 at 10:29
For Python 2.7 and Pandas 0.24.2 and using Psycopg2
Psycopg2 Connection Module
import io

import psycopg2
from psycopg2.extras import RealDictCursor

def dbConnect(db_parm, username_parm, host_parm, pw_parm):
    # Parse in connection information
    credentials = {'host': host_parm, 'database': db_parm, 'user': username_parm, 'password': pw_parm}
    conn = psycopg2.connect(**credentials)
    conn.autocommit = True  # auto-commit each entry to the database
    conn.cursor_factory = RealDictCursor
    cur = conn.cursor()
    print("Connected Successfully to DB: " + str(db_parm) + "@" + str(host_parm))
    return conn, cur
Connect to the database
conn, cur = dbConnect(databaseName, dbUser, dbHost, dbPwd)
Assuming dataframe to be present already as df
output = io.BytesIO()  # For Python 3 use io.StringIO()
df.to_csv(output, sep='\t', header=True, index=False)
output.seek(0)  # Required for rewinding the buffer to the start
copy_query = "COPY mem_info FROM STDIN csv DELIMITER '\t' NULL '' ESCAPE '\\' HEADER "  # Replace mem_info with your table name
cur.copy_expert(copy_query, output)
conn.commit()

Create an engine (where the dialect is 'postgresql', 'mysql', etc.):
from sqlalchemy import create_engine
from sqlalchemy.orm import sessionmaker
import pandas as pd

engine = create_engine(f'{dialect}://{user_name}@{host}:{port}/{db_name}')

Session = sessionmaker(bind=engine)

with Session() as session:
    df = pd.read_csv(path + f'/{file}')
    df.to_sql('table_name', con=engine, if_exists='append', index=False)

- It works on most databases including postgres. You have to specify the dialect when creating the engine, e.g. engine = create_engine('postgresql://...'). – David Jun 16 '21 at 14:48