I have a Redshift server, which I connect to via psycopg2 (note that ODBC is not supported on the company server, so I can't use pyodbc).
Currently it takes over 10 minutes to write 30-35k rows from a dataframe into the Redshift DB via pd.to_sql(). As a workaround I dump the DF as a CSV, push the file to S3, and then use COPY to load it into the DB.
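For reference, the workaround currently looks roughly like this (the bucket, key, and IAM role are placeholders for my actual setup):

import io
import boto3

def copy_df_via_s3(df, conn, table_name, bucket, key, iam_role):
    # dump the dataframe to an in-memory CSV (no header, to match the COPY below)
    buf = io.StringIO()
    df.to_csv(buf, index=False, header=False)

    # push the CSV to S3 -- bucket/key/iam_role are placeholders
    boto3.client('s3').put_object(Bucket=bucket, Key=key,
                                  Body=buf.getvalue().encode('utf-8'))

    # COPY from S3 into the target table using a Redshift-attached IAM role
    with conn.cursor() as cur:
        cur.execute(
            "COPY {} FROM 's3://{}/{}' IAM_ROLE '{}' CSV".format(
                table_name, bucket, key, iam_role))
    conn.commit()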
The fast_executemany solution described in "Speeding up pandas.DataFrame.to_sql with fast_executemany of pyODBC" would have been perfect, but it is not supported in psycopg2.
I also found d6tstack as per https://github.com/d6t/d6tstack/blob/master/examples-sql.ipynb, but pd_to_psql doesn't work for Redshift, only PostgreSQL (it can't COPY ... FROM STDIN).
Any alternatives I can use for my case?
This is my code:
import sqlalchemy as sa

DATABASE = ""
USER = ""
PASSWORD = ""
HOST = "...us-east-1.redshift.amazonaws.com"
PORT = "5439"
SCHEMA = "public"

# SQLAlchemy engine using the redshift+psycopg2 dialect
server = "redshift+psycopg2://%s:%s@%s:%s/%s" % (USER, PASSWORD, HOST, str(PORT), DATABASE)
engine = sa.create_engine(server)

# empty the target table before reloading it
conn = engine.raw_connection()
with conn.cursor() as cur:
    cur.execute('truncate table_name')
conn.commit()

# this is the slow step: 10+ minutes for 30-35k rows
df.to_sql('table_name', engine, index=False, if_exists='append')
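One direction I've been wondering about, but haven't tested, is passing a custom method callable to to_sql that batches rows with psycopg2.extras.execute_values. I don't know whether multi-row INSERTs are actually fast enough on Redshift, which is partly what I'm asking. A sketch of what I mean:

from psycopg2.extras import execute_values

def psql_execute_values(table, conn, keys, data_iter):
    # to_sql hands this callable a SQLAlchemy connection; grab the raw psycopg2 one
    dbapi_conn = conn.connection
    with dbapi_conn.cursor() as cur:
        columns = ', '.join(keys)
        target = '{}.{}'.format(table.schema, table.name) if table.schema else table.name
        sql = 'INSERT INTO {} ({}) VALUES %s'.format(target, columns)
        # execute_values expands the batch into a single multi-row INSERT
        execute_values(cur, sql, list(data_iter))

df.to_sql('table_name', engine, index=False, if_exists='append',
          method=psql_execute_values)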