I am seeing very slow performance when using Pandas to load data from an Excel file into an existing Redshift table. The Excel file has 10+ columns and 20,000+ rows, and the load is taking over 7 hours to complete. Is there a way to optimize the code and improve performance? Note that I do not have access to S3, so using it is not an option. Here is the code I am currently using:
import pandas as pd
import psycopg2

# Establish a connection to Redshift
conn = psycopg2.connect(
    host='your-redshift-host',
    port='your-redshift-port',
    user='your-username',
    password='your-password',
    dbname='your-database'
)

# Read the contents of the Excel file into a dataframe
excel_file = 'your-excel-file.xlsx'
df = pd.read_excel(excel_file)
# Insert dataframe records into the table one row at a time (this is the slow part)
cur = conn.cursor()
cur.execute('TRUNCATE TABLE table_name')
insert_query = "INSERT INTO table_name (tablecolumn1, tablecolumn2, tablecolumn3) VALUES (%s, %s, %s)"
for index, row in df.iterrows():
    cur.execute(insert_query, (row['dfcolumn1'], row['dfcolumn2'], row['dfcolumn3']))
conn.commit()
cur.close()
conn.close()
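
For what it's worth, I was wondering whether batching the inserts would help, for example with psycopg2.extras.execute_values, which packs many rows into each INSERT statement instead of issuing one round trip per row. Below is a rough sketch of what I have in mind, reusing the same placeholder connection, table, and column names as above; I'm assuming Redshift will accept multi-row INSERT ... VALUES statements of this size. Would this be a reasonable direction, or is there a better approach without S3?

import pandas as pd
import psycopg2
from psycopg2.extras import execute_values

# Same placeholder connection details as above
conn = psycopg2.connect(
    host='your-redshift-host',
    port='your-redshift-port',
    user='your-username',
    password='your-password',
    dbname='your-database'
)

# Read the Excel file once, as before
df = pd.read_excel('your-excel-file.xlsx')

cur = conn.cursor()
cur.execute('TRUNCATE TABLE table_name')

# Collect the three columns as a list of plain tuples up front
rows = list(df[['dfcolumn1', 'dfcolumn2', 'dfcolumn3']].itertuples(index=False, name=None))

# execute_values expands the single %s into batches of row tuples;
# page_size controls how many rows go into each generated INSERT
execute_values(
    cur,
    "INSERT INTO table_name (tablecolumn1, tablecolumn2, tablecolumn3) VALUES %s",
    rows,
    page_size=1000
)

conn.commit()
cur.close()
conn.close()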