I am trying to save dataframes to Parquet and then load them into Redshift. For that I do the following:
from io import BytesIO
import boto3
s3 = boto3.resource('s3')
parquet_buffer = BytesIO()
df.to_parquet(parquet_buffer, index=False, compression='gzip')
s3.Bucket(write_bucket).put_object(Key=write_path, Body=parquet_buffer.getvalue())
I then load the saved file directly into Redshift using the "COPY" command:
COPY table_name
FROM write_path
IAM_ROLE my_iam_role
FORMAT AS PARQUET;
This leads to the following error:
write path: has an incompatible Parquet schema for column ...
If I apply the same procedure with .csv, it works just fine. What causes the problem when switching to Parquet?
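For reference, here is a minimal sketch of how the schema that pandas actually wrote could be inspected, so it can be compared column by column against the Redshift table definition (this assumes pyarrow, the engine backing to_parquet, is installed; parquet_buffer is the BytesIO object from above):

import pyarrow.parquet as pq

parquet_buffer.seek(0)                   # rewind the buffer written above before re-reading it
schema = pq.read_schema(parquet_buffer)  # schema embedded in the Parquet file
print(schema)                            # prints each column name with its Parquet/Arrow type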