If you pass an S3 path (anything starting with s3://) to pandas's to_csv method, it saves the DataFrame directly to S3. This does not work with to_hdf.
Do I have to use boto3 to save the file on S3, or can I do it directly with Pandas?
The pandas documentation is asymmetrical in this respect: read_hdf allows you to specify an S3 URL, while to_hdf does not. My personal impression is that this is because to_hdf has an append mode (mode="a", which is even the default), and S3 does not support append operations: an S3 object cannot be modified in place, only replaced as a whole.
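To make the append argument concrete: because an S3 object can only be replaced as a whole, the closest equivalent of mode="a" is a full round trip of downloading the file, appending locally, and re-uploading everything. The sketch below assumes boto3 and PyTables are installed and that the existing node was written with format="table"; append_hdf_s3 is a hypothetical helper name, not part of pandas or boto3.

```python
import os
import tempfile


def append_hdf_s3(df, bucket, object_key, key, client=None):
    """Emulate df.to_hdf(..., mode="a", append=True) for a file stored on S3.

    Hypothetical helper. S3 objects cannot be modified in place, so the whole
    file is downloaded, appended to locally, and uploaded again.
    `key` is the HDF5 group inside the file, not the S3 object key.
    """
    if client is None:
        import boto3  # imported lazily so another S3 client can be injected
        client = boto3.client("s3")
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, "work.h5")
        # 1. fetch the current version of the object
        client.download_file(bucket, object_key, local_path)
        # 2. append locally; appending requires the "table" format
        df.to_hdf(local_path, key=key, mode="a", format="table", append=True)
        # 3. replace the object on S3 with the complete, updated file
        client.upload_file(local_path, bucket, object_key)
    # the temporary directory and the local copy are deleted on exit
```

This read-modify-write cycle transfers the entire file in both directions on every append and is not atomic, so two concurrent writers can overwrite each other's changes.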
So, to answer your question more specifically: yes, you would have to use boto3 to upload the file to your bucket once it has been created locally. See here for some strategies.
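A minimal sketch of that approach, assuming boto3 and PyTables are installed and AWS credentials are configured: the DataFrame is first written to a temporary local file with to_hdf, then uploaded with the boto3 S3 client's upload_file method. The helper names split_s3_url and to_hdf_s3 are hypothetical, and the client parameter exists only so that a different S3 client can be passed in.

```python
import os
import tempfile
from urllib.parse import urlparse


def split_s3_url(url):
    """Split 's3://bucket/path/to/file.h5' into ('bucket', 'path/to/file.h5')."""
    parsed = urlparse(url)
    object_key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not object_key:
        raise ValueError(f"not an s3://bucket/key URL: {url!r}")
    return parsed.netloc, object_key


def to_hdf_s3(df, s3_url, key, client=None):
    """Hypothetical helper: write df to a temporary HDF5 file, then upload it.

    `key` is the HDF5 group inside the file; the S3 object key comes from
    the URL.
    """
    bucket, object_key = split_s3_url(s3_url)
    if client is None:
        import boto3  # imported lazily so another S3 client can be injected
        client = boto3.client("s3")
    with tempfile.TemporaryDirectory() as tmpdir:
        local_path = os.path.join(tmpdir, "upload.h5")
        # mode="w": always create a fresh local file
        df.to_hdf(local_path, key=key, mode="w")
        client.upload_file(local_path, bucket, object_key)
    # the temporary directory and the local .h5 file are deleted on exit
```

Usage would then look like to_hdf_s3(df, "s3://my-bucket/data/frame.h5", key="df"), where my-bucket is a placeholder bucket name.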