Write pandas dataframe to S3 using HDF5 format

Question

If you pass an S3 path (anything starting with s3://) to pandas's to_csv method, it will save the dataframe directly to S3. This does not work with to_hdf.

Do I have to use boto3 to save the file on S3, or can I do it directly with Pandas?

score 2 · Answer 1 · answered Oct 24 '19 at 16:10

The pandas documentation is asymmetrical in this respect: read_hdf allows you to specify an S3 URL, while to_hdf does not. My personal impression is that this is because to_hdf has an append mode (a) and S3 does not support append operations. So, to answer your question more specifically: yes, you would have to use boto3 to upload your file to your bucket once it has been created. See here for some strategies.
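
As a rough illustration of that approach, here is a minimal sketch: write the HDF5 file to a temporary local path with to_hdf, then upload it with boto3's upload_file. The helper name write_hdf_to_s3, its parameters, and the bucket and key names are hypothetical, chosen only for this example. It assumes PyTables is installed (to_hdf needs it) and that AWS credentials are already configured for boto3.

```python
import os
import tempfile

import pandas as pd


def write_hdf_to_s3(df, bucket, key, s3_client=None, hdf_key="df"):
    """Hypothetical helper: save df as HDF5 locally, then upload it to S3.

    bucket and key are the S3 bucket name and object key (assumed values).
    s3_client can be any object with a boto3-style upload_file method.
    """
    if s3_client is None:
        # Imported lazily so the helper can be exercised with a stub client.
        import boto3

        s3_client = boto3.client("s3")

    # to_hdf needs a real local file, so write into a temporary directory
    # that is cleaned up once the upload has finished.
    with tempfile.TemporaryDirectory() as tmp_dir:
        local_path = os.path.join(tmp_dir, "frame.h5")
        df.to_hdf(local_path, key=hdf_key, mode="w")
        s3_client.upload_file(local_path, bucket, key)
```

Calling write_hdf_to_s3(df, "my-bucket", "data/frame.h5") would then store the dataframe under that key. Because the whole file is rewritten and uploaded each time, this sidesteps the append issue rather than solving it.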
