2

I will try to phrase this the best I can. I want to write/append to an xls/xlsx file from Sagemaker that is in an S3 bucket. There is an empty excel sheet for each file type (csv,xls,xlsx) in the S3 bucket that I upload prior to. I am able to write/append a df to the empty csv file in the S3 bucket no problem. But it does not work for xls/xlsx. Here is the code I am using for the csv:

df.to_csv('s3://bucket_name/temp/Database.csv', index=False, mode = 'w', header = False)

Here is the code I am using for the xlsx file:

with pd.ExcelWriter('s3://bucket_name/project/Database.xlsx', mode = 'w', engine="xlsxwriter") as writer:
    df.to_excel(writer, "Sheet 1")  
    writer.save()

Note: For xls, I just change the engine to openpyxl and change the file path to the xls one

I get this from the above code when running for xlsx/xls:

FileCreateError: [Errno 2] No such file or directory: 's3://bucket_name/project/Database.xlsx'

Even though it is in the exact same location as the other one. I am not sure what the problem is but I have not found any solution. I have tried adding 'r' to make it a raw string, I have tried changing the slashes around, but nothing seems to work. Does anyone that has experience with xlsxwriter/openpyxl know what the problem could be?

coderguy
  • 37
  • 6

1 Answers1

1

pd.to_csv can handle s3 paths since v0.20.0, but pd.ExcelWriter can't.

You'll need to s3fs or boto like explained here: Store Excel file exported from Pandas in AWS.

RobinFrcd
  • 4,439
  • 4
  • 25
  • 49
  • So I am trying the boto method from the top answer from that link, but I can't get it to work exactly how I want. If I do `filepath = 's3://bucket_name/project/Database.xls'` it doesn't work, no error though. But if I do `filepath = 'Database.xls'` it works but it places the file at s3://bucket_name/Database.xls not at bucket_nam/project/Database.xls I also have `bucket ='bucket_name'` – coderguy May 15 '22 at 21:50
  • If your S3 URI is `s3://bucket_name/project/Database.xls`, the bucket will be `bucket_name` and the key `project/Database.xls`. – RobinFrcd May 15 '22 at 21:54
  • Thanks, that seems to work. But if I want to append to the xls (can't append with xlsxwriter, using openpyxl) file after writing to it, it won't work. I get `BadZipFile: File is not a zip file` – coderguy May 15 '22 at 22:01
  • Sorry, I mean xlsx above, not xls – coderguy May 15 '22 at 22:10
  • I'm afraid we're getting out of the scope of the question here, it'll be easier if you create a new question with a [mwe](https://stackoverflow.com/help/minimal-reproducible-example) – RobinFrcd May 15 '22 at 22:18