
I am trying to save a matplotlib figure to my S3 bucket on AWS. I use the savefig() function like so:

import matplotlib.pyplot as plt

f = plt.figure()
plt.plot([1, 2, 3])  # placeholder data standing in for the actual figure
f.savefig("s3://bucketpath/foo.pdf", bbox_inches='tight')

But I get a path-not-found error. If I don't specify the path, it seems to work fine, but then I don't know where the file is saved.

I am running my code (in PySpark) in SageMaker JupyterLab, so it executes on one of the EC2 instances. Is there a way to specify an S3 path to save the PDF to, the way one would use the write() function when saving DataFrames to an S3 bucket?

I came across this post on this site, but it covers uploading from a local client to S3 in the cloud using boto. Is there a way to save the file directly to S3 without using AWS access keys, etc.?

thentangler

1 Answer


I was having a similar problem in a Jupyter Notebook running on AWS EMR, while trying to save another binary file format (PNG) to S3. I solved the issue by interfacing with S3 through the s3fs library.

Using your example, it should look something like this:

import io

import matplotlib.pyplot as plt
import s3fs

plt.plot([1, 2, 3])  # placeholder data standing in for your figure

img_data = io.BytesIO()
plt.savefig(img_data, format='pdf', bbox_inches='tight')
img_data.seek(0)

s3 = s3fs.S3FileSystem(anon=False)  # Uses default credentials
with s3.open('s3://bucketpath/foo.pdf', 'wb') as f:
    f.write(img_data.getbuffer())
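
As a quick local sanity check (no S3 involved), you can verify the in-memory step on its own; a minimal sketch, using the Agg backend so no display is needed:

```python
import io

import matplotlib
matplotlib.use('Agg')  # headless backend; safe on an EC2 instance
import matplotlib.pyplot as plt

fig = plt.figure()
plt.plot([1, 2, 3])  # placeholder data standing in for your figure

img_data = io.BytesIO()
fig.savefig(img_data, format='pdf', bbox_inches='tight')
img_data.seek(0)

# A valid PDF stream always starts with the '%PDF' magic bytes.
assert img_data.getvalue()[:4] == b'%PDF'
```

If this assertion passes, any remaining failure is on the S3 side (permissions or path), not in matplotlib.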

I've noticed you are working in SageMaker JupyterLab rather than EMR, but looking at the s3fs docs, I believe this approach will work there as well.

My solution was based on the answer you mentioned in your question and the s3fs documentation.

PMHM