
I'm writing Python code on Databricks to process some data and output graphs. I want to be able to save these graphs as a picture file (.png or something, the format doesn't really matter) to DBFS.

Code:

import pandas as pd
import matplotlib.pyplot as plt

df = pd.DataFrame({'fruits':['apple','banana'], 'count': [1,2]})
plt.close()
df.set_index('fruits',inplace = True)
df.plot.bar()
# plt.show()

Things that I tried:

plt.savefig("/FileStore/my-file.png")

[Errno 2] No such file or directory: '/FileStore/my-file.png'

fig = plt.gcf()
dbutils.fs.put("/dbfs/FileStore/my-file.png", fig)

TypeError: <class 'matplotlib.figure.Figure'> has the wrong type - (<class 'str'>,) is expected.

After some research, I think dbutils.fs.put only works for saving text (string) content, not binary files like images.

Running the above code with plt.show() displays a bar graph. I want to save that bar graph as an image file to DBFS. Any help is appreciated, thanks in advance!

Cizzl
KikiNeko


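An easier way is to use matplotlib.pyplot directly and just fix the DBFS path. On the driver, DBFS is mounted at /dbfs, so local file APIs such as savefig need the /dbfs prefix (/dbfs/FileStore/..., not /FileStore/...):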

Example

import matplotlib.pyplot as plt
plt.scatter(x=[1,2,3], y=[2,4,3])
plt.savefig('/dbfs/FileStore/figure.png')
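Applied to the bar chart from the question, a minimal sketch might look like this. It assumes the cluster exposes DBFS through the /dbfs mount on the driver; dbfs_to_local and save_plot_to_dbfs are hypothetical helper names, and the mount_root parameter exists only so the helper can be pointed at a scratch directory when testing off-cluster.

```python
import os

import matplotlib.pyplot as plt
import pandas as pd


def dbfs_to_local(dbfs_path: str, mount_root: str = "/dbfs") -> str:
    """Map 'dbfs:/FileStore/x.png' or '/FileStore/x.png' to '<mount_root>/FileStore/x.png'.

    Hypothetical helper: assumes DBFS is mounted at /dbfs on the driver.
    """
    if dbfs_path.startswith("dbfs:"):
        dbfs_path = dbfs_path[len("dbfs:"):]
    return os.path.join(mount_root, dbfs_path.lstrip("/"))


def save_plot_to_dbfs(df: pd.DataFrame, dbfs_path: str, mount_root: str = "/dbfs") -> str:
    """Draw `df` as a bar chart and save it under the DBFS mount; return the local path."""
    local_path = dbfs_to_local(dbfs_path, mount_root)
    # savefig does not create missing directories, so create them first
    os.makedirs(os.path.dirname(local_path), exist_ok=True)
    ax = df.plot.bar()
    ax.figure.savefig(local_path)  # format inferred from the .png extension
    plt.close(ax.figure)  # release the figure so repeated calls don't accumulate
    return local_path
```

In a notebook you would call it as `save_plot_to_dbfs(df, "/FileStore/my-file.png")` with the indexed DataFrame from the question.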
mangelfdz
Comment: This will do it, though Databricks says we should use the dbutils.fs.put function: https://docs.databricks.com/dbfs/filestore.html – jimh Jun 06 '23 at 18:24
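If you prefer to stay with dbutils, note that dbutils.fs.put takes its contents as a string, so a binary PNG fits more naturally with a different pattern: write the image to the driver's local disk, then copy it with dbutils.fs.cp, using the file: scheme for the local source. The sketch below assumes that pattern; save_via_copy is a hypothetical helper, and the copy function is passed in (dbutils.fs.cp in a notebook) so the rest can run without Databricks.

```python
import os
import tempfile
from typing import Callable

from matplotlib.figure import Figure


def save_via_copy(fig: Figure, dbfs_uri: str, copy: Callable[[str, str], object]) -> str:
    """Save `fig` to a local temp PNG, then copy it to `dbfs_uri` with `copy`.

    In a Databricks notebook, pass copy=dbutils.fs.cp (assumption: the local
    source needs the 'file:' scheme so dbutils reads the driver's disk, not DBFS).
    """
    fd, local_path = tempfile.mkstemp(suffix=".png")
    os.close(fd)  # savefig reopens the path itself
    try:
        fig.savefig(local_path, format="png")
        copy("file:" + local_path, dbfs_uri)
    finally:
        os.remove(local_path)  # clean up the local temp file either way
    return dbfs_uri
```

In a notebook this would be called as `save_via_copy(plt.gcf(), "dbfs:/FileStore/my-file.png", dbutils.fs.cp)`.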

You can do this by saving the figure to an in-memory buffer and then using Python's local file APIs to write it to the Databricks File System (DBFS) through the /dbfs mount.

Example:

import matplotlib.pyplot as plt
from io import BytesIO

# Create a plot or figure first, then render it into an in-memory buffer:
buf = BytesIO()
plt.savefig(buf, format='png')

# Local file APIs see DBFS under /dbfs; the parent directory must already exist
path = '/dbfs/databricks/path/to/file.png'

# Make sure to open the file in binary mode
with open(path, 'wb') as f:
    # You can also use buf.seek(0) followed by buf.read()
    f.write(buf.getvalue())
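The same idea can be split into two small functions so the rendering step works on a specific figure instead of whatever pyplot considers current. This is a sketch, not the answer's own code: figure_to_png_bytes and write_bytes are hypothetical names, and creating the parent directory is an addition so that a path like /dbfs/FileStore/plots/... works even if the folder does not exist yet.

```python
import os
from io import BytesIO

from matplotlib.figure import Figure


def figure_to_png_bytes(fig: Figure) -> bytes:
    """Render a matplotlib figure to PNG bytes held in memory."""
    buf = BytesIO()
    fig.savefig(buf, format="png")
    return buf.getvalue()


def write_bytes(path: str, data: bytes) -> None:
    """Write raw bytes to `path`, creating parent directories first.

    On Databricks, `path` is assumed to start with /dbfs/ (the DBFS mount).
    """
    parent = os.path.dirname(path)
    if parent:
        os.makedirs(parent, exist_ok=True)
    with open(path, "wb") as f:  # binary mode: PNG data is not text
        f.write(data)
```

With the question's DataFrame: `ax = df.plot.bar()` followed by `write_bytes("/dbfs/FileStore/my-file.png", figure_to_png_bytes(ax.figure))`.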
Alex Ross