3

This may be obvious, but I can't figure it out. I'm new to Python and only recently started using matplotlib, so I can't see the problem.

I am doing the following:

  • create a pandas.DataFrame
  • make a histogram and save as a png file
  • create a new column of the DataFrame
  • make a histogram of that column and save it as a new png file

What I get is two png files with the same figure: the DataFrame histogram. (I remember similar problems in MATLAB, and it took me a while to find the fix.)

Here is the code:

import os
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt

# Suppose 'housing' is a pandas.DataFrame with shape (20640, 11)

# Make a histogram of each column of housing data frame
housing.hist(bins=50, figsize=(20, 15))

# Save histogram as a file
os.makedirs('im', exist_ok=True)
plt.savefig('im/housing_hist.png')

# Create a new attribute that represents the income category
housing["income_cat"] = pd.cut(housing["median_income"],
                               bins=[0., 1.5, 3.0, 4.5, 6., np.inf],
                               labels=[1, 2, 3, 4, 5])

# Create a histogram of income_cat
housing["income_cat"].hist()
plt.savefig('im/income_cat_hist.png')

I need help to save different files.

Thanks for your time.

ClimateUnboxed

2 Answers

2

It is more reliable to save the figure from the figure object. In matplotlib (and in more recent versions of MATLAB), a figure is an object of a particular type. The pandas hist function returns an axes or an array of axes.
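To make the return types concrete, here is a minimal sketch. The `demo` DataFrame and its column names are made up for illustration (they stand in for `housing`), and the Agg backend is selected only so the snippet runs without a display. It also checks which figure a later Series.hist call ends up on when no new figure has been created in between.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import numpy as np
import pandas as pd
from matplotlib.axes import Axes

# Hypothetical stand-in for the 'housing' DataFrame from the question
rng = np.random.default_rng(0)
demo = pd.DataFrame({"a": rng.normal(size=100), "b": rng.normal(size=100)})

# DataFrame.hist draws one subplot per numeric column and returns an array of axes
axs = demo.hist(bins=10)

# Series.hist returns a single axes; with no 'ax' argument it reuses the current figure
ax = demo["a"].hist(bins=10)

print(type(axs).__name__)  # the array type returned by DataFrame.hist
print(isinstance(ax, Axes))

# True when the second call drew onto the figure that the first call created
same_figure = ax.figure is axs.ravel()[0].figure
print(same_figure)
```

If `same_figure` comes out true, the second histogram has been drawn into the first figure, which would explain why both saved files look alike.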

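If you are making a single axes, you can get the figure from its figure attribute and then call savefig on that.

So something like this should work.

# Only valid when hist() returns a single axes. If it returns an array of axes
# (as it does for a DataFrame with several columns), use the next example instead.
ax1 = housing.hist(bins=50, figsize=(20, 15))
ax1.figure.savefig('im/housing_hist.png')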

If you are making multiple axes, you get a NumPy array of axes, which you can flatten and take the first element of:

axs1 = housing.hist(bins=50, figsize=(20, 15))
axs1.ravel()[0].figure.savefig('im/housing_hist.png')

Edit: To make it clear, for the second figure you should do:

ax2 = housing["income_cat"].hist()
ax2.figure.savefig('im/income_cat_hist.png')
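
Putting the pieces together, the following is a sketch rather than a definitive recipe. It assumes that Series.hist draws on the current figure when no axes is passed, so it creates a second figure explicitly with plt.subplots() and hands that axes to hist through the `ax` parameter. The `housing_demo` data, the `median_age` column, the temporary output directory and the use of `labels=False` in pd.cut are all illustrative choices, not part of the original code.

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for 'housing'; the real data has more columns
rng = np.random.default_rng(0)
housing_demo = pd.DataFrame({
    "median_income": rng.uniform(0.5, 10.0, size=200),
    "median_age": rng.integers(1, 52, size=200),
})
out_dir = tempfile.mkdtemp()  # hypothetical output directory in place of 'im'

# First figure: one subplot per column, saved through its own figure object
axs1 = housing_demo.hist(bins=20, figsize=(8, 6))
fig1 = axs1.ravel()[0].figure
path1 = os.path.join(out_dir, "housing_hist.png")
fig1.savefig(path1)

# New column; labels=False yields integer bin indices instead of categories
housing_demo["income_cat"] = pd.cut(
    housing_demo["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=False,
)

# Second figure: create it explicitly and pass its axes to hist
fig2, ax2 = plt.subplots()
housing_demo["income_cat"].hist(ax=ax2)
path2 = os.path.join(out_dir, "income_cat_hist.png")
fig2.savefig(path2)

plt.close("all")  # release both figures once they are saved
```

Because each savefig call goes through a specific figure object, the result does not depend on which figure pyplot currently treats as active.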
TheBlackCat
  • Thanks, it helps me get a better understanding of those objects, but it didn't help with saving two different figures. – Jorge Espinoza Mar 06 '20 at 07:32
  • I got an error using `ax1.figure.savefig('...')` because ax1 is an array of histograms: `AttributeError: 'numpy.ndarray' object has no attribute 'figure'` – Jorge Espinoza Mar 10 '20 at 05:19
  • @JorgeE I explained how to deal with that in my second example. – TheBlackCat Mar 11 '20 at 19:20
  • OK, I understand, but I need to save the whole set of histograms, like `plt.savefig` does, not just one element. Is there a way to do it in one line? Thanks! – Jorge Espinoza Mar 11 '20 at 19:32
  • @JorgeE Yes, that is what the second example will do. You didn't try it, did you? All the histograms are part of a single figure. What the second example does is get the figure associated with the first histogram and save that. But since all the other histograms are also part of that same figure, it will save all of them. So it doesn't actually matter which histogram you use; the first one will always be present, so it is the obvious one to pick. Please try the answers you are given before concluding they don't work. – TheBlackCat Mar 21 '20 at 03:19
  • I really did try it carefully, exactly as you wrote it (before my earlier comments), but I got just one histogram. Now I tried again and got all the histograms in the same figure, as I needed. I don't know why it works now and not before; I don't remember changing anything. Sorry for the mistake, and thank you very much for your time, patience and correct answer. – Jorge Espinoza Mar 25 '20 at 02:15
0

Well, I think the solution is to add plt.clf() after each plt.savefig('...'). I found it in this post:

matplotlib.pyplot will not forget previous plots - how can I flush/refresh?
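
A minimal sketch of that approach, assuming a made-up `demo` DataFrame in place of `housing`, a temporary output directory in place of 'im', and `labels=False` in pd.cut so the new column holds plain integers. plt.clf() clears the current figure, so the next hist call starts from an empty canvas:

```python
import os
import tempfile

import matplotlib
matplotlib.use("Agg")  # non-interactive backend (assumption: no display needed)
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'housing' DataFrame
rng = np.random.default_rng(0)
demo = pd.DataFrame({
    "median_income": rng.uniform(0.5, 10.0, size=200),
    "median_age": rng.integers(1, 52, size=200),
})
out_dir = tempfile.mkdtemp()  # hypothetical output directory in place of 'im'

# First plot: saved through pyplot, then the current figure is cleared
demo.hist(bins=20, figsize=(8, 6))
path1 = os.path.join(out_dir, "housing_hist.png")
plt.savefig(path1)
plt.clf()
n_axes_after_clf = len(plt.gcf().axes)  # no axes left on the cleared figure

# Second plot: drawn on the now-empty current figure
demo["income_cat"] = pd.cut(
    demo["median_income"],
    bins=[0.0, 1.5, 3.0, 4.5, 6.0, np.inf],
    labels=False,
)
demo["income_cat"].hist()
path2 = os.path.join(out_dir, "income_cat_hist.png")
plt.savefig(path2)
n_axes_final = len(plt.gcf().axes)  # only the new histogram's axes

plt.close("all")
```

Note that plt.clf() keeps the same figure, including the size set by the first call. Calling plt.close() instead discards the figure, so the next plot starts on a new one with the default size.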

I would appreciate a better answer than mine.