Python keeps overwriting hist on previous plot but doesn't save it with the desired plot

Question

I am saving two separate figures, each of which should contain two plots together (a histogram and a curve).

The first figure is fine. In the second one, the histogram is not drawn on the new figure but on the previous one, and in the saved figure I only find one of the plots.

This is the code for the first figure, which I get correctly:

import scipy.stats as s
import numpy as np
import os
import pandas as pd
import openpyxl as pyx
import matplotlib
matplotlib.rcParams["backend"] = "TkAgg"
#matplotlib.rcParams['backend'] = "Qt4Agg"
#matplotlib.rcParams['backend'] = "nbAgg"
import matplotlib.pyplot as plt
import math

data = [336256, 620316, 958846, 1007830, 1080401]
pdf = np.array([ 0.00449982,  0.0045293 ,  0.00455894,  0.02397463,
    0.02395788,  0.02394114])

fig, ax = plt.subplots();
fig = plt.figure(figsize=(40,30))

x = np.linspace(np.min(data), np.max(data), 100); 
plt.plot(x, s.exponweib.pdf(x, *s.exponweib.fit(data, 1, 1, loc=0, scale=2)))
plt.hist(data, bins = np.linspace(data[0], data[-1], 100), normed=True, alpha= 1)
text1= ' Weibull'
plt.savefig(text1+  '.png' )

datar =np.asarray(data)
mu, sigma = datar.mean() , datar.std() # mean and standard deviation

normal_std = np.sqrt(np.log(1 + (sigma/mu)**2))
normal_mean = np.log(mu) - normal_std**2 / 2
hs = np.random.lognormal(normal_mean, normal_std, 1000)
print(hs.max())    # some finite number
print(hs.mean())   # about 136519
print(hs.std())    # about 50405

count, bins, ignored = plt.hist(hs, 100, normed=True)    
x = np.linspace(min(bins), max(bins), 10000)
pdfT = [];
for el in range (len(x)):
    pdfTmp = (math.exp(-(np.log(x[el]) - normal_mean)**2 / (2 * normal_std**2)))
    pdfT += [pdfTmp]


pdf = np.asarray(pdfT)

This is the code for the second figure:

fig, ax = plt.subplots();
fig = plt.figure(figsize=(40,40))

plt.plot(x, pdf, linewidth=2, color='r')
plt.hist(data, bins = np.linspace(data[0], data[-1], 100), normed=True, alpha= 1)

text= ' Lognormal '
plt.savefig(text+ '.png' )

The first figure is saved with the histogram and the curve together. The second one is saved with only the curve.

Update 1: looking at this question, I found out that clearing the plot history keeps the figures from getting mixed up. However, my second set of plots (the lognormal) still does not save together: I only get the curve and not the histogram.

Hiho · Accepted Answer · 2018-08-02T08:12:01.557

1

This happens because you have set normed=True, which means that the area under the histogram is normalized to 1. Since your bins are very wide, the bars of the histogram are very short (in this case so short that they are not visible).

If you use

n, bins, _ = plt.hist(data, bins = np.linspace(data[0], data[-1], 100), normed=True, alpha= 1)

n will contain the height (y-value) of each bin, so you can confirm this yourself.
Also have a look at the documentation for plt.hist.

So if you set normed to False, the histogram will be visible.
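As a rough numerical check of the point above, here is a minimal sketch using the data from the question. It assumes that `np.histogram` with `density=True` is an acceptable stand-in for `plt.hist` with `normed=True` (newer NumPy and Matplotlib releases use the name `density` for this option), so treat it as an illustration rather than the exact call from the question.

```python
import numpy as np

data = [336256, 620316, 958846, 1007830, 1080401]
edges = np.linspace(data[0], data[-1], 100)

# density=True divides each count by (number of samples * bin width),
# so wide bins lead to very small bar heights.
heights, _ = np.histogram(data, bins=edges, density=True)
widths = np.diff(edges)

print(widths[0])                 # bin width, roughly 7.5e3
print(heights.max())             # tallest bar, roughly 2.7e-05
print((heights * widths).sum())  # total area, 1.0 up to rounding
```

Bars of this height are invisible next to a curve whose values are of order 1, which is the case for the curve in the question.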

Edit: number of bins

import numpy as np
import matplotlib.pyplot as plt

rand_data = np.random.uniform(0, 1.0, 100)

fig = plt.figure()

ax_1 = fig.add_subplot(211)
ax_1.hist(rand_data, bins=10)

ax_2 = fig.add_subplot(212)
ax_2.hist(rand_data, bins=100)

plt.show()

will give you two histograms of the same random data, one with 10 bins and one with 100 bins (the exact shapes vary from run to run because the data is random).

This shows how the number of bins changes the histogram. A histogram visualises the distribution of your data along one dimension, so I am not sure what you mean by number of inputs and bins.
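For the original two-figure problem, the following is a hedged sketch and not the asker's exact code. It assumes Matplotlib 2.1 or newer (where `density` replaces `normed`), uses the headless `Agg` backend only so that it runs without a display, and writes to a hypothetical temporary path. It also uses the standard lognormal density with its `1 / (x * sigma * sqrt(2*pi))` factor. The loop in the question omits that factor, so its curve peaks at 1 and dwarfs the normalized histogram. The choice of `bins=5` is likewise only for illustration.

```python
import os
import tempfile

import numpy as np
import matplotlib
matplotlib.use("Agg")  # assumption: headless backend, so no display is needed
import matplotlib.pyplot as plt

data = np.array([336256, 620316, 958846, 1007830, 1080401])

# Moment-matched lognormal parameters, computed as in the question.
mu, sigma = data.mean(), data.std()
normal_std = np.sqrt(np.log(1 + (sigma / mu) ** 2))
normal_mean = np.log(mu) - normal_std ** 2 / 2

# Full lognormal density, including the normalising factor.
x = np.linspace(data.min(), data.max(), 1000)
pdf = np.exp(-(np.log(x) - normal_mean) ** 2 / (2 * normal_std ** 2)) / (
    x * normal_std * np.sqrt(2 * np.pi)
)

# Create the figure once and draw both artists on the same Axes,
# so the histogram and the curve end up in the same saved file.
fig, ax = plt.subplots(figsize=(8, 6))
ax.hist(data, bins=5, density=True, alpha=0.5)
ax.plot(x, pdf, linewidth=2, color="r")

out_path = os.path.join(tempfile.gettempdir(), "lognormal.png")  # hypothetical path
fig.savefig(out_path)
plt.close(fig)  # close the figure so later plots start from a clean state
```

Calling `plt.subplots()` and then `plt.figure(...)`, as the question does, creates two figures for each plot. Keeping a single `fig, ax` pair and calling methods on `ax` avoids relying on which figure is current.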

edited Aug 02 '18 at 08:12

answered Aug 02 '18 at 07:38

Hiho


Wow, thanks a lot! May I ask why you wrote it this way: n, bins, _ ? What is the underscore? – FabioSpaghetti Aug 02 '18 at 07:41
I am also wondering about one point: why is the number of bins higher than the number of my input values, which is 5? – FabioSpaghetti Aug 02 '18 at 07:43
It was very instructive! I had forgotten that I handle the lognormal plot differently and that it is already normalized – FabioSpaghetti Aug 02 '18 at 07:47

plt.hist returns a tuple of 3 values, (n, bins, patches); since patches wasn't relevant, I just used _ for its value. As for the number of bins, you set the bins and their edges with the bins argument, i.e. with bins = np.linspace(data[0], data[-1], 100) you are creating 100 bin edges (and therefore 99 bins), with the left edge of the first one corresponding to the smallest value in x and the right edge of the last one to the largest value in x. – Hiho Aug 02 '18 at 07:50
Thank you so much. So if I decide to always have 5 bins, is it enough to make it 5 instead of 100? I thought the number of bins was a direct indicator of the number of inputs – FabioSpaghetti Aug 02 '18 at 07:53
Thank you so much for your update. What I meant was: if I have 5 numbers between 1 and 100, should I have only 5 bins in this range? – FabioSpaghetti Aug 02 '18 at 10:33
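To make the bin discussion in the comments concrete, here is a small sketch. It assumes `np.histogram` as a stand-in, since `plt.hist` uses it for the binning and returns the same counts and edges as its first two values. It shows that an array passed as `bins` is read as bin edges, while an integer is read as the number of bins, independent of how many data points there are.

```python
import numpy as np

data = [336256, 620316, 958846, 1007830, 1080401]

# An array passed as `bins` is a list of edges: 100 edges give 99 bins.
edges = np.linspace(data[0], data[-1], 100)
counts, returned_edges = np.histogram(data, bins=edges)
print(len(returned_edges), len(counts))  # 100 99

# An integer is the number of equal-width bins between min and max.
counts5, edges5 = np.histogram(data, bins=5)
print(counts5)      # [1 1 0 0 3]
print(len(edges5))  # 6

# Unpacking with `_` just discards a return value that is not needed.
n, _ = np.histogram(data, bins=5)
```

So to get exactly 5 bins it should be enough to pass `bins=5`. Replacing 100 with 5 inside `np.linspace` would give 5 edges and therefore only 4 bins.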
