7

I would appreciate any insight with the following.

I want to plot two datasets on one common histogram such that neither histogram has its top cut off and both have y values ranging from 0 to 1.

Let me explain what I mean. So far, I can plot the two datasets on one histogram nicely and force the integral of both distributions to be 1 by writing normed = 1 in ax.hist(), as seen in the following figure:

[Figure: the two datasets overlaid on one histogram, each normalised to unit area]

and which is produced from code like this:

x1, w1, patches1 = ax.hist(thing1, bins=300, edgecolor='b', color='b', histtype='stepfilled', alpha=0.2, normed=1)

x2, w2, patches2 = ax.hist(thing2, bins=300, edgecolor='g', color='g', histtype='stepfilled', alpha=0.2, normed=1)

In general, one probability distribution is much taller than the other, which makes the plot hard to read.

So, I've tried to normalise both such that they would both range from 0 to 1 on the y axis and still preserve their shape. For example, I've tried the following code:

for item in patches1:
    item.set_height(item.get_height()/sum(x1))

which is taken from the discussion "How to normalize a histogram in python?", but Python raises an error saying that the object has no attribute get_height.

My question is simply: how can I make the y axis range from 0 to 1 while preserving the shape of both distributions?

inquiries

2 Answers

11

I would recommend pre-computing the histograms using numpy and then plotting them in matplotlib using bar. Each histogram can then be normalized by amplitude simply by dividing it by its own maximum value. Note that, in order to get any kind of meaningful comparison between the two histograms, it is best to use the same bins for both of them. Below is an example of how to do this:

from matplotlib import pyplot as plt
import numpy as np

##some random distribution
dist1 = np.random.normal(0.5, 0.25, 1000)
dist2 = np.random.normal(0.8, 0.1, 1000)

##computing the bin properties (same for both distributions)
num_bin = 50
bin_lims = np.linspace(0,1,num_bin+1)
bin_centers = 0.5*(bin_lims[:-1]+bin_lims[1:])
bin_widths = bin_lims[1:]-bin_lims[:-1]

##computing the histograms
hist1, _ = np.histogram(dist1, bins=bin_lims)
hist2, _ = np.histogram(dist2, bins=bin_lims)

##normalizing
hist1b = hist1/np.max(hist1)
hist2b = hist2/np.max(hist2)

fig, (ax1,ax2) = plt.subplots(nrows = 1, ncols = 2)

ax1.bar(bin_centers, hist1, width = bin_widths, align = 'center')
ax1.bar(bin_centers, hist2, width = bin_widths, align = 'center', alpha = 0.5)
ax1.set_title('original')

ax2.bar(bin_centers, hist1b, width = bin_widths, align = 'center')
ax2.bar(bin_centers, hist2b, width = bin_widths, align = 'center', alpha = 0.5)
ax2.set_title('amplitude-normalized')

plt.show()

And a picture of what this looks like:

[Figure: two panels side by side, 'original' (left) and 'amplitude-normalized' (right)]
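If you would rather keep the 'stepfilled' look from the question, the same amplitude normalization can probably be done through the weights parameter of ax.hist. The following is only a sketch: thing1 and thing2 are the names from the question, but the random data standing in for them and the choice of 50 shared bins are my assumptions. Each sample is weighted by 1/(largest bin count), so the tallest bin of each histogram ends up at 1.

```python
import numpy as np
import matplotlib.pyplot as plt

# Stand-in data (assumption): replace with your own thing1 / thing2.
rng = np.random.default_rng(0)
thing1 = rng.normal(0.5, 0.25, 1000)
thing2 = rng.normal(0.8, 0.1, 1000)

# Shared bin edges covering both datasets (50 bins is an arbitrary choice).
lo = min(thing1.min(), thing2.min())
hi = max(thing1.max(), thing2.max())
bin_lims = np.linspace(lo, hi, 51)

fig, ax = plt.subplots()
peaks = []
for data, colour in ((thing1, 'b'), (thing2, 'g')):
    # Raw counts are only needed to find the tallest bin.
    counts, _ = np.histogram(data, bins=bin_lims)
    # Every sample gets weight 1/peak, so the tallest bin sums to 1.
    weights = np.full(data.shape, 1.0 / counts.max())
    heights, _, _ = ax.hist(data, bins=bin_lims, weights=weights,
                            histtype='stepfilled', color=colour,
                            edgecolor=colour, alpha=0.2)
    peaks.append(heights.max())

ax.set_ylim(0, 1.05)
ax.set_ylabel('counts / peak count')
```

Call plt.show() or fig.savefig(...) afterwards to view the result. Note that the y axis is then no longer a probability density, only a shape comparison.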

Hope this helps.

Thomas Kühn
  • Thank you for this suggestion, Thomas. I will try this in the next couple days and let you know how it goes. – inquiries Dec 28 '17 at 16:17
  • Neat! That said, there has to be a built in way to do this with seaborn/matplotlib.. This seems a common scenario...? – GrimSqueaker Nov 02 '20 at 07:59
  • @GrimSqueaker Probably there is. On the other hand, I'm not so certain how useful it is to compare two amplitude-normalised histograms. Looking back at this, maybe it would have been more intuitive to use `twinx()` here and leave the histograms un-scaled. – Thomas Kühn Nov 03 '20 at 06:30
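
The twinx() idea from the last comment could look roughly like the sketch below. The data and bins mirror the answer's example, while the colours and axis labels are my own assumptions. Both histograms keep their raw counts; each simply gets its own y axis, so neither is squashed by the other.

```python
import numpy as np
import matplotlib.pyplot as plt

# Same kind of stand-in data as in the answer (seed is an assumption).
rng = np.random.default_rng(0)
dist1 = rng.normal(0.5, 0.25, 1000)
dist2 = rng.normal(0.8, 0.1, 1000)
bin_lims = np.linspace(0, 1, 51)

fig, ax_left = plt.subplots()
# Second axes sharing the x axis but with an independent y axis on the right.
ax_right = ax_left.twinx()

counts1, _, _ = ax_left.hist(dist1, bins=bin_lims, color='b', alpha=0.4)
counts2, _, _ = ax_right.hist(dist2, bins=bin_lims, color='g', alpha=0.4)

# Colour-code each y axis so it is clear which histogram it belongs to.
ax_left.set_ylabel('counts (dist1)', color='b')
ax_right.set_ylabel('counts (dist2)', color='g')
ax_left.tick_params(axis='y', colors='b')
ax_right.tick_params(axis='y', colors='g')
```

The trade-off is that bar heights can no longer be compared directly across the two histograms, since the left and right scales differ.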
3

I've tried to normalise both such that they would both range from 0 to 1 on the y axis and still preserve their shape.

This method won't get your plots on a 0 to 1 scale, but it will get them on the same scale relative to each other:

Just pass density=True to your plt.hist() call, like this:

plt.hist([array_1, array_2], density=True)

This will plot both distributions on the same scale, with each one normalised so that the area under its histogram is 1.
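
To see what density=True does numerically, here is a small check using np.histogram, which accepts the same density flag. The random arrays and the 50 shared bins are assumptions for illustration. The area (bar height times bin width, summed) comes out as 1 for each dataset, while the peak height is generally not 1.

```python
import numpy as np

# Stand-in data (assumption): replace with your own arrays.
rng = np.random.default_rng(0)
array_1 = rng.normal(0.5, 0.25, 1000)
array_2 = rng.normal(0.8, 0.1, 1000)

# Shared bin edges covering both datasets.
lo = min(array_1.min(), array_2.min())
hi = max(array_1.max(), array_2.max())
bin_lims = np.linspace(lo, hi, 51)
widths = np.diff(bin_lims)

areas = {}
peaks = {}
for name, data in (('array_1', array_1), ('array_2', array_2)):
    # density=True divides the counts by (number of samples * bin width).
    density, _ = np.histogram(data, bins=bin_lims, density=True)
    areas[name] = np.sum(density * widths)
    peaks[name] = density.max()
    print(f"{name}: area = {areas[name]:.3f}, peak = {peaks[name]:.3f}")
```

So this normalisation makes the two histograms comparable as probability densities, but the narrower distribution will still be the taller one.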

JohnnyUtah