I tried looking this up on other users' questions, but I don't think I have found an answer.
I am attempting to plot a histogram from data stored in a Pandas dataframe, and I want the y-axis value of each bin to equal the probability of that bin's event occurring. Since the density=True argument of matplotlib.pyplot.hist divides the counts in a bin by the total count and by the bin width, for bins whose width is not 1 the y-axis value of the histogram doesn't equal the probability of the event happening in that bin. It instead equals the probability per unit of x within that bin. I wish to make my bins 10 units wide, which has led to my issue.
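As a small check of that behaviour, here is a minimal sketch using numpy.histogram with density=True, which normalizes the same way. The toy values, bin edges, and variable names are made up for illustration and are not my real data.

```python
import numpy as np

# Toy data (illustrative only): 8 values in the first bin, 2 in the second.
values = np.array([0, 0, 0, 0, 0, 0, 0, 0, 12, 15])
edges = np.array([0, 10, 20])  # two bins, each 10 units wide

counts, _ = np.histogram(values, bins=edges)
density, _ = np.histogram(values, bins=edges, density=True)

widths = np.diff(edges)               # width of each bin
probability = counts / counts.sum()   # fraction of samples in each bin

# density is probability per unit of x, so multiplying by the
# bin width recovers the per-bin probability.
print("probability:    ", probability)
print("density:        ", density)
print("density * width:", density * widths)
```

With 10-unit bins the density values come out 10 times smaller than the per-bin probabilities, which is the mismatch described above.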
My code to generate a Pandas dataframe with data similar to what I'm working with:
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
from random import seed
from random import randint
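data = pd.DataFrame(columns=['Col1'])

# 49500 zeros, so the first bin dominates the sample
i = 0
while i < 49500:
    data.loc[len(data.index)] = [0]
    i += 1

# 500 pseudo-random integers from 1 to 500 (inclusive), seeded for reproducibility
seed(1)
j = 0
while j < 500:
    data.loc[len(data.index)] = [randint(1,500)]
    j += 1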
My code to plot the histogram:
# One figure and one set of axes for the histogram
fig2, ax2 = plt.subplots()
ax2.hist(data['Col1'], range=(0.0, 500.0), bins=50, label='50000 numbers\n in 10 unit bins', density=True)
plt.title('Probability Density of Some Numbers from 0 to 500', wrap=True)
plt.legend(loc='upper right')
plt.yscale('log')
plt.xticks()
plt.minorticks_on()
plt.ylabel('Probability')
plt.xlabel('Number')
plt.savefig('randnum.png')
My histogram (note that the 0-10 bin, despite containing roughly 99% of the data, sits at a y-value of only about 0.1):
I do realize that by making the y-axis probability not inversely proportional to bin size, the integral of the histogram no longer equals 1 (it will equal 10 in my case), but this is precisely what I am seeking.
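To make the target concrete, here is a minimal sketch of the numbers I am after, computed with numpy.histogram rather than hist. It assumes that weighting every sample by 1/N is an acceptable way to get per-bin probabilities; the stand-in data and variable names are made up for illustration.

```python
import numpy as np

# Stand-in data (illustrative): 990 zeros plus 10 values spread over 1..500.
values = np.concatenate([np.zeros(990), np.linspace(1, 500, 10)])

# Same binning as the plot: 50 bins of width 10 over [0, 500].
edges = np.linspace(0.0, 500.0, 51)

# Weight each sample by 1/N so each bin holds the fraction of samples in it.
weights = np.full(values.shape, 1.0 / values.size)
probability, _ = np.histogram(values, bins=edges, weights=weights)

bin_width = edges[1] - edges[0]
area = (probability * bin_width).sum()

print("0-10 bin probability:", probability[0])
print("sum of bar heights:  ", probability.sum())
print("area under the bars: ", area)
```

Here the bar heights sum to 1 and the area under the bars equals the bin width. hist accepts a weights argument as well, so passing an array like this in place of density=True may be one route to option 2.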
Is there a way to either 1) change the value the histogram is normalized to or 2) directly multiply y-values of the histogram by a value of my choosing?