Suppose we have a function (such as the pdf of a normal distribution), and we want to approximate it with a histogram whose bars lie under the curve. I want to specify the number of bins and draw the bars under the curve. How can this be done in Python? For example, a graph like the one below, but with all bars under the curve and the number of bins as a parameter.
- What have you tried? SO is not a coding service. Please read the following documentation, then [edit], and rephrase the question. [Take the Tour](https://stackoverflow.com/tour), [How to ask a good question](https://stackoverflow.com/help/how-to-ask), & [On Topic](https://stackoverflow.com/help/on-topic). Always provide a [mre] with **code, data, errors, current & expected output, as [formatted text](https://stackoverflow.com/help/formatting)** & **you're expected to [try to solve the problem first](https://meta.stackoverflow.com/questions/261592) and show your effort**. – Trenton McKinney Dec 31 '21 at 19:35
- Thank you JohanC. That is exactly what I am looking for. – Amin Jan 01 '22 at 06:08
1 Answer
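You can use the `pdf` to decide the heights of the bars:

    from scipy.stats import norm
    import numpy as np

    N = 20  # number of bars
    # bar centers, evenly spaced between the 0.1% and 99.9% quantiles
    x = np.linspace(*norm.ppf([0.001, 0.999]), N)
    y = norm.pdf(x)  # bar heights: the pdf at each center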
Each bar is centered on an x-value and is exactly as high as the `pdf` at that center, so the curve cuts through the top of each bar. To make the bars only touch the curve from below, one could calculate the `pdf` at the lowest point of each bar, which is `x + width/2` for positive points. As the `pdf` is symmetric, `abs` can be used to create a single expression for both positive and negative x-values.
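The height rule described above can be checked without plotting. The following is a minimal sketch, not part of the original answer: it swaps `scipy.stats.norm` for the standard library's `statistics.NormalDist` so the numeric part has no third-party dependency, and the names `under_curve_bars`, `plot_bars`, `lo_q` and `hi_q` are hypothetical, chosen only for illustration. It assumes a standard normal and at least two bars.

```python
from statistics import NormalDist


def under_curve_bars(n_bins, lo_q=0.001, hi_q=0.999):
    """Return (centers, width, heights) for bars that stay under the standard normal pdf.

    Hypothetical helper: n_bins is the number of bars (at least 2),
    lo_q and hi_q are the quantiles of the first and last bar center.
    """
    dist = NormalDist()  # standard normal: mean 0, standard deviation 1
    x0, x1 = dist.inv_cdf(lo_q), dist.inv_cdf(hi_q)
    width = (x1 - x0) / (n_bins - 1)  # spacing between neighbouring centers
    centers = [x0 + i * width for i in range(n_bins)]
    # The pdf decreases as |x| grows, so its minimum over a bar is at the
    # edge farthest from 0, which is |center| + width/2.
    heights = [dist.pdf(abs(c) + width / 2) for c in centers]
    return centers, width, heights


def plot_bars(n_bins):
    """Draw the bars and the curve as a single static figure (needs matplotlib)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    dist = NormalDist()
    centers, width, heights = under_curve_bars(n_bins)
    span = centers[-1] - centers[0]
    xs = [centers[0] + i * span / 999 for i in range(1000)]
    fig, ax = plt.subplots(figsize=(8, 2))
    ax.plot(xs, [dist.pdf(x) for x in xs], 'r', lw=2)
    ax.bar(centers, heights, width=width, fc='DeepSkyBlue', ec='k')
    ax.set_ylabel('probability density')
    plt.show()
```

Calling `plot_bars(20)` would draw 20 bars; the number of bars is the only required parameter.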
Here is an animation created via the celluloid library.

    import matplotlib.pyplot as plt
    import numpy as np
    from scipy.stats import norm
    from celluloid import Camera

    fig, ax = plt.subplots(figsize=(8, 2))
    fig.subplots_adjust(bottom=0.15, left=0.1, right=0.97)
    ax.spines['right'].set_visible(False)
    ax.spines['top'].set_visible(False)
    ax.spines['bottom'].set_visible(False)
    camera = Camera(fig)
    # plot range: the 0.1% to 99.9% quantiles of the standard normal
    x0, x1 = norm.ppf([0.001, 0.999])
    x_pdf = np.linspace(x0, x1, 1000)
    y_pdf = norm.pdf(x_pdf)
    for N in range(10, 80):  # one frame per number of bars
        ax.plot(x_pdf, y_pdf, 'r', lw=2)
        x_bar = np.linspace(x0, x1, N)  # bar centers
        width = x_bar[1] - x_bar[0]
        # pdf at the bar edge farthest from 0, so each bar stays under the curve
        y_bar = norm.pdf(np.abs(x_bar) + width/2)
        ax.bar(x_bar, y_bar, width=width, fc='DeepSkyBlue', ec='k')
        ax.margins(x=0)
        ax.set_ylabel('probability density')
        camera.snap()  # store the current state of the figure as one frame
    animation = camera.animate(interval=600)
    animation.save('gaussian_histogram.gif')
    plt.show()
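Since every bar stays under the curve, the bars form a lower Riemann sum of the pdf, and their total area gives a rough measure of how good the approximation is for a given number of bars. Here is a small sketch of that check, again using the standard library's `statistics.NormalDist` rather than scipy; `bar_area`, `lo_q` and `hi_q` are hypothetical names, and a standard normal is assumed.

```python
from statistics import NormalDist


def bar_area(n_bins, lo_q=0.001, hi_q=0.999):
    """Total area of n_bins under-the-curve bars (hypothetical helper, n_bins >= 2)."""
    dist = NormalDist()  # standard normal
    x0, x1 = dist.inv_cdf(lo_q), dist.inv_cdf(hi_q)
    width = (x1 - x0) / (n_bins - 1)
    centers = [x0 + i * width for i in range(n_bins)]
    # Same height rule as in the animation: pdf at the edge farthest from 0.
    return width * sum(dist.pdf(abs(c) + width / 2) for c in centers)


# Print the covered area for a few bar counts.
for n in (10, 20, 40, 80):
    print(n, round(bar_area(n), 4))
```

The area always stays below 1. With many bars it should approach the probability mass between the two quantiles, which is 0.998 for the defaults used here.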
PS: Here is a list of related questions (collected by @TrentonMcKinney), where you can find additional explanation and ideas:

JohanC