I have a problem processing the results we got in a Physics lab. We have a list of floats, measured in seconds and we need to plot the best fitting Gaussian distribution. On the X-axis we should have intervals in seconds while Y-axis should show probability density. How can I make Python know in which interval to count each measurement from the list when counting probability density? I already counted probability density for the borders of the intervals (using the formula for the normal distribution) and the maximum value of the density as my professor told me. We have also plotted histograms with measurements in seconds and in probability density (using plt.hist and setting density=True). But now we're stuck. Sorry, if it may sound simple, I've been learning Python for two months only. Here is my code (sorry for some lines in Russian):
import math
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats
def trunc(n, decimals=0):
""" Rounds your number n up to a given digit. Zero figures after dot on default"""
multiplier = 10 ** decimals
return int(n * multiplier) / multiplier
# We get N which is the number of measurements.
N = int(input())
# We collect the data as input. N measurements (type=float) in total.
t = [float(input()) for i in range(N)]
# Now we want to find the minimum and maximum values, the average time and the sum of all measurements.
sum_t = sum(t)
avr_t = trunc(sum_t / N, 4)
t_max = max(t)
t_min = min(t)
print('Min time:', t_min, 'Max time: ', t_max)
print('Measurements sum:', sum_t)
print('Average time:', avr_t)
# Now we are calculating the deviation from the mean for all measurements and the variance.
# We are also calculating the sum of deviations to check the accuracy of the calculations we've done.
sum_dev_t = 0
sum_sqrD = 0
for i in range(N):
dev_t = trunc(t[i] - avr_t, 4)
sum_dev_t = trunc(sum_dev_t + dev_t, 3)
sqrD = trunc(dev_t ** 2, 3)
sum_sqrD = sum_sqrD + sqrD
print('Measurement number:', i + 1, 'Measured value:', t[i], 't{i} - <t> :', trunc(dev_t, 3),
'(t{i} - <t>)^2 :', sqrD)
print('Control sum:', sum_dev_t)
# We calculate the distance between the largest and the smallest measurement and divide it into intervals.
total_t = trunc(t_max - t_min, 2)
inter = trunc(total_t / math.floor(N ** 0.5), 6)
# We calculate how many measurements there are in each interval.
dN = [0 for i in range(N)]
ranges = []
probdens = []
in_min = t_min
for i in range(math.floor(N ** 0.5)):
ranges.append(in_min)
for j in range(N):
if in_min <= t[j] < trunc(in_min + inter, 6):
dN[i] = dN[i] + 1
in_min = trunc(in_min + inter, 6)
print('There are', dN[i], 'measurements in the range [', trunc(in_min - inter, 2), ';', trunc(in_min, 2), ').\
Probability is ', dN[i]/N, 'Probability density is ', trunc(dN[i]/(N * inter), 2))
probdens.append(trunc(dN[i]/(N * inter), 2))
# We calculate sigma and the maximum density probability (r_max) using Gaussian distribution formula.
sigma = trunc((sum_sqrD / N) ** 0.5, 3)
print('STD is', sigma)
r_max = trunc(0.4 / sigma, 3)
print('Max probability density: ', r_max)
# Now we are calculating the probability density for Gaussian distribution for the interval borders.
in_min = t_min
a = 1 / (sigma * (2 * math.pi) ** 0.5)
for i in range(1 + math.floor(N ** 0.5)):
r = a * math.exp(-((in_min - avr_t) ** 2) / (2*sigma ** 2))
print('Probability density value at t=', in_min, 'is', trunc(r, 3))
in_min = trunc(in_min + inter, 3)
ranges.append(t_max)
# We are plotting the histogram (intervals in secs on X, number of measurements on Y)
plt.hist(t, bins=ranges)
plt.title('Histogram of measurement counts')
plt.xlabel('Measured time (seconds)')
plt.ylabel('Measurements number')
plt.show()
def gauss(x, a, avr_t, sigma):
return a * math.exp(-(x - avr_t) ** 2 / (2 * sigma ** 2))
# We are plotting the probability density histogram (intervals in secs on X, density probability on Y)
# And trying to plot Gaussian distribution fit.
plt.hist(t, bins=ranges, density=True)
x = np.linspace(avr_t - 5*sigma, avr_t + 5*sigma, 1000)
plt.plot(x, stats.norm.pdf(x, avr_t, sigma))
plt.title('Probability density histogram')
plt.xlabel('Measured time, seconds')
plt.ylabel('Probability density')
plt.show()
# We want to see how many measurements there are 1, 2, 3 square deviations away from the mean.
in_inter = [0 for i in range(4)]
for i in range(1, 4):
for j in range(N):
if avr_t - i * sigma <= t[j] <= avr_t + i*sigma:
in_inter[i - 1] = in_inter[i - 1] + 1
print('There are', in_inter[i - 1], f'measurements in the range <t> +- {i}sigma:', (avr_t - i * sigma, avr_t + i * sigma),
'. Relative frequency is ', in_inter[i - 1] / N)
# Let's calculate sigma_<t>:
sigma_avr_t = sigma / ((N - 1) ** 0.5)
print('Average STD is ', sigma_avr_t)