0

I have a problem processing the results we got in a Physics lab. We have a list of floats, measured in seconds and we need to plot the best fitting Gaussian distribution. On the X-axis we should have intervals in seconds while Y-axis should show probability density. How can I make Python know in which interval to count each measurement from the list when counting probability density? I already counted probability density for the borders of the intervals (using the formula for the normal distribution) and the maximum value of the density as my professor told me. We have also plotted histograms with measurements in seconds and in probability density (using plt.hist and setting density=True). But now we're stuck. Sorry, if it may sound simple, I've been learning Python for two months only. Here is my code (sorry for some lines in Russian):

import math
import matplotlib.pyplot as plt
import numpy as np
from scipy import stats

def trunc(n, decimals=0):
    """ Rounds your number n up to a given digit. Zero figures after dot on default"""
    multiplier = 10 ** decimals
    return int(n * multiplier) / multiplier


# We get N  which is the number of measurements.
N = int(input())

# We collect the data as input. N measurements (type=float) in total.
t = [float(input()) for i in range(N)]

# Now we want to find the minimum and maximum values, the average time and the sum of all measurements.
sum_t = sum(t)
avr_t = trunc(sum_t / N, 4)
t_max = max(t)
t_min = min(t)
print('Min time:', t_min, 'Max time: ', t_max)
print('Measurements sum:', sum_t)
print('Average time:', avr_t)

# Now we are calculating the deviation from the mean for all measurements and the variance.
# We are also calculating the sum of deviations to check the accuracy of the calculations we've done.
sum_dev_t = 0
sum_sqrD = 0
for i in range(N):
    dev_t = trunc(t[i] - avr_t, 4)
    sum_dev_t = trunc(sum_dev_t + dev_t, 3)
    sqrD = trunc(dev_t ** 2, 3)
    sum_sqrD = sum_sqrD + sqrD
    print('Measurement number:', i + 1, 'Measured value:', t[i], 't{i} - <t> :', trunc(dev_t, 3),
      '(t{i} - <t>)^2 :', sqrD)
print('Control sum:', sum_dev_t)

# We calculate the distance between the largest and the smallest measurement and divide it into intervals.
total_t = trunc(t_max - t_min, 2)
inter = trunc(total_t / math.floor(N ** 0.5), 6)

# We calculate how many measurements there are in each interval.
dN = [0 for i in range(N)]
ranges = []
probdens = []
in_min = t_min
for i in range(math.floor(N ** 0.5)):
    ranges.append(in_min)
    for j in range(N):
        if in_min <= t[j] < trunc(in_min + inter, 6):
            dN[i] = dN[i] + 1
    in_min = trunc(in_min + inter, 6)
    print('There are', dN[i], 'measurements in the range [', trunc(in_min - inter, 2), ';', trunc(in_min, 2), ').\
 Probability is ', dN[i]/N, 'Probability density is ', trunc(dN[i]/(N * inter), 2))
    probdens.append(trunc(dN[i]/(N * inter), 2))

# We calculate sigma and the maximum density probability (r_max) using Gaussian distribution formula.
sigma = trunc((sum_sqrD / N) ** 0.5, 3)
print('STD is', sigma)
r_max = trunc(0.4 / sigma, 3)
print('Max probability density: ', r_max)

# Now we are calculating the probability density for Gaussian distribution for the interval borders.
in_min = t_min
a = 1 / (sigma * (2 * math.pi) ** 0.5)
for i in range(1 + math.floor(N ** 0.5)):
    r = a * math.exp(-((in_min - avr_t) ** 2) / (2*sigma ** 2))
    print('Probability density value at t=', in_min, 'is', trunc(r, 3))
    in_min = trunc(in_min + inter, 3)
ranges.append(t_max)

# We are plotting the histogram (intervals in secs on X, number of measurements on Y)
plt.hist(t, bins=ranges)
plt.title('Histogram of measurement counts')
plt.xlabel('Measured time (seconds)')
plt.ylabel('Measurements number')
plt.show()


def gauss(x, a, avr_t, sigma):
    return a * math.exp(-(x - avr_t) ** 2 / (2 * sigma ** 2))


# We are plotting the probability density histogram (intervals in secs on X, density probability on Y)
# And trying to plot Gaussian distribution fit.
plt.hist(t, bins=ranges, density=True)
x = np.linspace(avr_t - 5*sigma, avr_t + 5*sigma, 1000)
plt.plot(x, stats.norm.pdf(x, avr_t, sigma))
plt.title('Probability density histogram')
plt.xlabel('Measured time, seconds')
plt.ylabel('Probability density')
plt.show()

# We want to see how many measurements there are 1, 2, 3 square deviations away from the mean.
in_inter = [0 for i in range(4)]
for i in range(1, 4):
    for j in range(N):
        if avr_t - i * sigma <= t[j] <= avr_t + i*sigma:
            in_inter[i - 1] = in_inter[i - 1] + 1
    print('There are', in_inter[i - 1], f'measurements in the range <t> +- {i}sigma:', (avr_t - i * sigma, avr_t + i * sigma),
      '. Relative frequency is ', in_inter[i - 1] / N)

# Let's calculate sigma_<t>:
sigma_avr_t = sigma / ((N - 1) ** 0.5)
print('Average STD is ', sigma_avr_t)
  • Can you add an example of input data and the desired output? – Marat Oct 28 '20 at 20:25
  • Also, it sounds very similar to `sns.distplot`: https://seaborn.pydata.org/generated/seaborn.distplot.html – Marat Oct 28 '20 at 20:27
  • Does this answer your question? [Gaussian fit for Python](https://stackoverflow.com/questions/19206332/gaussian-fit-for-python) – mkrieger1 Oct 28 '20 at 20:55
  • Hi, it's not clear what you mean by "I already counted probability density for the borders of the intervals ... the maximum value of the density". What is the physical quantity that is being measured? Thanks for any additional information. – Robert Dodier Oct 28 '20 at 22:19
  • I still can't plot the Gaussian distribution. But I added my code. – E. Timo Petrenko Nov 01 '20 at 19:33
  • @Marat , added some code. – E. Timo Petrenko Nov 01 '20 at 20:39
  • @mkrieger , added some code – E. Timo Petrenko Nov 01 '20 at 20:40
  • @Robert , added some code – E. Timo Petrenko Nov 01 '20 at 20:40
  • why don't you use `numpy`, `scipy` and `pandas`? This whole code would boil down to maybe ~10 lines – Marat Nov 01 '20 at 21:26
  • Thanks for the update. Looks like you already have a lot done there. I feel like there is maybe a misunderstanding that is keeping you from finishing the job. Do you know that the best-fitting Gaussian distribution is specified by just the mean and standard deviation of the data? Given that, to plot the best-fitting Gaussian distribution, all you have to do is plot your function `gauss` over some range (e.g. mean value plus or minus 3 standard deviations). You can evaluate `gauss` at each datum and plot that with a circle or dot, let's say, to distinguish them. Good luck, I think you're close. – Robert Dodier Nov 02 '20 at 03:00
  • @RobertDodier , thank you! I fixed it! – E. Timo Petrenko Nov 03 '20 at 20:43
  • Corrected the code. – E. Timo Petrenko Nov 03 '20 at 20:44

0 Answers0