Plotting histogram with theoretical curve: Random realization

Question

I need to write a program that generates random realizations of the Cauchy distribution

$f(x)=\frac{1}{\pi{(1+x^{2})}}$

with null location and unit scale.

Also I need to make a histogram between -5 and 5 bins, for a random realization of 1,000 points, together with the theoretical curve making sure they have the same units.

I calculated the cumulative distribution function F(x) for the Cauchy distribution:

$F(x)=tan[{\pi{(rand()-{\frac{1}{2})}}}]$

And I wrote the following python code:

from __future__ import division
import scipy
import random
import matplotlib.pyplot as plt
import numpy as np
import math as m


valuesX = []
for q in range(1000):
    R = random.random()
    x = m.tan(m.pi*(R-0.5)) #Cumulative Function
    valuesX.append(x)

z = np.linspace(-10,10,1000)
y = 1/(m.pi*(1+z**2)) #Theoretical Cauchy

plt.plot(y,z)
plt.hist(valuesX, bins = 50, range = [-5,5], normed=True)

My results were:

I don't know if this is acceptable since I'm plotting discrete values (random realization) against a probability density function. How could I compare the two of them? since I need to find the fractional difference for the plot above and determine the global rms deviations between both curves as a function of the size of the random realization.

If you want to compare the probability density function with the random realization, i suggest to calculate the expected realization for given n and compared it with the random realization (n=1000). Then would have a comparison of two discrete realizations, whereas one resembles a real (=random) and the other a theoretical (=expected ) realization. — Nikolas Rieble, Jun 10 '16 at 08:26

score 0 · Answer 1 · answered Mar 11 '18 at 08:11

If you have a large number of sample points, and your histogram bins are sufficiently narrow, then what you've already done should be fine. In that case, the "normed=True" setting in plt.hist() will give you a good approximation to the probability density at the centre of each bin. The sampling error will then be given by the square-root of the number of samples in each bin.

On the other hand, if the number of samples is smaller, so that each histogram bin needs to be wider in order to contain more than a few samples, then the comparison between the histogram and the theoretical probability density function becomes more subtle. In that case, one needs to allow for each bin representing the integral of the probability density over the extent of each bin.

So, for a bin that covers the range [a,b], one needs to evaluate the following integral:

When multiplied by the total number of samples, this gives the number of samples that you'd expect, on average, within a given bin.

For convenience, one can also rearrange this formula to have the following form:

which should have better numerical accuracy when the bins are thin (i.e. b is not much greater than a). This formula also makes it clearer how, for very thin bins, one recovers the expression for the probability density itself.

Plotting histogram with theoretical curve: Random realization

1 Answers1