
Following https://stackoverflow.com/a/30460089/2202107, we can generate the CDF of normally distributed samples in two ways:

import numpy as np
import matplotlib.pyplot as plt

N = 100
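Z = np.random.normal(size=N)
# method 1: normalized histogram, accumulated into a CDF
# (`normed` has been removed from np.histogram; `density=True` replaces it)
H, X1 = np.histogram(Z, bins=10, density=True)
dx = X1[1] - X1[0]
F1 = np.cumsum(H) * dx
# method 2: empirical CDF from the sorted samples
X2 = np.sort(Z)
F2 = np.arange(N) / N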

# plt.plot(X1[1:], F1)
plt.plot(X2, F2)
plt.show()
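As a quick sanity check on the two constructions (a sketch, assuming NumPy 1.17 or later for `default_rng`; the helper name `cdf_two_ways` and the seed are hypothetical): with `density=True` the histogram integrates to 1, so method 1's CDF ends at 1 up to rounding, while method 2 steps up by 1/N at each sorted sample, from 0 to (N-1)/N.

```python
import numpy as np

def cdf_two_ways(z, bins=10):
    """Return (edges, F1, xs, F2): histogram-based CDF and empirical CDF of z."""
    # method 1: density histogram; H * dx is the probability mass per bin
    H, edges = np.histogram(z, bins=bins, density=True)
    dx = edges[1] - edges[0]
    F1 = np.cumsum(H) * dx  # CDF evaluated at the right bin edges edges[1:]
    # method 2: sorted samples paired with i/N (fraction of samples below each one)
    xs = np.sort(z)
    F2 = np.arange(len(z)) / len(z)
    return edges, F1, xs, F2

rng = np.random.default_rng(0)  # hypothetical seed, for reproducibility
z = rng.normal(size=100)
edges, F1, xs, F2 = cdf_two_ways(z)
print(F1[-1])          # close to 1.0: the density histogram integrates to 1
print(F2[0], F2[-1])   # 0.0 0.99
```

Note that method 1 gives CDF values only at the bin edges, while method 2 gives one CDF value per sample; the question below concerns the second kind of data.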

Question: How can we recover the "original" normal distribution (its PDF), given only the x (e.g. X2) and y (e.g. F2) coordinates of the CDF?
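For reference, here is the relation this question relies on. The PDF $f$ is the derivative of the CDF $F$. So, given CDF points $(x_i, F_i)$ sorted by $x$ (here $x_i$ corresponds to `X2` and $F_i$ to `F2`), a finite-difference estimate over each interval is:

```latex
f(x) = \frac{dF}{dx}(x), \qquad
f\!\left(\tfrac{x_i + x_{i+1}}{2}\right) \approx \frac{F_{i+1} - F_i}{x_{i+1} - x_i}
```

For the empirical CDF, $F_{i+1} - F_i = 1/N$, so this estimate equals $1 / \big(N (x_{i+1} - x_i)\big)$. It becomes very large wherever two samples happen to lie close together, which is why some binning or smoothing along $x$ is needed.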

Sida Zhou

1 Answer


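My first thought was plt.plot(x, np.gradient(y)), but the gradient of y came out constant. The data points are evenly spaced in y but not in x, and np.gradient(y) with no spacing argument differentiates with respect to the array index rather than x. This kind of data comes up often in percentile calculations. The key is to resample the data so that it is evenly spaced in x rather than in y, using interpolation: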

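import numpy as np
import matplotlib.pyplot as plt

# X2 and F2 come from the question's code: sorted samples and their CDF values
x = X2
y = F2
num_points = 10

# resample the CDF on a grid that is evenly spaced in x (not in y)
xinterp = np.linspace(-2, 2, num_points)
yinterp = np.interp(xinterp, x, y)

# normalize so that the bar heights sum to (approximately) 1.0;
# the integral of 1 dy over yinterp is simply yinterp[-1] - yinterp[0]
# (np.trapz is deprecated in NumPy 2.0 in favor of np.trapezoid)
tot_val = 1.0
normalization_factor = tot_val / np.trapz(np.ones(len(xinterp)), yinterp)

plt.bar(xinterp, normalization_factor * np.gradient(yinterp), width=0.2)
plt.show()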

The output looks good to me:

[Image: resulting bar plot]
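The following sketch illustrates the gradient and normalization steps in the answer's code. It uses a hypothetical toy array, not the author's data. Called with no spacing argument, `np.gradient(y)` differentiates with respect to the array index, so a CDF that is evenly spaced in y gives a constant result. Passing the x coordinates as the second argument gives dy/dx instead. The normalization integral of ones over `yinterp` reduces to `yinterp[-1] - yinterp[0]`, which is the CDF mass covered by the grid.

```python
import numpy as np

# toy CDF: evenly spaced in y, unevenly spaced in x (hypothetical values)
x = np.array([-1.5, -0.4, 0.0, 0.2, 1.1])
y = np.arange(len(x)) / len(x)          # 0.0, 0.2, 0.4, 0.6, 0.8

g_index = np.gradient(y)                # derivative w.r.t. array index
g_x = np.gradient(y, x)                 # derivative w.r.t. x (non-uniform spacing)
print(g_index)                          # every entry is ~0.2 = 1/len(x)
print(g_x)                              # varies with the local spacing in x

# np.trapz was renamed np.trapezoid in NumPy 2.0; use whichever exists
trapezoid = getattr(np, "trapezoid", None) or np.trapz
mass = trapezoid(np.ones(len(y)), y)    # integral of 1 dy over the grid
print(np.isclose(mass, y[-1] - y[0]))   # True
```

The `np.gradient(y, x)` form addresses the "all constant" problem directly. On raw empirical-CDF points, however, it inherits the noise described by the $1/(N \Delta x)$ behavior, so resampling onto a coarser grid is still useful.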

I'm posting my approach here for review; let me know if my logic is flawed.

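One issue: when num_points is large, the plot looks bad. I believe this is a discretization problem (with only N = 100 samples, a fine grid leaves very few samples between neighboring grid points, so the differences are noisy), but I'm not sure how to avoid it.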
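One possible way to reduce that noise is sketched below. It is a swapped-in technique, not the author's method, and the helper name `pdf_from_cdf`, the seed, the sample size and the bin edges are all hypothetical. The idea is to evaluate the interpolated CDF at bin *edges* and divide each CDF increment by the bin width. This gives a finite-difference density on bins, close to what `np.histogram(..., density=True)` would return. Its noise depends on how many samples fall in each bin, so fewer, wider bins (or more samples) are what smooth it.

```python
import numpy as np

def pdf_from_cdf(x, y, edges):
    """Estimate a PDF on bins from sorted CDF points (x, y).

    Linearly interpolates the CDF at the bin edges and differences it.
    Returns bin centers and density values.
    """
    F = np.interp(edges, x, y)          # CDF at each bin edge
    widths = np.diff(edges)
    density = np.diff(F) / widths       # probability mass per unit x in each bin
    centers = 0.5 * (edges[:-1] + edges[1:])
    return centers, density

rng = np.random.default_rng(1)          # hypothetical seed
N = 10_000                              # more samples -> smoother estimate
x = np.sort(rng.normal(size=N))
y = np.arange(N) / N                    # empirical CDF, as in the question

edges = np.linspace(-2, 2, 21)          # 20 bins of width 0.2
centers, density = pdf_from_cdf(x, y, edges)

# area under the estimate equals the CDF mass between the outer edges
area = np.sum(density * np.diff(edges))
F_edges = np.interp(edges, x, y)
print(np.isclose(area, F_edges[-1] - F_edges[0]))   # True
```

To plot it, `plt.bar(centers, density, width=np.diff(edges))` draws bars that tile the bins exactly. Unlike the answer's bar heights, these values are densities, so they can be compared with the standard normal PDF, which peaks at about 0.4.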
