
I am trying to draw the lognormal distribution for my data, using the following code:

mu, sigma = 136519., 50405. # mean and standard deviation
hs = np.random.lognormal(mu, sigma, 1000) #mean, s dev , Size
count, bins, ignored = plt.hist(hs, 100, normed=True)     
x = np.linspace(min(bins), max(bins), 10000)
pdf = (math.exp(-(np.log(x) - mu)**2 / (2 * sigma**2)))
#plt.axis('tight')
plt.plot(x, pdf, linewidth=2, color='r')

As you can see, my mean and sigma are large values, which causes hs to overflow to infinity and raises an error. If I use something like mu = 3 and sigma = 1, it works. Any suggestions for large numbers?

Update 1:

I corrected my code based on the first answer, but now I only get a straight line:

    mu, sigma = 136519, 50405 # mean and standard deviation

    normal_std = np.sqrt(np.log(1 + (sigma/mu)**2))
    normal_mean = np.log(mu) - normal_std**2 / 2
    hs = np.random.lognormal(normal_mean, normal_std, 1000)
    print(hs.max())    # some finite number
    print(hs.mean())   # about 136519
    print(hs.std())    # about 50405

#    hs = np.random.lognormal(mu, sigma, 1000) #mean, s dev , Size
#    
    count, bins, ignored = plt.hist(hs, 100, normed=True) 

    x = np.linspace(min(bins), max(bins), 10000)
    pdfT = [];
    for el in range (len(x)):
        pdfTmp = (math.exp(-(np.log(x[el]) - mu)**2 / (2 * sigma**2)))
        pdfT += [pdfTmp]


    #plt.axis('tight')
    pdf = np.asarray(pdfT)
    plt.plot(x, pdf, linewidth=2, color='r')

[image: pdf]

FabioSpaghetti

1 Answer


The parameters mu and sigma in np.random.lognormal are not the mean and standard deviation of the lognormal distribution. They are the mean and standard deviation of the underlying normal distribution, that is, of log(X). By passing 136519 as the mean, you ask NumPy to generate numbers of size exp(136519), which is about 10**59289, far beyond the double-precision limit of roughly 1.8 * 10**308.

With a bit of algebra you can get the correct parameters for np.random.lognormal from the ones you have.
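For reference, that algebra can be sketched as follows. The symbols are introduced here for illustration and are not in the original post: m and s are the desired mean and standard deviation of X, while \mu_N and \sigma_N are the mean and standard deviation of log(X).

```latex
\begin{align*}
m &= \exp\!\left(\mu_N + \tfrac{1}{2}\sigma_N^2\right), \\
s^2 &= \left(e^{\sigma_N^2} - 1\right)\exp\!\left(2\mu_N + \sigma_N^2\right)
     = \left(e^{\sigma_N^2} - 1\right) m^2, \\
\sigma_N^2 &= \ln\!\left(1 + \frac{s^2}{m^2}\right), \\
\mu_N &= \ln m - \tfrac{1}{2}\sigma_N^2 .
\end{align*}
```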

import numpy as np

mu, sigma = 136519., 50405.  # desired mean and standard deviation of X
# Convert to the mean and standard deviation of log(X)
normal_std = np.sqrt(np.log(1 + (sigma/mu)**2))
normal_mean = np.log(mu) - normal_std**2 / 2
hs = np.random.lognormal(normal_mean, normal_std, 1000)
print(hs.max())    # some finite number
print(hs.mean())   # about 136519
print(hs.std())    # about 50405
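As a sanity check on the conversion, here is a minimal sketch that wraps it in a function and verifies it both analytically and by sampling. The helper names `lognormal_params` and `lognormal_moments`, the seed, and the sample size are illustrative choices, not part of NumPy or of the original answer.

```python
import numpy as np


def lognormal_params(mean, std):
    """Hypothetical helper: map the mean/std of X to the mean/std of log(X)."""
    normal_std = np.sqrt(np.log(1.0 + (std / mean) ** 2))
    normal_mean = np.log(mean) - normal_std ** 2 / 2.0
    return normal_mean, normal_std


def lognormal_moments(normal_mean, normal_std):
    """Inverse map: mean/std of X from the parameters of log(X)."""
    mean = np.exp(normal_mean + normal_std ** 2 / 2.0)
    std = mean * np.sqrt(np.exp(normal_std ** 2) - 1.0)
    return mean, std


normal_mean, normal_std = lognormal_params(136519.0, 50405.0)

# Analytic round trip: should reproduce the requested mean and std.
round_trip = lognormal_moments(normal_mean, normal_std)

# Empirical check with a seeded generator (seed and sample size are arbitrary).
rng = np.random.default_rng(0)
sample = rng.lognormal(normal_mean, normal_std, 200_000)
print(round_trip, sample.mean(), sample.std())
```

With a larger sample than the 1000 draws used above, the sample mean and standard deviation should land closer to the requested values.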
  • Thank you so much, it worked. I corrected my code, but I still get a straight line. Could you take a look at the new version? – FabioSpaghetti Jul 31 '18 at 16:38
  • That is because you are again using the wrong values of mu and sigma there. A single line is enough to compute `pdf`: `pdf = np.exp(-(np.log(x) - normal_mean)**2 / (2 * normal_std**2))` –  Jul 31 '18 at 17:21
  • Thank you again. I tried the code on my numbers, but the results do not convince me. Since that is now a different question, I asked it separately. Can I link it here? – FabioSpaghetti Jul 31 '18 at 21:57
  • One more thing came to mind: the 136519 I gave you as the mean is the actual value from my data; it has not been converted to lognormal. I guess something is still not clear to me, since in my code the lognormal curve did not fit the bins perfectly. – FabioSpaghetti Aug 02 '18 at 09:06
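One likely reason the curve does not match the bins is that the one-line expression in the comments omits the normalizing factor 1 / (x * sigma * sqrt(2 * pi)) of the lognormal density, so it cannot line up with a normalized histogram. The sketch below includes that factor. It also assumes the raw data are available and estimates the parameters of log(X) directly from them. The names `lognormal_pdf` and `plot_fit` are hypothetical, and the synthetic `data` array only stands in for real measurements.

```python
import numpy as np


def lognormal_pdf(x, normal_mean, normal_std):
    """Lognormal density, including the 1 / (x * sigma * sqrt(2*pi)) factor."""
    x = np.asarray(x, dtype=float)
    exponent = -(np.log(x) - normal_mean) ** 2 / (2.0 * normal_std ** 2)
    return np.exp(exponent) / (x * normal_std * np.sqrt(2.0 * np.pi))


def plot_fit(data, normal_mean, normal_std, bins=100):
    """Overlay the density on a normalized histogram (not called here)."""
    import matplotlib.pyplot as plt  # imported lazily; only needed for plotting

    # density=True replaces the older normed=True keyword
    _, edges, _ = plt.hist(data, bins, density=True)
    x = np.linspace(edges[0], edges[-1], 1000)
    plt.plot(x, lognormal_pdf(x, normal_mean, normal_std), linewidth=2, color="r")
    plt.show()


# Stand-in for real measurements (assumption); replace with your own array.
rng = np.random.default_rng(1)
data = rng.lognormal(11.76, 0.357, 50_000)

# Estimate the parameters of log(X) directly from the data.
normal_mean = np.log(data).mean()
normal_std = np.log(data).std()

# plot_fit(data, normal_mean, normal_std)  # uncomment to draw the figure
```

If the real data are not close to lognormal, the curve will still deviate from the histogram regardless of how the parameters are chosen.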