3

I was trying to get the mean and standard deviation for log-normal distribution, where mu=0.4104857306 and sigma=3.4070874277012617, and I am expecting mean=500 and std=600. I am unsure what I have done wrong. Here are the code:

import scipy.stats as stats
import numpy as np
a = 3.4070874277012617
b = 0.4104857306
c = stats.lognorm.mean(a,b)
d = stats.lognorm.var(a,b)
e = np.sqrt(d)
print("Mean =",c)
print("std =",e)

And the outputs are here:

Mean = 332.07447304207903
sd = 110000.50047821256

Thank you in advance.

Edit:

Thank you for your help. I have checked and found out there were some calculation mistake. I can now get the mean=500 but still cannot get std=600. Here is the code that I have used:

import numpy as np
import math
from scipy import exp
from scipy.optimize import fsolve

def f(z):
    mean = 500
    std = 600
    sigma = z[0]
    mu = z[1]
    f = np.zeros(2)
    f[0] = exp(mu + (sigma**2) / 2) - mean
    f[1] = exp(2*mu + sigma**2) * exp(sigma**2 - 1) - std**2
    return f
z = fsolve (f,[1.1681794012855686,5.5322865416282365])
print("sigma =",z[0])
print("mu =",z[1])
print(f(z))

sigma = 1.1681794012855686
mu = 5.5322865416282365

I have tried to check the result with my calculator, and I can get std=600 as required, I still get 853.5698320847896 with lognorm.std(sigma, scale=np.exp(mu)).

VincentN
  • 111
  • 1
  • 9
  • 1
    Check your calculation for standard deviation. I am getting 500 and something else (165831.240) when I fix your code. – cs95 Aug 18 '18 at 07:08

1 Answers1

3

The scipy.stats.lognorm lognormal distribution is parameterised in a slightly unusual way, in order to be consistent with the other continuous distributions. The first argument is the shape parameter, which is your sigma. That's followed by the loc and scale arguments, which allow shifting and scaling of the distribution. Here you want loc=0.0 and scale=exp(mu). So to compute the mean, you want to do something like:

>>> import numpy as np
>>> from scipy.stats import lognorm
>>> mu = 0.4104857306
>>> sigma = 3.4070874277012617
>>> lognorm.mean(sigma, 0.0, np.exp(mu))
500.0000010889041

Or more clearly: pass the scale parameter by name, and leave the loc parameter at its default of 0.0:

>>> lognorm.mean(sigma, scale=np.exp(mu))
500.0000010889041

As @coldspeed says in his comment, your expected value for the standard deviation doesn't look right. I get:

>>> lognorm.std(sigma, scale=np.exp(mu))
165831.2402402415

and I get the same value calculating by hand.

To double check that these parameter choices are indeed giving the expected lognormal, I created a sample of a million deviates and looked at the mean and standard deviation of the log of that sample. As expected, those give me back values that look roughly like your original mu and sigma:

>>> samples = lognorm.rvs(sigma, scale=np.exp(mu), size=10**6)
>>> np.log(samples).mean()  # should be close to mu
0.4134644116056518
>>> np.log(samples).std(ddof=1)  # should be close to sigma
3.4050012251732285

In response to the edit: you've got the formula for the variance of a lognormal slightly wrong - you need to replace the exp(sigma**2 - 1) term with (exp(sigma**2) - 1). If you do that, and rerun the fsolve computation, you get:

sigma = 0.9444564779275075
mu = 5.768609079062494

And with those values, you should get the expected mean and standard deviation:

>>> from scipy.stats import lognorm
>>> import numpy as np
>>> sigma = 0.9444564779275075
>>> mu = 5.768609079062494
>>> lognorm.mean(sigma, scale=np.exp(mu))
499.9999999949592
>>> lognorm.std(sigma, scale=np.exp(mu))
599.9999996859631

Rather than using fsolve, you could also solve analytically for sigma and mu, given the desired mean and standard deviation. This gives you more accurate results, more quickly:

>>> mean = 500.0
>>> var = 600.0**2
>>> sigma = np.sqrt(np.log1p(var/mean**2))
>>> mu = np.log(mean) - 0.5*sigma*sigma
>>> mu, sigma
(5.768609078769636, 0.9444564782482624)
>>> lognorm.mean(sigma, scale=np.exp(mu))
499.99999999999966
>>> lognorm.std(sigma, scale=np.exp(mu))
599.9999999999995
Mark Dickinson
  • 29,088
  • 9
  • 83
  • 120
  • Thank you for your help. I have checked and found out there were some calculation mistake. I can now get the mean=500 but still cannot get std=600. And I have included the code in the edited post. – VincentN Aug 18 '18 at 09:06
  • You have the wrong formula for the variance. Use `(exp(sigma**2) - 1)` in place of `exp(sigma**2 - 1)`. – Mark Dickinson Aug 18 '18 at 09:37
  • What's the difference between them? coz it still gives the same result – VincentN Aug 18 '18 at 09:49
  • @VincentN: For me, it gives the correct result with `exp(sigma**2) - 1`, and the wrong result with your original `exp(sigma**2 - 1)`. As to "what's the difference": `exp(x - 1)` and `exp(x) - 1` are different functions, in the same way that `(x - 1)**2` and `x**2 - 1` are different functions. Why would you expect them to be the same? – Mark Dickinson Aug 18 '18 at 09:55
  • Oh, I don't know why I somehow overlook about the bracket last night. It is now giving the correct answer. Thank you for your help. – VincentN Aug 18 '18 at 15:45