
I have two arrays holding x and y data.

The data show lognormal behavior. I need a plot of the fit, as well as mu and sigma, to do some statistics.

I fitted the data in order to calculate mu and sigma and, from them, some further statistical values (see the code below).

I obtain the scaling factor, by which I have to multiply the distribution, from an integral over the data points.

The code below does work. My question is whether there is a better way to do this (I am sure there is). It feels like a workaround that will only work sometimes. I want a better way because I have to plot hundreds of these.

My code (sorry that it is this long; I wanted to include everything except the import of the raw data):

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

# produce plot True/False
ploton = True

x0=np.array([3.58381e+01, 3.27125e+01, 2.98680e+01, 2.72888e+01, 2.49364e+01,
   2.27933e+01, 2.08366e+01, 1.90563e+01, 1.74380e+01, 1.59550e+01,
   1.45904e+01, 1.33460e+01, 1.22096e+01, 1.11733e+01, 1.02262e+01,
   9.35893e+00, 8.56556e+00, 7.86688e+00, 7.20265e+00, 6.59782e+00,
   6.01571e+00, 5.53207e+00, 5.03979e+00, 4.64415e+00, 4.19920e+00,
   3.83595e+00, 3.50393e+00, 3.28070e+00, 3.00930e+00, 2.75634e+00,
   2.52050e+00, 2.31349e+00, 2.12280e+00, 1.92642e+00, 1.77820e+00,
   1.61692e+00, 1.49094e+00, 1.36233e+00, 1.22935e+00, 1.14177e+00,
   1.03078e+00, 9.39603e-01, 8.78425e-01, 1.01490e+00, 1.07461e-01,
   4.81523e-02, 4.81523e-02, 1.00000e-02, 1.00000e-02])

y0=np.array([3.94604811e+04, 2.78223936e+04, 1.95979179e+04, 2.14447807e+04,
   1.68677487e+04, 1.79429516e+04, 1.73589776e+04, 2.16101026e+04,
   3.79705638e+04, 6.83622301e+04, 1.73687772e+05, 5.74854475e+05,
   1.69497465e+06, 3.79135941e+06, 7.76757753e+06, 1.33429094e+07,
   1.96096415e+07, 2.50403065e+07, 2.72818618e+07, 2.53120387e+07,
   1.93102362e+07, 1.22219224e+07, 4.96725699e+06, 1.61174658e+06,
   3.19352386e+05, 1.80305856e+05, 1.41728002e+05, 1.66191809e+05,
   1.33223816e+05, 1.31384905e+05, 2.49100945e+05, 2.28300583e+05,
   3.01063903e+05, 1.84271914e+05, 1.26412781e+05, 8.57488083e+04,
   1.35536571e+05, 4.50076293e+04, 1.98080100e+05, 2.27630303e+05,
   1.89484527e+05, 0.00000000e+00, 1.36543525e+05, 2.20677520e+05,
   3.60100586e+05, 1.62676486e+05, 1.90105093e+04, 9.27461467e+05,
   1.58373542e+05])


Dnm = x0
dndlndp  = y0



#lognormal PDF:
def f(x, mu, sigma) :
    return 1/(np.sqrt(2*np.pi)*sigma*x)*np.exp(-((np.log(x)-mu)**2)/(2*sigma**2))

#normalizing y-values so that the data integrate to 1 over x
#(np.trapz takes y first, then x; abs() because x0 is in descending order):
y0_normalized = y0/abs(np.trapz(y0.ravel(), x0.ravel()))

#calculating mu/sigma of this distribution:
params, extras = curve_fit(f, x0.ravel(), y0_normalized.ravel())

median = np.exp(params[0])
mu = params[0]
sigma = params[1]

#output of mu / sigma / calculated median:
print("mu=%g, sigma=%g" % (params[0], params[1]))
print("median=%g" % median)

#new variable z for smooth fit-curve:
z = np.linspace(0.1, 100, 10000)
#######################

Dnm = np.ravel(Dnm)
dndlndp = np.ravel(dndlndp)

Dnm_rev = list(reversed(Dnm))
dndlndp_rev = list(reversed(dndlndp))

scalingfactor = np.trapz(dndlndp_rev, Dnm_rev, dx = np.log(Dnm_rev))

#####################

#plotting
if ploton:
    plt.plot(z, f(z, mu, sigma)*scalingfactor, label="fit", color = "red")
    plt.scatter(x0, y0, label="data")
    plt.xlim(3,20)
    plt.xscale("log")
    plt.legend()

EDIT1: I should add that I have no idea why the scaling factor calculated with

scalingfactor = np.trapz(dndlndp_rev, Dnm_rev, dx = np.log(Dnm_rev))

is right. It was simply trial and error. I would really like to know why this does the trick, since the "area" of all bins combined is:

N = np.trapz(dndlndp_rev, np.log(Dnm_rev), dx = np.log(Dnm_rev))

because the width of the bins is log(Dnm).
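One observation that may help here, offered as a sketch rather than a full answer: as far as the NumPy documentation describes it, `np.trapz` only uses `dx` when no `x` array is passed. If that holds, the `dx = np.log(Dnm_rev)` argument in both calls has no effect: the first call integrates the data over linear `Dnm`, and the second integrates over `log(Dnm)`. The toy arrays below are hypothetical and only illustrate the behaviour; `trapz` is a local alias that picks `np.trapezoid` on newer NumPy versions.

```python
import numpy as np

# Use np.trapezoid where it exists (newer NumPy), otherwise fall back to np.trapz.
trapz = np.trapezoid if hasattr(np, "trapezoid") else np.trapz

# Hypothetical toy data, only for illustration (not the measured arrays).
x = np.array([1.0, 2.0, 4.0, 8.0])
y = np.array([3.0, 5.0, 2.0, 1.0])

with_dx = trapz(y, x, dx=np.log(x))  # dx is not used because x is given
without_dx = trapz(y, x)             # integral of y over linear x
over_log_x = trapz(y, np.log(x))     # integral of y over ln(x)

print(with_dx, without_dx, over_log_x)
```

Under this reading, the working scaling factor is simply the area of the data over linear x, which is the same quantity that the normalization step is meant to divide out before fitting a PDF that integrates to 1 over x.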

EDIT2: Thank you for all the answers. I copied the arrays into the code, which is now runnable. I want to simplify the question, since I think that, due to my poor English, I was not able to say what I really want:

I have a lognormal set of data. The code above allows me to calculate mu and sigma. To do so, I need to normalize the data, so that the area under the function is equal to 1 from then on.

In order to plot a lognormal function with the calculated mu and sigma, I need to multiply the function by an (unknown) factor, because the area under the real function is something like 1e8, certainly not one. I worked around this by calculating this "scalingfactor" via the trapz integral of the discrete raw data.

There has to be a better way to plot the fitted function when mu and sigma are already known.
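One possible alternative, sketched here under stated assumptions and not as a definitive solution, is to let the fit determine the factor itself by adding an amplitude as a third parameter to the model. The data then need no prior normalization, and the fitted curve can be plotted directly. The function name `scaled_lognormal`, the synthetic data, and the starting values in `p0` are all illustrative assumptions; the sketch also assumes the y-values are a density over linear x, matching the PDF used in the code above.

```python
import numpy as np
from scipy.optimize import curve_fit

def scaled_lognormal(x, amplitude, mu, sigma):
    """Lognormal PDF in x multiplied by a free amplitude (the total area)."""
    pdf = np.exp(-(np.log(x) - mu) ** 2 / (2 * sigma ** 2)) / (
        np.sqrt(2 * np.pi) * sigma * x)
    return amplitude * pdf

# Synthetic stand-in data (hypothetical values, not the measured arrays).
rng = np.random.default_rng(0)
x = np.geomspace(1.0, 40.0, 60)
true_amplitude, true_mu, true_sigma = 2.0e8, 2.0, 0.2
y = scaled_lognormal(x, true_amplitude, true_mu, true_sigma)
y = y * (1 + 0.02 * rng.standard_normal(x.size))  # 2 % multiplicative noise

# Rough starting values taken from the data itself.
area_guess = np.sum(0.5 * (y[1:] + y[:-1]) * np.diff(x))  # trapezoid rule
p0 = (area_guess, np.log(x[np.argmax(y)]), 0.5)

params, cov = curve_fit(scaled_lognormal, x, y, p0=p0)
amplitude, mu, sigma = params
print("amplitude=%g, mu=%g, sigma=%g, median=%g"
      % (amplitude, mu, sigma, np.exp(mu)))
```

With the fitted values, the curve could be drawn as `plt.plot(z, scaled_lognormal(z, *params))` without a separate scaling step. On real measurements, the starting values and the treatment of points far from the main peak would probably need some care, since the fit is nonlinear.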

plot of lognormal fit / scaling

Carolyn
  • People can't run your code right now because you don't provide them with `dndlndp` and `Dnm` – Sheldore Jan 08 '19 at 22:03
  • Possible duplicate of [Fitting empirical distribution to theoretical ones with Scipy (Python)?](https://stackoverflow.com/questions/6620471/fitting-empirical-distribution-to-theoretical-ones-with-scipy-python) – Tarifazo Jan 09 '19 at 15:19
  • Thank you both. I have now changed the code, and it is runnable. I also added "EDIT2", which should clarify my question. – Carolyn Jan 10 '19 at 18:31

0 Answers