
I am working on MLE and I want to optimize my log-likelihood function. I am using the code:

Maximum Likelihood Estimate pseudocode

I have a very specific doubt:

--> I have yObs and yPred, but I am confused about how I should include yObs and yPred in my likelihood function, as done here:

logLik = -np.sum( stats.norm.logpdf(yObs, loc=yPred, scale=sd) )

My likelihood function only has x as the sample space and two unknown parameters.

They have used a function called stats.norm.logpdf, but I am not using a normal distribution.

Thanks in advance.

Regards

Manish Sharma
  • If you know your distribution function, you just need to substitute it for the Gaussian distribution in the example. Suppose you have a distribution function `f(x, a, b)` where `a` and `b` are the 2 parameters (don't know how to make Greek letters...). The log-likelihood of your sample is computed as `logLik(a, b) = -np.sum(np.log(f(yObs, a, b)))` or `-np.log(f(yObs, a, b).prod())`. Now you "just" need to minimize this function with respect to `a` and `b`. To do this in Python you can use a lambda function, i.e. `logLik = lambda a, b: -np.sum(np.log(f(yObs, a, b)))` – gionni Jun 01 '17 at 09:33
  • @gionni, thanks for your reply, but I am not able to follow you properly. How do I substitute my distribution function for the Gaussian distribution, and what is the role of the Gaussian distribution in my distribution function? I would really be thankful if you can clear my confusion. – Manish Sharma Jun 01 '17 at 09:37
  • If I understand correctly your data do not come from a gaussian distribution, but rather from a different distribution with PDF `f(x, a, b)`. As stated above you just need to plug your distribution function in the log likelihood formula, instead of using the gaussian distribution. What distribution are you using? – gionni Jun 01 '17 at 09:43
  • I am using the k pdf. It is different from Gaussian. I have the true value of my pdf corresponding to each point of the sample space (yObs, x). How do I generate the log-likelihood function? In the link I have shared, they have used a Gaussian distribution. I have written my k-pdf in Python and have to calculate the log-likelihood of it. – Manish Sharma Jun 01 '17 at 09:48
  • OK, so I guess you wrote a function for the pdf. Just substitute that function into the code above and you have your log-likelihood, which will depend on the 2 parameters you are using. If you maximize that function with respect to the parameters (Python has some optimization libraries already implemented), you can find the parameters that best fit your data for your given distribution – gionni Jun 01 '17 at 09:55
  • thanks for your answer. I have explored the optimization libraries. Do I have to change only the **`yPred`**? I have also developed my log-likelihood function in my notebook, which has to be minimized. `loglik` in the code is for the observed values, which are normally distributed. I am still not getting where I have to substitute my function. I have to fit my data to the k-pdf. – Manish Sharma Jun 01 '17 at 10:06
  • Do I have to change both **`yPred`** and **`loglik`**? – Manish Sharma Jun 01 '17 at 10:21

1 Answer


Expanding on the comments.

You have the K pdf `K(x, mu, nu)`.

I guess you have a sample of observations yObs, which I'll assume is an array, and another array yPred (note that the example you took this from uses a simple linear regression to obtain yPred, and is actually trying to find the regression parameters rather than the distribution ones, although the answer overall looks weird).

If you are just trying to find the parameters that best fit your sample, then yPred is useless and you can write your (negative) log-likelihood, as a function of the 2 parameters, as:

negLogLik = lambda mu, nu: -np.sum(np.log(K(yObs, mu, nu)))

and then minimize over mu, nu.
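Putting the pieces together, here is a minimal runnable sketch of that first approach. Since I don't have your K pdf, a gamma pdf (which also has two parameters) stands in for it; swap in your own `K` and the rest stays the same:

```python
import numpy as np
from scipy import stats, optimize

# Stand-in for the K pdf: scipy has no K-distribution, so a two-parameter
# gamma pdf plays its role here. Replace this with your own K implementation.
def K(x, mu, nu):
    return stats.gamma.pdf(x, a=mu, scale=nu)

rng = np.random.default_rng(0)
yObs = rng.gamma(shape=2.0, scale=1.5, size=500)  # synthetic sample

def negLogLik(params):
    mu, nu = params
    return -np.sum(np.log(K(yObs, mu, nu)))

# keep both parameters strictly positive during the search
res = optimize.minimize(negLogLik, x0=[1.0, 1.0],
                        bounds=[(1e-6, None), (1e-6, None)],
                        method="L-BFGS-B")
print(res.x)  # estimated (mu, nu)
```

With the synthetic sample above, the estimates land close to the true values (2.0, 1.5) used to generate the data.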

If you want to use code like that found in the post you reference, you need to change the function like this:

def regressLL(params):
    # unpack the regression parameters and the second pdf parameter
    b0, b1, nu = params

    # the mean of the distribution follows the regression line
    yPred = b0 + b1*x

    # summing the logs is numerically safer than taking the log of a product
    logLik = -np.sum(np.log(K(yObs, mu=yPred, nu=nu)))

    return logLik

Remember that in the second case your function K must be able to take an array for mu. I wouldn't suggest the second approach, since it uses a different mean for each observation in the sample, and in general I don't understand what it is trying to accomplish (it looks like it is trying to predict the mean from the observations in some messy way), but it might be a valid approach that I have never seen.

gionni
  • this seems to be very helpful. I have a question: what is the reason to replace **`x`** with **`yObs`** in the **`logLik`** equation? I have **`yObs`** for each **`x`** and I have to fit the k-pdf to this data. – Manish Sharma Jun 01 '17 at 11:17
  • So you have a multivariate problem with n variables (x is in R^n) and N observations for each? – gionni Jun 01 '17 at 14:56
  • I have data and I have to fit k-pdf and the pdf has two unknown parameters. – Manish Sharma Jun 02 '17 at 07:52
  • yes I have implemented it, but I have a question. How should I select my sample size **`yObs`**? I did random sampling, and every time I change the sample size my estimated parameters change, and if I take the whole population then my estimated parameters are wrong. – Manish Sharma Jun 12 '17 at 08:30