If you do maximum likelihood calculations, the first step you need to take is the following: Assume a distribution that depends on some parameters. Since you generate your data yourself (you even know your parameters), you "tell" your program to assume a Gaussian distribution. However, you don't tell your program your parameters (0 and 1); you leave them unknown a priori and compute them afterwards.
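Just to make this concrete, here is a minimal sketch of that data-generating step (I'm assuming Python with NumPy here; the question doesn't specify a language, and the seed is arbitrary):

    import numpy as np

    # Hypothetical example: draw 101 samples from a standard normal
    # distribution, i.e. the "true" parameters are mean 0 and standard
    # deviation 1, the values a maximum likelihood estimate should recover.
    rng = np.random.default_rng(seed=42)
    x = rng.normal(loc=0.0, scale=1.0, size=101)  # x[0] ... x[100]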
Now, you have your sample vector (let's call it x; its elements are x[0] to x[100]) and you have to process it. To do so, you have to compute the following product (f denotes the probability density function of the Gaussian distribution):

f(x[0]) * ... * f(x[100])
As you can see in the link I gave, f employs two parameters (the Greek letters µ and σ). You now have to calculate values for µ and σ such that f(x[0]) * ... * f(x[100]) takes the maximum possible value.
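As a rough sketch of what evaluating that product could look like (same Python/NumPy assumption as above; the helper names are made up for illustration):

    import numpy as np

    def gaussian_pdf(x, mu, sigma):
        """Density f of a Gaussian with mean mu and standard deviation sigma."""
        return np.exp(-(x - mu) ** 2 / (2.0 * sigma ** 2)) / (sigma * np.sqrt(2.0 * np.pi))

    def likelihood(x, mu, sigma):
        """The product f(x[0]) * ... * f(x[100]) over the whole sample vector x."""
        return np.prod(gaussian_pdf(x, mu, sigma))

Note that a product of 101 densities is already a tiny number, and for larger samples it underflows to zero in floating point, which is one more practical reason for the logarithm trick below.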
When you've done that, the maximizing µ is your maximum likelihood value for the mean, and the maximizing σ is your maximum likelihood value for the standard deviation.
Note that I don't explicitly tell you how to compute the values for µ and σ, since this is a rather mathematical procedure I don't have at hand (and probably wouldn't understand); I just tell you the technique for getting the values, which can be applied to other distributions as well.
Since you want to maximize the original term, you can "simply" maximize the logarithm of the original term instead (the logarithm is monotonically increasing, so both are maximal for the same µ and σ) - this saves you from dealing with all these products, and turns the original term into a sum of summands.
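In code, the same trick just means summing log-densities instead of multiplying densities (again only a sketch under the same assumptions):

    import numpy as np

    def gaussian_logpdf(x, mu, sigma):
        """log f(x), written out directly instead of computing log(f(x))."""
        return -np.log(sigma * np.sqrt(2.0 * np.pi)) - (x - mu) ** 2 / (2.0 * sigma ** 2)

    def log_likelihood(x, mu, sigma):
        """Sum of log-densities, i.e. log(f(x[0]) * ... * f(x[100]))."""
        return np.sum(gaussian_logpdf(x, mu, sigma))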
If you really want to calculate it, you can do some simplifications that lead to the following term (hope I didn't mess up anything):

log(f(x[0]) * ... * f(x[100])) = -101 * log(σ * sqrt(2 * π)) - ((x[0] - µ)² + ... + (x[100] - µ)²) / (2 * σ²)
Now, you have to find values for µ and σ such that the above beast is maximal. Doing that is a very nontrivial task called nonlinear optimization.
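If you want to hand that job to a numerical optimizer, one possible sketch (assuming SciPy is available, which the question doesn't say) is to minimize the negative log-likelihood over both parameters at once:

    import numpy as np
    from scipy.optimize import minimize

    def negative_log_likelihood(params, x):
        # minimize() minimizes, so we negate the log-likelihood to maximize it.
        mu, sigma = params
        if sigma <= 0:  # sigma must be positive; reject invalid candidates
            return np.inf
        return np.sum(np.log(sigma * np.sqrt(2.0 * np.pi))
                      + (x - mu) ** 2 / (2.0 * sigma ** 2))

    # Regenerate the example sample so this snippet runs on its own.
    rng = np.random.default_rng(seed=42)
    x = rng.normal(loc=0.0, scale=1.0, size=101)

    # Start the search from a deliberately wrong guess (mu = 0.5, sigma = 2.0).
    result = minimize(negative_log_likelihood, x0=[0.5, 2.0], args=(x,), method="Nelder-Mead")
    mu_hat, sigma_hat = result.x
    print(mu_hat, sigma_hat)  # should land close to 0 and 1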
One simplification you could try is the following: Fix one parameter and try to calculate the other. This saves you from dealing with two variables at the same time.
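To illustrate that simplification (still just a sketch under the same assumptions), you could fix σ at some value, say 1, and let a one-dimensional optimizer search only for µ:

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(seed=42)
    x = rng.normal(loc=0.0, scale=1.0, size=101)

    sigma_fixed = 1.0  # fix one parameter ...

    def neg_log_likelihood_mu(mu):
        # ... and search only over the other one
        return np.sum(np.log(sigma_fixed * np.sqrt(2.0 * np.pi))
                      + (x - mu) ** 2 / (2.0 * sigma_fixed ** 2))

    result = minimize_scalar(neg_log_likelihood_mu, bounds=(-10.0, 10.0), method="bounded")
    print(result.x)  # maximum likelihood value for µ given the fixed σ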