2

I am trying to minimize a function that outputs chi-square via scipy and find the mu,sigma,normc that provide the best fit for a Gaussian overlay.

from math import exp
from math import pi
from scipy.integrate import quad
from scipy.optimize import minimize
from scipy.stats import chisquare
import numpy as np

# guess intitial values for minimized chi-square
mu, sigma = np.mean(mydata), np.std(mydata) # mydata is my data points
normc = 1/(sigma * (2*pi)**(1/2)) 

gauss = lambda x: normc * exp( (-1) * (x - mu)**2 / ( 2 * (sigma **2) ) ) # Gaussian Distribution

# assume I have pre-defined bin-boundaries as a list called binbound

def expvalperbin(binbound,mu,sigma,normc):
    # calculates expectation value per bin
    ans = []
    for index in range(len(binbound)):
        if index != len(binbound)-1:
            ans.append( quad( gauss, binbound[index], binbound[index+1])[0] )
    return ans

expvalguess = expvalperbin(binbound,mu,sig,normc)
obsval = countperbin(binbound,mydata)
arglist = [mu,sig,norm]

def chisquareopt(obslist,explist):
    return chisquare(obslist,explist)[0]

chisquareguess = chisquareopt((obsval,expvalguess), expvalguess, args=arglist)

result = minimize( chisquareopt(obsval,expvalguess), chisquareguess   )
print(result)

Running this code provides me with this error:

TypeError: chisquareopt() got an unexpected keyword argument 'args'

I have a few questions:

1) How can I write a function to allow arguments to be passed through to my function chisquareopt?

2) How can I tell if scipy will optimize parameters [mu, sigma, normc] that give the minimum chi-square? How could I find these parameters from the optimization?

3) It is difficult to know if I'm making progress here or not. Am I on the right track?

EDIT: If it is relevant, I have a function that inputs [mu, sigma, normc] and outputs a list of sublists, each sublist containing a possible combination of [mu, sigma, normc] (where the outer list covers all possible combinations of parameters within specified ranges).

2 Answers2

3

Typically these scipy functions pass the args tuple of values to your code unchanged. I should double check the code, but with

minimize(myfunc, x0, args=(y,z))

def myfunc(x, y, z): 
   <do something>

minimize takes the current value of the variable x (scalar or array, depending on what x0 looks like), and the args parameter, and constructs

args = tuple(x) + args
myfunc(*args)

In other words, it joins the args tuple with the iteration variable and passes it to your function. Thus any intermediate function definition need to work with that pattern.

To illustrate, define a function that takes a generic args tuple.

In [665]: from scipy.optimize import minimize
In [666]: def myfunc(*args):
     ...:     print(args)
     ...:     return np.abs(args[0])**2
     ...: 
In [667]: myfunc(1,2,3)
(1, 2, 3)
Out[667]: 1
In [668]: myfunc(2,2,3)
(2, 2, 3)
Out[668]: 4
In [669]: minimize(myfunc, 10, args=(2,3))
(array([ 10.]), 2, 3)
(array([ 10.00000001]), 2, 3)
(array([ 10.]), 2, 3)
(array([ 8.99]), 2, 3)
....
(array([-0.00000003]), 2, 3)
Out[669]: 
      fun: 1.7161984122524196e-15
 hess_inv: array([[ 0.50000001]])
      jac: array([-0.00000007])
  message: 'Optimization terminated successfully.'
     nfev: 15
      nit: 4
     njev: 5
   status: 0
  success: True
        x: array([-0.00000004])

(deleted discussion on the confusion regarding which parameters are being minimized. See other answer or my edit history)

hpaulj
  • 221,503
  • 14
  • 230
  • 353
  • So I should create tuple(mu,sigma,normc)? Or should I create a tuple of all possible combinations of (mu,sigma,normc)? –  Feb 27 '17 at 17:50
  • I've added a simple example. I'll have to think more about the combinations question. – hpaulj Feb 27 '17 at 17:57
  • I don't understand the function of `mu,sigma,normc`. Are they parameters that control the minimization over some other variable (the `binbound`), or are they variables that you want to optimize (choose the best combination)? – hpaulj Feb 27 '17 at 19:52
  • I would like to choose the best combination of mu,sigma,normc that produce the minimum chi-square. Basically, I calculate the mean and stdev of my data to estimate a guess value for each parameter. These parameters are then thrown into the function expvalperbin, which calculates the expectation values. The expectation value per bin is equivalent to the area under the curve per bin; here, per bin defines the bounds of integration and the expectation value per bin is appended into a list. There is a separate list of equal length, each element of which is the number of occurrences of ... –  Feb 27 '17 at 19:57
  • ... my data points per bin. With a list of expectation guess values that correspond to unchanging observed multiplicities per bin, I can now calculate a guess for my chi-square. I'd like to minimize chi-square by varying [mu,sigma,normc] about their values, which is difficult as they affect the distribution function that is integrated over binbounds for expectation values (hence the structure of my functions). My histogram bins are defined as the list of binborders, for which I want to overlay the best possible fit via chi-square. –  Feb 27 '17 at 20:03
3

I've simplified your problem somewhat to give you an idea on your question 2).

First, I've hard-coded your histogram obslist and the number of data points N as global variables (that simplifies the function signatures a little). Second I've hard-coded the bin boundaries in expvalperbin, assuming 9 bins with fixed width 5 and the first bin starts at 30 (so the histogram ranges from 30 to 75).

Third, I'm using optimize.fmin (Nelder-Mead) instead of optimize.minimize. The reason for using fmin instead of minimize is that the passing of additional parameters via args=(x,y) doesn't seem to work in the sense that the additional parameters are kept at the fixed values from the very first invocation. That's not what you want: you want to optimize over mu and sigma simultaneously.

Given these simplifications we have the following (surely very unpythonic) script:

from math import exp
from math import pi
from scipy.integrate import quad
from scipy.optimize import fmin
from scipy.stats import chisquare


obslist = [12, 51, 144, 268, 264, 166, 75, 18, 2] # histogram, 1000 observations
N = 1000 # no. of data points


def gauss(x, mu, sigma):
    return 1/(sigma * (2*pi)**(1/2)) * exp( (-1) * (x - mu)**2 / ( 2 * (sigma **2) ) )

def expvalperbin(mu, sigma):
    e = []
    # hard-coded bin boundaries
    for i in range(30, 75, 5):
        e.append(quad(gauss, i, i + 5, args=(mu, sigma))[0] * N)
    return e

def chisquareopt(args):
    # args[0] = mu
    # args[1] = sigma
    return chisquare(obslist, expvalperbin(args[0], args[1]))[0]

# initial guesses
initial_mu = 35.5
initial_sigma = 14

result = fmin(chisquareopt, [initial_mu, initial_sigma])

print(result)

Optimization terminated successfully.

Current function value: 2.010966

Iterations: 49

Function evaluations: 95

[ 50.57590239 7.01857529]

Btw., the obslist histogram is a 1000 point random sample from a N(50.5, 7.0) normal distribution. Remember that these are my very first Python code lines, so please don't judge me on the style. I just wanted to give you an idea about the general structure of the problem.

Stefan Zobel
  • 3,182
  • 7
  • 28
  • 38
  • In the function expvalperbin, what does 'args=(mu, sigma))[0] * N)' do? I'm guessing it copies a tuple of (mu,sigma) N times, but the subscript [0] leads me to believe I'm not seeing the full picture (similar for the args in 'chisquareopt')? As for not being pythonic, I'm open to suggestions. –  Feb 27 '17 at 20:09
  • 1
    It's the same as your `ans.append( quad( gauss, binbound[index], binbound[index+1])[0] )`. But I'm also passing `mu` and `lambda` to the `gauss` function. And finally, to get the expected **count** from the probability you have to multiply by `N`, the total number of observations (I told you that already). – Stefan Zobel Feb 27 '17 at 20:14
  • Ahhh, I see it now! Thank you for your help. –  Feb 27 '17 at 20:16
  • With your help, I was able to optimize my parameters. But out of curiousity, I read the docs before your guidance and was confused. Were you able to figure out the structure from the documentation alone, and if so, how? I'd like to be able to adapt my understanding to other module documentations going forward.. –  Feb 27 '17 at 22:34
  • As I said in your other [question](http://stackoverflow.com/questions/42465013/how-can-i-use-scipy-optimization-to-find-the-minimum-chi-squared-for-3-parameter) I don't know anything about Python/scipy. So the `args=(x)` problem came a little unexpected. But the parameters you'd need were always clear to me (you can check that by re-reading my comments there). I guess it's not so much about Python/scipy but rather of not having done (thinking about it) something like that before. I had my design verified in Java before you asked this question, so it was more or less a matter of translation. – Stefan Zobel Feb 27 '17 at 22:50