20

I'm trying to fit a simple function to two arrays of independent data in Python. I understand that I need to bunch the data for my independent variables into one array, but something still seems to be wrong with the way I'm passing variables when I try to do the fit. (There are a couple of previous posts related to this one, but they haven't been much help.)

import numpy as np
import matplotlib.pyplot as plt
from scipy.optimize import curve_fit

def fitFunc(x_3d, a, b, c, d):
    return a + b*x_3d[0,:] + c*x_3d[1,:] + d*x_3d[0,:]*x_3d[1,:]

x_3d = np.array([[1,2,3],[4,5,6]])

p0 = [5.11, 3.9, 5.3, 2]

fitParams, fitCovariances = curve_fit(fitFunc, x_3d[:2,:], x_3d[2,:], p0)
print ' fit coefficients:\n', fitParams

The error I get reads,

raise TypeError('Improper input: N=%s must not exceed M=%s' % (n, m)) 
TypeError: Improper input: N=4 must not exceed M=3

What is M the length of? Is N the length of p0? What am I doing wrong here?

lennon310
user3133865

2 Answers

26

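N and M are defined in the help for the function: per the docstring, N is the number of data points and M is the number of parameters (the error message appears to use the two letters the other way round). Either way, the error means you need at least as many data points as you have parameters, which makes perfect sense: here p0 has four entries but each array holds only three points.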

This code works for me:

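import numpy as np
from scipy.optimize import curve_fit

def fitFunc(x, a, b, c, d):
    # x[0] and x[1] are the two independent variables
    return a + b*x[0] + c*x[1] + d*x[0]*x[1]

# two rows (independent variables), five data points each
x_3d = np.array([[1,2,3,4,6],[4,5,6,7,8]])

# initial guess: four parameters, fewer than the five data points
p0 = [5.11, 3.9, 5.3, 2]

# the second row doubles as the dependent data in this example
fitParams, fitCovariances = curve_fit(fitFunc, x_3d, x_3d[1,:], p0)
print(' fit coefficients:\n', fitParams)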

I have included more data, so there are now five data points for four parameters. I have also rewritten fitFunc in terms of a single argument x: curve_fit passes the whole xdata array in, so x[0] and x[1] are the rows holding the two independent variables. The code as you posted also referenced x_3d[2,:], which is out of range for an array with only two rows and was causing an error.
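For anyone who wants to check that the fit recovers sensible values, here is a minimal sketch with synthetic, noise-free data. The names x1, x2, y and true_params, and all of the numbers, are made up for illustration and are not from the question.

```python
import numpy as np
from scipy.optimize import curve_fit

def fit_func(x, a, b, c, d):
    # curve_fit hands over the whole xdata array, so x[0] and x[1]
    # are the two independent variables, one value per data point.
    return a + b * x[0] + c * x[1] + d * x[0] * x[1]

# Hypothetical data: six points on a small grid of (x1, x2) values.
x1 = np.array([1.0, 2.0, 3.0, 1.0, 2.0, 3.0])
x2 = np.array([4.0, 4.0, 4.0, 7.0, 7.0, 7.0])
xdata = np.vstack([x1, x2])           # shape (2, 6)

true_params = (1.0, 2.0, 3.0, 0.5)    # assumed values, illustration only
y = fit_func(xdata, *true_params)     # dependent variable, shape (6,)

p0 = [5.11, 3.9, 5.3, 2]
# The default method needs at least as many data points as parameters.
assert y.size >= len(p0)

fit_params, fit_cov = curve_fit(fit_func, xdata, y, p0)
print("fit coefficients:\n", fit_params)
```

Unlike the example above, the dependent variable here is a separate array y rather than one of the rows of the independent data, which is probably closer to what a real fit looks like.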

chthonicdaemon
  • Thanks a lot! (I think M is the number of data points, and N is the number of parameters.) – user3133865 Dec 27 '13 at 01:01
  • The help clearly states "ydata : N-length sequence" and "p0 : None, scalar, or M-length sequence", so N is the number of data points and M is the number of parameters. It seems like the error message has them backwards, though :-). If you think this answer was helpful, please consider accepting the answer. – chthonicdaemon Dec 27 '13 at 07:01
  • @VolodimirKopey I don't really see what that reply adds to this one - they seem very similar to me. – chthonicdaemon Nov 24 '14 at 03:03
  • @chthonicdaemon it seems that M and N have switched since you last posted a comment. Now ydata has length M and p0 can be a length-N sequence. – NeutronStar Dec 22 '14 at 18:59
  • @Joshua Yup, I submitted the [bug report and the fix](https://github.com/scipy/scipy/issues/3172). I suppose I should have updated this answer. – chthonicdaemon Dec 23 '14 at 07:39
0

The default curve_fit method (lm, used when no bounds are given) needs you to have at least as many data points as parameters for the fitted function fitFunc. I had the same problem fitting a function that took 15 parameters in total when I had only 13 data points. The solution is to use another method (e.g. method='dogbox' or method='trf').
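A rough sketch of what that looks like for the question's situation (three data points, four parameters). The ydata values are invented for illustration. With fewer points than parameters the problem is underdetermined, so the result depends on p0 and the returned covariance should not be relied on; SciPy emits an OptimizeWarning about this, which is silenced here.

```python
import warnings
import numpy as np
from scipy.optimize import curve_fit, OptimizeWarning

def fit_func(x, a, b, c, d):
    return a + b * x[0] + c * x[1] + d * x[0] * x[1]

# Three data points, four parameters: the default 'lm' method refuses this.
xdata = np.array([[1.0, 2.0, 3.0], [4.0, 5.0, 6.0]])
ydata = np.array([2.0, 3.0, 5.0])     # hypothetical values
p0 = [5.11, 3.9, 5.3, 2]

with warnings.catch_warnings():
    # The covariance cannot be estimated for an underdetermined fit.
    warnings.simplefilter("ignore", OptimizeWarning)
    popt, pcov = curve_fit(fit_func, xdata, ydata, p0, method="trf")

# Compare the sum of squared residuals before and after the fit.
sse_start = np.sum((fit_func(xdata, *p0) - ydata) ** 2)
sse_end = np.sum((fit_func(xdata, *popt) - ydata) ** 2)
print("fit coefficients:\n", popt)
print("sum of squares: start", sse_start, "end", sse_end)
```

This gets rid of the TypeError, but adding more data points or dropping parameters is usually the safer fix when that is an option.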

today