
I have a set of points. Basically, I have P = f(t).

I have, let's say, 50 measurements: 50 values of P as a function of time. These values follow an established law.

What I have to do is find the values of the parameters in the law; basically, I have to fit the points with the best curve. Here is the law:

P = V*t - ((V - W)*(1 - exp(-k*t)) / k)

What I need is a numeric value for V, W and k. I have t and P. Do you have an idea about how to do that?

EDIT:

Here is a screenshot of what I want to obtain:


On the picture:

  • V is Vs
  • W is Vi
  • k is k

And that's what I obtained with reptilicus's help:

https://i.stack.imgur.com/ocvNl.png

import numpy as np
from scipy.optimize import curve_fit
from matplotlib.pyplot import *
import xlrd

def myFunc(t, V, W, k):

    y = V * t - ((V - W) * (1 - np.exp(-k * t)) / k)

    return y

classeur = xlrd.open_workbook(path)
names_sheets = classeur.sheet_names()

sheet = classeur.sheet_by_name(names_sheets[0])

row_start = 2

time = sheet.col_values(0, row_start)
fluo = sheet.col_values(4, row_start)

# filter out empty cells (note: this also drops exact zeros)
time = [ index for index in time if index ]
fluo = [ index for index in fluo if index ]

# convert the extracted columns to arrays for the fit
x = np.array(time)
y = np.array(fluo)

#fit the data, return the best fit parameters and the covariance matrix
popt, pcov = curve_fit(myFunc, x, y)
print(popt)
print(pcov)

#plot the data
clf() #matplotlib
plot(x, y, "rs")
#overplot the best fit curve
plot(x, myFunc(x, popt[0], popt[1], popt[2]))
grid(True)
show()

Not bad. I managed to extract the data from my Excel workbook and plot it. But as you can see, I got a linear regression, which I don't want. My goal is to reproduce the fit they got with Origin 8.

EDIT:

I have some news. The last person who did this in my team told me how he did it with Origin. In fact, they use least squares as well, but they find the parameters with a chi-squared minimization: the software iterates and optimizes the parameters.
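For what it's worth, `curve_fit` already performs this kind of iterative least-squares minimization internally. An explicit chi-squared minimization can be sketched with `scipy.optimize.minimize`; the data below are fake and generated only to illustrate the idea (with uniform errors, the chi-squared here is exactly the sum of squared residuals that least squares minimizes):

```python
import numpy as np
from scipy.optimize import minimize

def model(t, Vs, Vi, k):
    return Vs * t - (Vs - Vi) * (1 - np.exp(-k * t)) / k

def chi2(params, t, y):
    # sum of squared residuals; with uniform measurement errors this
    # is the same quantity a least-squares fit minimizes
    Vs, Vi, k = params
    return np.sum((y - model(t, Vs, Vi, k)) ** 2)

# fake data, for illustration only (real t and P would go here)
t = np.linspace(0, 4, 50)
y = model(t, 2.5, 1.3, 0.5)

# iteratively adjust the parameters to minimize chi^2
res = minimize(chi2, x0=[1.0, 1.0, 1.0], args=(t, y), method="Nelder-Mead")
print(res.x)  # should come out close to [2.5, 1.3, 0.5]
```

This is essentially what Origin (and `curve_fit`) do under the hood, just with a more specialized minimizer.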

EDIT 2:

Because it took me so long to figure this out, I am sharing the results of my research here. The main problem I was facing was that my values were "too small": my y values were of the order of 10^-7. As explained in Fitting curve: why small numbers are better?, numbers of the order of 1 are better for the fit.

Moreover, in my case at least, once my data were of that order I didn't need to give initial parameters (by default they are set to 1). So I just "normalized" my values: I converted the time values from seconds to hours, and multiplied my y values (of the order of 10^-7) by 10^7. Then I transformed the obtained parameters back into the desired units. Here is my code:

import numpy as np
from scipy.optimize import curve_fit, leastsq
from matplotlib.pyplot import *

def myFunc(t, Vs, Vi, k):

    y = Vs * t - ((Vs - Vi) * (1 - np.exp(-k * t)) / k)

    return y

raw_x = some_input 
raw_y = some_input 

# rescale the data: seconds -> hours for t, y brought up to order 1
# (the filter keeps zeros but drops empty cells)
time = [ index / 3600 for index in raw_x if index or index == 0 ]
fluo = [ index * 10**7 for index in raw_y if index or index == 0 ]

x = np.array(time)
y = np.array(fluo)

popt, pcov = curve_fit(myFunc, x, y, maxfev=3000)

# convert the fitted parameters back to the original units
# (y was multiplied by 10^7 and t divided by 3600)
popt2 = [ popt[0] / 3600 * 10**-7, popt[1] / 3600 * 10**-7, popt[2] / 3600 ]

#plot the data
clf() #matplotlib
plot(raw_x, raw_y, "rp")
# raw_x must be an array for myFunc's vectorized arithmetic
raw_x = np.array(raw_x)
plot(raw_x, myFunc(raw_x, popt2[0], popt2[1], popt2[2]), 'b')
grid(True)
show()

And here is a picture illustrating the difference:

https://i.stack.imgur.com/MRvJv.png

The blue plot is the fitting curve obtained with the rescaled units (parameters transformed back into the original units). The green one is the curve obtained by fitting in the original units.
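An alternative to rescaling the data, assuming the real problem is the default starting point of 1.0 for every parameter, is to pass initial guesses of the right order of magnitude through `curve_fit`'s `p0` argument. The numbers below are made up to mimic the scales described above (y of order 10^-7, t in seconds):

```python
import numpy as np
from scipy.optimize import curve_fit

def myFunc(t, Vs, Vi, k):
    return Vs * t - (Vs - Vi) * (1 - np.exp(-k * t)) / k

# fake data at the problematic scale: t in seconds, y of order 1e-7
t = np.linspace(1, 3600, 50)
true_params = (2.5e-11, 1.3e-10, 5e-3)
y = myFunc(t, *true_params)

# starting guesses of the right order of magnitude, so the optimizer
# does not start from the default of 1.0 for every parameter
popt, pcov = curve_fit(myFunc, t, y, p0=[1e-11, 1e-10, 1e-3])
print(popt)
```

With reasonable `p0` values the fit converges without any unit juggling, and no back-conversion of the parameters is needed.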

Thanks to all of you for your help.

Sнаđошƒаӽ
JPFrancoia

2 Answers


Just use curve_fit in scipy.optimize:

import numpy as np
from scipy.optimize import curve_fit
from pylab import *

def myFunc(t, V, W, k):
    y = V * t - ((V - W) * (1 - np.exp(-k * t)) / k)
    return y

# this generates some fake data to fit. For you, just read in the
# data from CSV or whatever format you have
x = np.linspace(0,4,50)
y = myFunc(x, 2.5, 1.3, 0.5)
# add some noise to the fake data to make it more realistic. . .
yn = y + 0.2*np.random.normal(size=len(x))

#fit the data, return the best fit parameters and the covariance matrix
popt, pcov = curve_fit(myFunc, x, yn)
print(popt)
print(pcov)

#plot the data
clf()
plot(x, yn, "rs")
#overplot the best fit curve
plot(x, myFunc(x, popt[0], popt[1], popt[2]))
grid(True)
show()

This gives something like the plot below. The red points are the (noisy) data, and the blue line is the best fit curve, with the best fitting parameters for that particular data of:

[ 2.32751132, 1.27686053, 0.65986596]

That is pretty close to the expected parameters of 2.5, 1.3, 0.5; the difference is due to the noise I added to the fake data.

example of fit

reptilicus
  • How is adding arbitrary noise justified here? I thought his measurements were already *real* (i.e. not the result of some toy calculation). – Benjamin Bannier Sep 04 '13 at 16:50
  • His measurements are, but in order to do a "realistic fit" without his data, you should add some noise to the generated data. – reptilicus Sep 04 '13 at 16:52
  • Ok, but in your example, can you get the V, W and k used to draw the fitted curve? Because the points are supposed to follow this law. So with the fit function, you draw the "best" curve, i.e. the one that best matches the points. And then the point is to get the parameters. – JPFrancoia Sep 04 '13 at 17:40
  • @user1585507: as reptilicus explains in the comments, `curve_fit` returns the best fit parameters (`popt`) and the covariance matrix (`pcov`). As you can see from the "overplot the best fit curve" command, they're stored in `popt` in the order they're listed in `myFunc`, namely `V, W, k`. – DSM Sep 04 '13 at 17:56
  • Didn't see this part. Ok, I'll give it a try. Anyway, I already have a set of results verified and approved, so I'll see pretty soon if it worked. I'll keep you posted. But go ahead if you want to add something. For example, how does the curve_fit function work? Is it the best one in my case? – JPFrancoia Sep 04 '13 at 18:16
  • curve_fit uses the Levenberg-Marquardt algorithm to minimize the least squares between the actual data points (y) and the model (myFunc) by varying the parameters in the model. There are other fitters out there, mpfit is also a good one. – reptilicus Sep 04 '13 at 18:20
  • I edited my first post with my trial. As you can see in the second picture, I get a linear regression with your method. It might seem strange, but I really need to fit the first part of the curve. Vi (or W) and k are constants which characterize my system, and I will need them for further calculations. Is there a suitable fitter? – JPFrancoia Sep 04 '13 at 21:13
  • Are you sure that equation is correct, I could not get anything like your data with the equation given. If you have another free parameter in front like y=Q*t + - ((V - W) * (1 - np.exp(-k * t)) / k) then I could get data like yours – reptilicus Sep 05 '13 at 04:58
  • Nope, I don't have any more parameters. But yes, I'm sure about the equation. It is supposed to fit burst kinetics (for the study of an enzyme), so the model is correct. But in the first picture I gave you, there are some parameters. I'm not completely sure about them, and as you can see, they have a large error range. What did you try to do? – JPFrancoia Sep 05 '13 at 07:30
  • Yep, I edited again my first post. I have to do a chi 2 minimization to find the best parameters. – JPFrancoia Sep 05 '13 at 09:44
  • Are there error bars on the points? Least squares is the same thing as Chi squared, when the errors are uniform. – reptilicus Sep 05 '13 at 12:58
  • Something is not right. I just plotted the equation with the parameters from the fit in the picture you gave and it is not at all like the one you have. I think that the equation is not correct. – reptilicus Sep 05 '13 at 13:02
  • No, the equation is good for sure. http://i.imgur.com/Ybuna50.png. The parameters might not be good, but the model is good. – JPFrancoia Sep 05 '13 at 16:53
  • Well if you plug the parameters from your fit into that equation you do not get the red line in your chart. – reptilicus Sep 05 '13 at 17:23
  • I know :/ But isn't it expected, with the large error range ? – JPFrancoia Sep 05 '13 at 17:50
  • No, given the best fit parameters of V=2.55E-11, W=0.01241, k=19765.3 you should get the red line in your plot, if Origin is actually doing it right. – reptilicus Sep 05 '13 at 18:04
  • Ok. Let's assume Origin didn't do it right. If I give you a set of data, could you try to find the good parameters ? – JPFrancoia Sep 05 '13 at 18:16
  • I would but I am swamped at the moment. If you are still stuck tomorrow let me know. – reptilicus Sep 05 '13 at 20:20

Since you have more points than unknown coefficients, you'll want to do a least squares fit. This calculation will give you the coefficients that minimize the mean square error between the function and all the points.

So substitute your 50 points into the assumed equation and get 50 equations for 3 coefficients. This is easily expressed as a matrix:

Ax = b

where the unknown vector x are your coefficients.

Premultiply both sides by the transpose of A and solve using a matrix solver.
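The normal-equations recipe above can be sketched in a few lines of NumPy. Note that the toy model below is *linear* in its coefficients, unlike the law in the question, which is nonlinear in k, so this only applies directly after a linearization:

```python
import numpy as np

# toy linear model y = c0 + c1*t: 50 measurements, 2 unknown coefficients
t = np.linspace(0, 4, 50)
b = 1.5 + 0.8 * t

# one row of A per measurement, one column per coefficient
A = np.column_stack([np.ones_like(t), t])

# normal equations: premultiply Ax = b by A^T, then solve (A^T A) x = A^T b
x = np.linalg.solve(A.T @ A, A.T @ b)
print(x)  # recovers [1.5, 0.8]
```

In practice `np.linalg.lstsq(A, b)` is usually preferred over forming A^T A explicitly, since the normal equations square the condition number of the problem.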

Start by plotting the data you have. The choice of equation you start with will make your job easier or harder. Are you certain about that leading term? Without it, you could take the natural log of both sides and be left with a simple linear equation. With the leading term present it is harder to tease out the coefficient k: the problem becomes nonlinear. Choose wisely.

duffymo
  • Not for that one, no. But I'm not confident in his choice without seeing the data. – duffymo Sep 04 '13 at 16:38
  • If he had uncertainties on the points he could do a χ² fit, which is very similar to a least squares fit implementation-wise. – Benjamin Bannier Sep 04 '13 at 16:46
  • 1
    Let's keep this simple. This is a person who had to ask to do a least squares fit. – duffymo Sep 04 '13 at 16:49
  • To answer your question, yes I'm sure about the leading term. It comes from a scientific publication, and the data are experimental values. My team used Origin 8 to fit the data, but I don't know which way the software uses. I'll add a picture of a graph obtained by this way. – JPFrancoia Sep 04 '13 at 17:49