Why my fit for a logarithm function looks so wrong

Question

I'm plotting this dataset and fitting a logarithmic function to it, but for some reason the fit is badly wrong. At one point I got a good enough fit, but when I replotted, the bad fit was back. Originally the first data point was 0.0 0.0076, but I changed it to 0.001 0.0076 to avoid the asymptote.

This is what I'm using for the fit (not exactly what produced the image above, but I'm now testing with this one and it gives the same bad fit):

f(x) = a*log(k*x + b)
fit f(x) 'R_B/R_B.txt' via a, k, b

And the output is this

Also, sometimes it reports 7 iterations, as in the screenshot above, and other times only 1. When it produced the "correct" fit, it took about 35 iterations and gave a = 32, if I remember correctly.

Edit: here is the good one again; the plot I got is this one. And again, when I replotted I got that weird fit. Curiously, when the 0.0 0.0076 point is present, gnuplot prints "Undefined value during function evaluation" just before the good fit is shown, but that message does not appear when I get the bad one.

Do you know why I keep getting this inconsistency? Thanks for your help.

Since you didn't give your numerical data, I obtained approximate data by scanning your graph (which is not accurate). Nevertheless, the fit was not wrong at all but rather good. Without knowing which regression method you used and without the details of your calculation, I cannot say why you got it wrong. — JJacquelin, Apr 16 '22 at 08:09

JJacquelin · Accepted Answer · 2022-04-18T08:25:45.643

As I already mentioned in the comments, fitting antiderivatives is much better than fitting derivatives, because numerically computed derivatives are strongly scattered even when the data is only slightly scattered.

The principle of the method of fitting an integral equation (obtained from the original equation to be fitted) is explained in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . The application to the case of y=a.ln(c.x+b) is shown below.

Numerical calculation:

To get an even better result (according to some specified fitting criterion), one can use the above parameter values as initial values for an iterative nonlinear regression method implemented in some convenient software.

NOTE : The integral equation used in the present case is :
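For reference, one integral relation of this kind follows directly from integrating the model. This is a sketch derived from y = a ln(c x + b) alone: the symbols S(x), x_1 and y_1 (the first data point) are introduced here for illustration, and the answer's own equation may be arranged differently.

```latex
% Antiderivative of the model y(t) = a \ln(c t + b):
%   d/dt [ (t + b/c) y(t) - a t ] = y(t)
\[
S(x) \;=\; \int_{x_1}^{x} y(t)\,dt
      \;=\; \Bigl(x + \tfrac{b}{c}\Bigr)\,y \;-\; a\,x
      \;-\; \Bigl(x_1 + \tfrac{b}{c}\Bigr)\,y_1 \;+\; a\,x_1 .
\]
% Rearranged so that the unknowns a and b/c enter linearly:
\[
x\,y \;-\; x_1 y_1 \;-\; S(x) \;=\; a\,(x - x_1) \;-\; \frac{b}{c}\,(y - y_1).
\]
```

Because a and b/c appear linearly, they can be estimated by ordinary linear least squares once S(x) has been computed by numerical integration of the data.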

NOTE : On the above figure one can compare the result with the method of fitting an integral equation to the result with the method of fitting with derivatives.

Acknowledgements: Alex Sveshnikov did very good work in applying the method of regression with derivatives. This allows an interesting and enlightening comparison. If the goal is only to compute approximate parameter values to be used as a starting point in nonlinear regression software, both methods are roughly equivalent. Nevertheless, the method with the integral equation appears preferable for scattered data.
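To make the integral approach concrete, here is a minimal Python sketch of the general idea, not the exact computation from the answer. It assumes the relation x*y - x1*y1 - S = a*(x - x1) - (b/c)*(y - y1), which follows from integrating y = a*ln(c*x + b) from the first data point, with S computed by the trapezoidal rule. The function name `fit_log_by_integral` and the synthetic data are hypothetical.

```python
import math

def fit_log_by_integral(xs, ys):
    """Estimate (a, c, b) in y = a*ln(c*x + b) without an initial guess.

    Assumes xs is sorted and that x + b/c > 0 for every data point.
    """
    x1, y1 = xs[0], ys[0]
    # Cumulative trapezoidal integral S[i] of y from x1 to xs[i].
    S = [0.0]
    for i in range(1, len(xs)):
        S.append(S[-1] + 0.5 * (ys[i] + ys[i - 1]) * (xs[i] - xs[i - 1]))
    u = [x - x1 for x in xs]
    v = [y - y1 for y in ys]
    z = [x * y - x1 * y1 - s for x, y, s in zip(xs, ys, S)]
    # Linear least squares z ~ A*u + M*v (2x2 normal equations),
    # where A = a and M = -(b/c).
    suu = sum(p * p for p in u)
    svv = sum(q * q for q in v)
    suv = sum(p * q for p, q in zip(u, v))
    suz = sum(p * r for p, r in zip(u, z))
    svz = sum(q * r for q, r in zip(v, z))
    det = suu * svv - suv * suv
    a = (suz * svv - suv * svz) / det
    beta = -(suu * svz - suv * suz) / det  # beta = b/c
    # y/a = ln(c) + ln(x + beta), so ln(c) is the mean of the difference.
    ln_c = sum(y / a - math.log(x + beta) for x, y in zip(xs, ys)) / len(xs)
    c = math.exp(ln_c)
    return a, c, c * beta

# Synthetic, noise-free data (illustrative values, not the asker's dataset).
a_true, c_true, b_true = 35.0, 0.37, 1.07
xs = [0.5 * i for i in range(41)]
ys = [a_true * math.log(c_true * x + b_true) for x in xs]
a_est, c_est, b_est = fit_log_by_integral(xs, ys)
print(a_est, c_est, b_est)
```

The estimates can then be passed as starting values to an iterative fit. Only sums and one numerical integration are involved, so the result does not depend on any initial guess.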

UPDATE (After Alex Sveshnikov updated his answer)

The figure below was drawn using Alex Sveshnikov's result, followed by a further iterative fit.

The two curves are almost indistinguishable. This shows that (in the present case) the method of fitting the integral equation is almost sufficient without further treatment.

Of course, the result is not always this satisfying. Here it is due to the low scatter of the data.

In ADDITION, an answer to a question raised in the comments by CosmeticMichu:

Hi JJacquelin, I was checking your pub but there's something I don't get, how that 3x3 matrix is made? There in your pub is a 2x2 matrix, and it gives 2 of the values involved with the function. I was doing the calculus for the integral and get what u put above, I realized there are 4 terms when I write the quadratic deviation equation so I thought it could be related with my question (in your pub, 3 terms and get a 2x2 matrix, here 4 terms and get a 3x3) I tried all day but really I didn't get it. Srry if I'm annoying, can u help me? With some bib at least. Thank u very much in advance — CosmeticMichu, Apr 18 '22 at 04:39
@CosmeticMichu. See the explanation in addition at end of my main answer. — JJacquelin, Apr 18 '22 at 08:27

Alex Sveshnikov · Answer 2 · 2022-04-17T09:13:26.677

The problem here is that the fit algorithm starts with "wrong" approximations for the parameters a, k, and b, so during the minimization it finds a local minimum, not the global one. You can improve the result if you provide the algorithm with starting values that are close to the optimal ones. For example, let's start with the following parameters:

gnuplot> a=47.5087
gnuplot> k=0.226
gnuplot> b=1.0016
gnuplot> f(x)=a*log(k*x+b)
gnuplot> fit f(x) 'R_B.txt' via a,k,b
....
....
....
After 40 iterations the fit converged.
final sum of squares of residuals : 16.2185
rel. change during last iteration : -7.6943e-06

degrees of freedom    (FIT_NDF)                        : 18
rms of residuals      (FIT_STDFIT) = sqrt(WSSR/ndf)    : 0.949225
variance of residuals (reduced chisquare) = WSSR/ndf   : 0.901027

Final set of parameters            Asymptotic Standard Error
=======================            ==========================
a               = 35.0415          +/- 2.302        (6.57%)
k               = 0.372381         +/- 0.0461       (12.38%)
b               = 1.07012          +/- 0.02016      (1.884%)

correlation matrix of the fit parameters:
                a      k      b      
a               1.000 
k              -0.994  1.000 
b               0.467 -0.531  1.000
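The summary figures in this log are related to each other in a simple way, which is a quick sanity check when comparing a "good" and a "bad" run. The short Python sketch below only re-derives numbers already printed above; the variable names are illustrative.

```python
import math

# Values copied from the fit log above.
wssr = 16.2185   # final sum of squares of residuals
ndf = 18         # degrees of freedom = data points - fitted parameters

reduced_chisq = wssr / ndf        # "variance of residuals"
rms = math.sqrt(reduced_chisq)    # "rms of residuals"

# (value, asymptotic standard error) for each fitted parameter.
params = {"a": (35.0415, 2.302), "k": (0.372381, 0.0461), "b": (1.07012, 0.02016)}
rel_err = {name: 100.0 * err / val for name, (val, err) in params.items()}

print(reduced_chisq, rms)
print(rel_err)
print("data points:", ndf + len(params))
```

With three fitted parameters, 18 degrees of freedom correspond to 21 data points. The large relative error on k and the -0.994 correlation between a and k suggest that these two parameters are hard to separate, which is consistent with the fit being sensitive to its starting values.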

The resulting plot is

Now the question is how to find "good" initial approximations for your parameters. You start with

y = a*log(k*x + b)

If you differentiate this equation you get

dy/dx = a*k/(k*x + b)

or

1/(a*k) = (dx/dy)/(k*x + b)

The left-hand side of this equation is some constant 'C', so the expression on the right-hand side should be equal to this constant as well:

(dx/dy)/(k*x + b) = C, that is, dx/dy = C*k*x + C*b

In other words, the reciprocal of the derivative of your data should be approximated by a linear function. So, from your data x[i], y[i] you can construct the reciprocal derivatives x[i], (x[i+1]-x[i])/(y[i+1]-y[i]) and make a linear fit of these data:

The fit gives the following values:

C*k = 0.0236179
C*b = 0.106268
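The reciprocal-derivative step can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the code behind the numbers above: the helper names `reciprocal_derivatives` and `linear_fit` are hypothetical, and the data is synthetic because the original dataset is not included in the thread.

```python
import math

def reciprocal_derivatives(xs, ys):
    """Return (x[i], dx/dy) pairs built from forward differences."""
    pairs = []
    for i in range(len(xs) - 1):
        dy = ys[i + 1] - ys[i]
        if dy != 0.0:  # skip flat steps, where dx/dy is undefined
            pairs.append((xs[i], (xs[i + 1] - xs[i]) / dy))
    return pairs

def linear_fit(pairs):
    """Ordinary least squares for q = slope*p + intercept."""
    n = len(pairs)
    sx = sum(p for p, _ in pairs)
    sy = sum(q for _, q in pairs)
    sxx = sum(p * p for p, _ in pairs)
    sxy = sum(p * q for p, q in pairs)
    slope = (n * sxy - sx * sy) / (n * sxx - sx * sx)
    intercept = (sy - slope * sx) / n
    return slope, intercept

# Synthetic, noise-free data (illustrative values, not the asker's dataset).
a_true, k_true, b_true = 35.0, 0.37, 1.07
xs = [0.5 * i for i in range(41)]
ys = [a_true * math.log(k_true * x + b_true) for x in xs]

# Ck plays the role of C*k and Cb the role of C*b.
Ck, Cb = linear_fit(reciprocal_derivatives(xs, ys))
print(Ck, Cb)
```

For exact data the slope is close to 1/a, since dx/dy = (k*x + b)/(a*k). With noisy data the differences y[i+1]-y[i] can become very small or change sign, which is why this step is sensitive to scatter.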

Now we need to find the values of a and C. Let's say that we want the resulting graph to pass close to the first and the last points of our dataset, (x1, y1) and (xn, yn). That means that we want

a*log(k*x1 + b) = y1
a*log(k*xn + b) = yn

Thus,

a*log((C*k*x1 + C*b)/C) = a*log(C*k*x1 + C*b) - a*log(C) = y1
a*log((C*k*xn + C*b)/C) = a*log(C*k*xn + C*b) - a*log(C) = yn

By subtracting the equations we get the value for a:

a = (yn-y1)/log((C*k*xn + C*b)/(C*k*x1 + C*b)) = 47.51

Then,

log(k*x1+b) = y1/a
k*x1+b = exp(y1/a)
C*k*x1+C*b = C*exp(y1/a)

From this we can calculate C:

C = (C*k*x1+C*b)/exp(y1/a)

and finally find the k and b:

k=0.226
b=1.0016

These are the values used above for finding the better fit.
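The endpoint step can be checked numerically with a short Python sketch. The function name `initial_guess` is hypothetical, and the inputs are synthetic values constructed so that the expected answer is known; they are not the asker's data.

```python
import math

def initial_guess(Ck, Cb, x1, y1, xn, yn):
    """Recover (a, k, b) from the linear-fit coefficients and the end points.

    Ck and Cb stand for C*k and C*b; (x1, y1) and (xn, yn) are the first
    and last data points.
    """
    a = (yn - y1) / math.log((Ck * xn + Cb) / (Ck * x1 + Cb))
    C = (Ck * x1 + Cb) / math.exp(y1 / a)
    return a, Ck / C, Cb / C

# Synthetic check: build Ck, Cb and the end points from known parameters.
a_true, k_true, b_true, C_true = 35.0, 0.37, 1.07, 0.1
x1, xn = 0.0, 20.0
y1 = a_true * math.log(k_true * x1 + b_true)
yn = a_true * math.log(k_true * xn + b_true)

a0, k0, b0 = initial_guess(C_true * k_true, C_true * b_true, x1, y1, xn, yn)
print(a0, k0, b0)
```

With exact inputs the known parameters are recovered. With real data the result is only a starting point for the nonlinear fit.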

UPDATE

You can automate the process described above with the following script:

# Name of the file with the data
data='R_B.txt'

# The coordinates of the last data point
xn=NaN
yn=NaN

# The temporary coordinates of a data point used to calculate a derivative
x0=NaN
y0=NaN

linearFit(x)=Ck*x+Cb
fit linearFit(x) data using (xn=$1,dx=$1-x0,x0=$1,$1):(yn=$2,dy=$2-y0,y0=$2,dx/dy) via Ck, Cb

# The coordinates of the first data point
x1=NaN
y1=NaN
plot data using (x1=$1):(y1=$2) every ::0::0

a=(yn-y1)/log((Ck*xn+Cb)/(Ck*x1+Cb))
C=(Ck*x1+Cb)/exp(y1/a)
k=Ck/C
b=Cb/C

f(x)=a*log(k*x+b)
fit f(x) data via a,k,b

plot data, f(x)

pause -1

Hi, thank you very much. I'm trying to write something that does this automatically every time, but I have a problem now: how did you get those x[i], (x[i+1]-x[i])/(y[i+1]-y[i]) values from the original data? I mean, is there something like a loop that, given x[i], y[i], produces the data for the derivative? Or did you do it manually? — CosmeticMichu, Apr 17 '22 at 00:43
The method of fitting derivatives is valid in some cases. But the deviation becomes big when the data is too scattered. It is much better to fit antiderivatives instead of derivatives. Numerical integration to compute the antiderivatives is much more stable. See explanation and examples in https://fr.scribd.com/doc/14674814/Regressions-et-equations-integrales . — JJacquelin, Apr 17 '22 at 08:17
I have updated the answer and added a script which can help with processing of multiple data. However, it does not guarantee that the fit will be good for other data, selection of a good first approximation is always tricky. — Alex Sveshnikov, Apr 17 '22 at 09:08
