
I need to fit a dataset from a two-dimensional diffusion (D2) process with a Sestak-Berggren model (derived from the logistic model). To do that, I need to understand how to use nlsLM when the data show an elbow/knee, because the following "easy way" did not work:

x=c(1.000000e-05, 1.070144e-05, 1.208082e-05, 1.456624e-05, 1.861581e-05, 2.490437e-05, 3.407681e-05, 4.696710e-05,
 6.474653e-05, 8.870800e-05, 1.206194e-04, 1.624442e-04, 2.172716e-04, 2.882747e-04, 3.794489e-04, 4.956619e-04,
 6.427156e-04, 8.275095e-04, 1.058201e-03, 1.344372e-03, 1.697222e-03, 2.129762e-03, 2.657035e-03, 3.296215e-03,
 4.067301e-03, 4.992831e-03, 6.098367e-03, 7.412836e-03, 8.968747e-03, 1.080251e-02, 1.295471e-02, 1.547045e-02,
 1.839960e-02, 2.179713e-02, 2.572334e-02, 3.024414e-02, 3.543131e-02, 4.136262e-02, 4.812205e-02, 5.579985e-02,
 6.449256e-02, 7.430297e-02, 8.533991e-02, 9.771803e-02, 1.115573e-01, 1.269824e-01, 1.441219e-01, 1.631074e-01,
 1.840718e-01, 2.071477e-01, 2.324656e-01, 2.601509e-01, 2.903210e-01, 3.230812e-01, 3.585200e-01, 3.967033e-01,
 4.376671e-01, 4.814084e-01, 5.278744e-01, 5.769469e-01, 6.284244e-01, 6.819947e-01, 7.371982e-01, 7.933704e-01,
 8.495444e-01, 9.042616e-01)

ynorm=c(
 1.000000e+00, 8.350558e-01, 6.531870e-01, 4.910995e-01, 3.581158e-01, 2.553070e-01, 1.814526e-01, 1.290639e-01,
 9.219591e-02, 6.623776e-02, 4.817180e-02, 3.543117e-02, 2.624901e-02, 1.961542e-02, 1.478284e-02, 1.123060e-02,
 8.597996e-03, 6.631400e-03, 5.151026e-03, 4.028428e-03, 3.171096e-03, 2.511600e-03, 2.001394e-03, 1.604211e-03,
 1.292900e-03, 1.047529e-03, 8.530624e-04, 6.981015e-04, 5.739778e-04, 4.740553e-04, 3.932255e-04, 3.275345e-04,
 2.739059e-04, 2.299339e-04, 1.937278e-04, 1.637946e-04, 1.389500e-04, 1.182504e-04, 1.009406e-04, 8.641380e-05,
 7.418032e-05, 6.384353e-05, 5.508090e-05, 4.762920e-05, 4.127282e-05, 3.583451e-05, 3.116813e-05, 2.715264e-05,
 2.368759e-05, 2.068935e-05, 1.808802e-05, 1.582499e-05, 1.385102e-05, 1.212452e-05, 1.061032e-05, 9.278534e-06,
 8.103650e-06, 7.063789e-06, 6.140038e-06, 5.315870e-06, 4.576585e-06, 3.908678e-06, 3.298963e-06, 2.732866e-06,
 2.189810e-06, 1.614149e-06)


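library(minpack.lm)  # provides nlsLM()

# Name the columns explicitly so that dfxy$x and dfxy$ynorm exist
dfxy <- data.frame(x = x, ynorm = ynorm)
fn <- funSel <- "co*((1-x)^m)*(x^n)"  # Sestak-Berggren form
mod_fit <- nlsLM(ynorm ~ eval(parse(text = fn)), start = c(co = 0.5, m = -1, n = 0.5), data = dfxy)
plot(dfxy$x, dfxy$ynorm, xlim = c(0, 0.001))       # data
plot(dfxy$x, fitted(mod_fit), xlim = c(0, 0.001))  # fitted values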

The only solution I've found is based on https://stackoverflow.com/a/54286595/6483091: first find the "elbow", then apply the regression only to the reduced dataset. This works, but I was wondering whether there are other solutions, for example tweaking the regression parameters instead of working in two steps, or somehow letting nlsLM "recognize" the curve using a Dynamic First Derivative Threshold while still forcing fn as the regression function. The biggest problem is that I already know the range of the parameters: the "ground truth" is ynorm <- 0.973*(1-x)^(0.425)*x^(-1.008). Yet even when I supply values near it as the starting point, the regression never returns anything close to them.
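For reference, the two-step idea can be sketched roughly as follows. This is only an illustration under assumptions: the elbow is located with a simple "largest distance from the chord" heuristic, which is not necessarily what the linked answer does, the helper find_elbow is a hypothetical name, and the data are a log-spaced stand-in generated from the "ground truth" formula rather than the measured values.

```r
# Stand-in data (assumption): log-spaced x and the normalized "ground truth" curve
x <- 10^seq(-5, log10(0.9), length.out = 66)
yt <- 0.973 * (1 - x)^0.425 * x^(-1.008)
ynorm <- yt / max(yt)

# Hypothetical helper: index of the point farthest from the straight line
# (chord) joining the first and the last point of the curve
find_elbow <- function(x, y) {
  # Rescale both axes to [0, 1] so that neither dominates the distance
  xs <- (x - min(x)) / (max(x) - min(x))
  ys <- (y - min(y)) / (max(y) - min(y))
  n <- length(xs)
  dx <- xs[n] - xs[1]
  dy <- ys[n] - ys[1]
  # Perpendicular distance of every point from the chord
  d <- abs(dy * (xs - xs[1]) - dx * (ys - ys[1])) / sqrt(dx^2 + dy^2)
  which.max(d)
}

idx <- find_elbow(x, ynorm)

# Step 2: split at the elbow; the regression is then run on one part only
before_elbow <- data.frame(x = x[1:idx], ynorm = ynorm[1:idx])
after_elbow  <- data.frame(x = x[idx:length(x)], ynorm = ynorm[idx:length(x)])
```

Either before_elbow or after_elbow can then be passed as data to the nlsLM call; which of the two is the right "reduced dataset" depends on the region the model is supposed to describe.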

The "ground truth":

 plot(x,ynorm)
 yt <- 0.973*(1-x)^(0.425)*x^(-1.008)
 lines(x,yt/max(yt))
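One possible one-step alternative, given that the model is a pure product of powers: taking logarithms gives log(y) = log(co) + m*log(1-x) + n*log(x), which is linear in log(co), m and n and can be fitted with lm without starting values. The sketch below is an assumption-laden illustration, not a tested fix for the measured data: it uses a log-spaced stand-in for x and the noise-free "ground truth" curve, and fitting on the log scale weights relative rather than absolute errors.

```r
# Stand-in data (assumption): log-spaced x and the normalized "ground truth" curve
x <- 10^seq(-5, log10(0.9), length.out = 66)
yt <- 0.973 * (1 - x)^0.425 * x^(-1.008)
ynorm <- yt / max(yt)

# Linear model on the log scale; log1p(-x) is log(1 - x)
log_fit <- lm(log(ynorm) ~ log1p(-x) + log(x))

# Back-transform the intercept to obtain co; m and n are the two slopes
est <- c(co = exp(unname(coef(log_fit)[1])),
         m  = unname(coef(log_fit)[2]),
         n  = unname(coef(log_fit)[3]))
print(est)

# est could then serve as a starting point for the nonlinear fit, e.g.
# nlsLM(ynorm ~ co * (1 - x)^m * x^n, start = as.list(est))  # needs minpack.lm
```

On this noise-free stand-in, m and n come back as the values used to generate the curve, while co is scaled by 1/max(yt) because the data are normalized. On real data the log-scale estimates will generally differ from a least-squares fit on the original scale.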
Jojostack

1 Answer


Here is a solution using nls and a hyperbolic fit:

x=c(1.000000e-05, 1.070144e-05, 1.208082e-05, 1.456624e-05, 1.861581e-05, 2.490437e-05, 3.407681e-05, 4.696710e-05,
    6.474653e-05, 8.870800e-05, 1.206194e-04, 1.624442e-04, 2.172716e-04, 2.882747e-04, 3.794489e-04, 4.956619e-04,
    6.427156e-04, 8.275095e-04, 1.058201e-03, 1.344372e-03, 1.697222e-03, 2.129762e-03, 2.657035e-03, 3.296215e-03,
    4.067301e-03, 4.992831e-03, 6.098367e-03, 7.412836e-03, 8.968747e-03, 1.080251e-02, 1.295471e-02, 1.547045e-02,
    1.839960e-02, 2.179713e-02, 2.572334e-02, 3.024414e-02, 3.543131e-02, 4.136262e-02, 4.812205e-02, 5.579985e-02,
    6.449256e-02, 7.430297e-02, 8.533991e-02, 9.771803e-02, 1.115573e-01, 1.269824e-01, 1.441219e-01, 1.631074e-01,
    1.840718e-01, 2.071477e-01, 2.324656e-01, 2.601509e-01, 2.903210e-01, 3.230812e-01, 3.585200e-01, 3.967033e-01,
    4.376671e-01, 4.814084e-01, 5.278744e-01, 5.769469e-01, 6.284244e-01, 6.819947e-01, 7.371982e-01, 7.933704e-01,
    8.495444e-01, 9.042616e-01)

ynorm=c(
  1.000000e+00, 8.350558e-01, 6.531870e-01, 4.910995e-01, 3.581158e-01, 2.553070e-01, 1.814526e-01, 1.290639e-01,
  9.219591e-02, 6.623776e-02, 4.817180e-02, 3.543117e-02, 2.624901e-02, 1.961542e-02, 1.478284e-02, 1.123060e-02,
  8.597996e-03, 6.631400e-03, 5.151026e-03, 4.028428e-03, 3.171096e-03, 2.511600e-03, 2.001394e-03, 1.604211e-03,
  1.292900e-03, 1.047529e-03, 8.530624e-04, 6.981015e-04, 5.739778e-04, 4.740553e-04, 3.932255e-04, 3.275345e-04,
  2.739059e-04, 2.299339e-04, 1.937278e-04, 1.637946e-04, 1.389500e-04, 1.182504e-04, 1.009406e-04, 8.641380e-05,
  7.418032e-05, 6.384353e-05, 5.508090e-05, 4.762920e-05, 4.127282e-05, 3.583451e-05, 3.116813e-05, 2.715264e-05,
  2.368759e-05, 2.068935e-05, 1.808802e-05, 1.582499e-05, 1.385102e-05, 1.212452e-05, 1.061032e-05, 9.278534e-06,
  8.103650e-06, 7.063789e-06, 6.140038e-06, 5.315870e-06, 4.576585e-06, 3.908678e-06, 3.298963e-06, 2.732866e-06,
  2.189810e-06, 1.614149e-06)


# Name the columns explicitly instead of relying on the auto-generated names
dfxy <- data.frame(x = x, ynorm = ynorm)
plot(ynorm ~ x, data = dfxy)
# Hyperbola a/x + b fitted by nonlinear least squares
mod <- nls(ynorm ~ a / x + b, data = dfxy, start = list(a = 1, b = 0))
lines(x = dfxy$x, y = predict(mod, newdata = dfxy))

The fit isn't perfect, though. I guess there is no continuous function to fit a right angle...

Depending on what you want to use the regression for, you could also use a loess regression:

dfxy <-  data.frame(x[1:length(ynorm)],ynorm)
names(dfxy) <- c("x", "y")
plot(y ~ x, data = dfxy)
mod <- loess(y ~ x, data = dfxy, span = 0.1)
lines(x = dfxy$x, y = predict(mod, newdata = dfxy$x), col = "red")

Resulting in:

[Figure: loess regression]

Manuel Popp