
I'm trying to do leave-one-out cross-validation (LOOCV) of a nonlinear regression and plot the optimal fit. I suspect my LOOCV and plotting code are wrong. Could anybody clarify what I'm doing wrong?

data(Boston, package='MASS')
y <- Boston$nox
x <- Boston$dis
n <- length(x)
nla <- n
las <- seq(0, .85, length=nla)
cvs <- rep(0, nla)
for(j in 1:nla) {
  prs <- rep(0,n)
  for(i in 1:n) {
    yi <- y[-i]
    xi <- x[-i]
    d <- nls(y~ A + B * exp(C * x), start=list(A=0.5, B=0.5, C=-0.5))
    prs[i] <- predict(d, newdata=data.frame(xi=x[i]))
  }
 cvs[j] <- mean( (y - prs)^2 )
}
cvs[j]
plot(y~x, pch=19, col='gray', cex=1.5,xlab='dis', ylab='nox')
d <- nls(y~ A + B * exp(C * x), start=list(A=0.5, B=0.5, C=-0.5))
lines(predict(d)[order(x)]~sort(x), lwd=4, col='black')
fotNelton

1 Answer


You were close, but inside your loop the nls call still fits the full data x and y rather than the leave-one-out subsets xi and yi. As far as I can tell, you only need a single loop that fits the model once per left-out observation, so I can't see the need for the variables las or prs. For reference, the plot shows the leave-one-out mean squared error (LOO MSE) and the mean squared error of the residuals (MSE) for the nls model fit to the full data set.
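
In symbols, the two quantities can be sketched as follows. The notation is introduced here for illustration and is not from the original post: $\hat f^{(-j)}$ denotes the nls fit with observation $j$ removed, and $\hat f$ denotes the fit to all $n$ observations.

```latex
\text{LOO MSE} = \frac{1}{n}\sum_{j=1}^{n}\bigl(y_j - \hat f^{(-j)}(x_j)\bigr)^2,
\qquad
\text{MSE} = \frac{1}{n}\sum_{j=1}^{n}\bigl(y_j - \hat f(x_j)\bigr)^2
```

These correspond to mean(cvs) and mean(resid(d)^2) in the script.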

Script:

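library(MASS)
data(Boston, package='MASS')
y <- Boston$nox
x <- Boston$dis
n <- length(x)

# squared prediction error for each left-out observation
cvs <- rep(0, n)
for(j in seq(n)){
  # drop observation j and refit the model on the remaining n-1 points
  ys <- y[-j]
  xs <- x[-j]
  d <- nls(ys ~ A + B * exp(C * xs), start=list(A=0.5, B=0.5, C=-0.5))
  # predict the held-out point; the newdata column must be named xs to match the formula
  cvs[j] <- (y[j] - predict(d, data.frame(xs=x[j])))^2
  print(paste0(j, " of ", n, " finished (", round(j/n*100), "%)"))
}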

plot(y~x, pch=19, col='gray', cex=1.5, xlab='dis', ylab='nox')
d <- nls(y~ A + B * exp(C * x), start=list(A=0.5, B=0.5, C=-0.5))
lines(predict(d)[order(x)]~sort(x), lwd=4, col='black')
usr <- par("usr")
text(usr[1] + 0.9*(usr[2]-usr[1]), usr[3] + 0.9*(usr[4]-usr[3]), paste("LOO MSE", "=", round(mean(cvs), 5)), pos=2)
text(usr[1] + 0.9*(usr[2]-usr[1]), usr[3] + 0.8*(usr[4]-usr[3]), paste("MSE", "=", round(mean(resid(d)^2), 5)), pos=2)

[Figure: scatter plot of nox against dis with the fitted nls curve, annotated with the LOO MSE and MSE values]
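
The same loop could also be wrapped in a function. What follows is only a minimal sketch under stated assumptions: the name loo_mse_nls is hypothetical, the data are passed through nls's data argument instead of global vectors, and a fit that fails to converge is recorded as NA rather than stopping the loop. The simulated data are invented purely to exercise the function.

```r
# Hypothetical helper (not from the original answer): leave-one-out MSE
# for the model y = A + B * exp(C * x) fitted with nls().
loo_mse_nls <- function(x, y, start = list(A = 0.5, B = 0.5, C = -0.5)) {
  dat <- data.frame(x = x, y = y)
  n <- nrow(dat)
  sq_err <- rep(NA_real_, n)
  for (j in seq_len(n)) {
    # Fit on every row except j; a failed fit leaves NA for that fold.
    fit <- tryCatch(
      nls(y ~ A + B * exp(C * x), data = dat[-j, ], start = start),
      error = function(e) NULL
    )
    if (!is.null(fit)) {
      # newdata uses the same column name (x) as the model formula
      pred <- predict(fit, newdata = dat[j, , drop = FALSE])
      sq_err[j] <- (dat$y[j] - pred)^2
    }
  }
  list(mse = mean(sq_err, na.rm = TRUE), n_failed = sum(is.na(sq_err)))
}

# Usage on simulated data (assumed values, chosen only for illustration)
set.seed(1)
x_sim <- runif(60, 1, 10)
y_sim <- 0.4 + 0.5 * exp(-0.4 * x_sim) + rnorm(60, sd = 0.02)
res <- loo_mse_nls(x_sim, y_sim)
res$mse       # leave-one-out mean squared error
res$n_failed  # number of folds where nls did not converge
```

With the Boston data loaded as in the script, the call would be loo_mse_nls(Boston$dis, Boston$nox). Keeping the predictor in a data frame means the newdata column name matches the formula automatically, which is the mismatch that broke the original predict call.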

Marc in the box