6

Is there any package that automatically fits a curve using many simple models?
By simple models I mean:

  • ax+b
  • ax^2+bx+c
  • a*log(x) + b
  • a*x^n+b
  • ax/(1+bx)
  • ax^n/(1+bx^n)
  • ...

The best would be to have a function that takes two vector parameters X and Y and returns a list of fitted simple models with their SSE.

Tomek Tarczynski
  • 2,785
  • 8
  • 37
  • 43
  • @Roland: To find best nonlinear transformation in linear regression. I like to make a categorical variable (decils for example) from continuous variables and then look at the plot of parameters for each decil Vs average value in each decil. That helps to find nonlinear transformation of a variable. I would like to speed up this process a bit. – Tomek Tarczynski Jul 05 '12 at 19:08

3 Answers3

10

Try this. rhs is a character vector of right sides and x and y are the data. It constructs the formula fo for each and then extracts the parameters and sets each to 1 for the starting value. Finally it runs nls and returns the SSEs sorted so that the result is a vector of SSE's named via the right hand sides. If verbose=TRUE (which it is by default) then it also displays the output from each fit.

sse <- function(rhs, x, y) sort(sapply(rhs, function(rhs, x, y, verbose = TRUE) {
    fo <- as.formula(paste("y", rhs, sep = "~"))
    nms <- setdiff(all.vars(fo), c("x", "y"))
    start <- as.list(setNames(rep(1, length(nms)), nms))
    fm <- nls(fo, data.frame(x, y), start = start)
    if (verbose) { print(fm); cat("---\n") }
    deviance(fm)
}, x = x, y = y))

## test

set.seed(123)
x <- 1:10
y <- rnorm(10, x)

# modify to suit
rhs <- c("a*x+b", "a*x*x+b*x+c")

sse(rhs, x, y)
G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
  • Funny outcome, the 2nd order polynomial always fits the best of first four models using multiple rnorm datasets. I assume this is because rnorm is only pseudorandom. But why it would be best approximated by a 2nd order polynomial every time is odd... – DWAHL Jul 06 '12 at 00:17
  • Three parameters will fit better than two. You can use AIC if you want to penalize more parameters or use SSE but just compare models with the same number of parameters. – G. Grothendieck Jul 06 '12 at 00:59
  • Thanks for showing how to create such functions. I was hoping that it already exists, because I can think about 30 simple models off the top of my head which means that there is probably more than 100 hundred such models. @ G. Grothendieck: Do You think that nls will give better results that nlminb? – Tomek Tarczynski Jul 06 '12 at 12:22
  • `nls` is specialized for nonlinear least squares whereas `nlminb` is for a general objective so one would normally try `nls` for this type of problem first. – G. Grothendieck Jul 06 '12 at 12:32
3

You could also look at packages providing functions to evaluate fractional polynomials. So far, these appear to be mboost (with the function FP) and mfp (with the function mfp). Although I haven't tried the packages, the theory behind them fits what you're after.

The mfp package was described in R-News in 2005.

Two references that might be of interest are

Royston P, Altman D (1994) Regression using fractional polynomials of continuous covariates. Appl Stat. 3: 429–467.

Sauerbrei W, Royston P (1999) Building multivariable prognostic and diagnostic models: transformation of the predictors by using fractional polynomials. Journal of the Royal Statistical Society (Series A) 162: 71–94.

BenBarnes
  • 19,114
  • 6
  • 56
  • 74
1

You can fit a Regression Splines and find a good fit by manually adjusting the degrees of freedom a few times. Try the following function:

spline.fit <- function(x, y, df=5) {
  ## INPUT: x, y are two vectors (predictor and response);
  ##        df is the number of spline basis.  Increase "df" to fit more adaptively to the data.
  require(splines) # available as default R Package.
  bx <- bs(x, df)  # B-spline basis matrix as New Predictors (dimension is "length(x)" by "df")
  f <- lm(y ~ bx)  # Linear Regression on Spline Basis (that is, "df" number of new predictors)
  fy <- fitted(f)  # Fitted Response
  plot(x, y); lines(x, fy, col="blue", lwd=2) # Make a plot to show the fit.
  invisible(list(x=bx, y=fy, f=f))    # Return the Basis (new predictors), Fitted Y, Regression
}

if (F) {                                # Unit Test
  spline.fit(1:100, rnorm(100))
  spline.fit(1:100, rnorm(100), df=20)
}
Feiming Chen
  • 119
  • 2
  • 3