4

I have scatterplot of two variables, for instance this:

x<-c(0.108,0.111,0.113,0.116,0.118,0.121,0.123,0.126,0.128,0.131,0.133,0.136)

y<-c(-6.908,-6.620,-5.681,-5.165,-4.690,-4.646,-3.979,-3.755,-3.564,-3.558,-3.272,-3.073)

and I would like to find the function that better fits the relation between these two variables.

to be precise I would like to compare the fitting of three models: linear, exponential and logarithmic.

I was thinking about fitting each function to my values, calculate the likelihoods in each case and compare the AIC values.

But I don't really know how or where to start. Any possible help about this would be extremely appreciated.

Thank you very much in advance.

Tina.

N8TRO
  • 3,348
  • 3
  • 22
  • 40
user18441
  • 643
  • 1
  • 7
  • 15
  • Have you tried symbolic regression with the `rgp` package? If you include some sample data we can try it out. More details here: http://www.rsymbolic.org/projects/rgp/wiki/Symbolic_Regression – Ben Feb 23 '13 at 16:07
  • 2
    How basic do we have to go here? Have you read the data in? Have you done any exploratory plots? Do you at least know how to fit a linear model with the `lm` package? We're kinda stuck for level without a bit more... – Spacedman Feb 23 '13 at 16:26
  • thank you very much, I have added an example, I know pretty much the basics in R, but I am new when it comes to fitting models more complex than a regression. – user18441 Feb 23 '13 at 17:00

3 Answers3

7

I would begin by an explantory plots, something like this :

x<-c(0.108,0.111,0.113,0.116,0.118,0.121,0.123,0.126,0.128,0.131,0.133,0.136)
y<-c(-6.908,-6.620,-5.681,-5.165,-4.690,-4.646,-3.979,-3.755,-3.564,-3.558,-3.272,-3.073)
dat <- data.frame(y=y,x=x)
library(latticeExtra)
library(grid)
xyplot(y ~ x,data=dat,par.settings = ggplot2like(),
       panel = function(x,y,...){
         panel.xyplot(x,y,...)
       })+
  layer(panel.smoother(y ~ x, method = "lm"), style =1)+  ## linear
  layer(panel.smoother(y ~ poly(x, 3), method = "lm"), style = 2)+  ## cubic
  layer(panel.smoother(y ~ x, span = 0.9),style=3)  + ### loeess
  layer(panel.smoother(y ~ log(x), method = "lm"), style = 4)  ## log

enter image description here

looks like you need a cubic model.

 summary(lm(y~poly(x,3),data=dat))

Residual standard error: 0.1966 on 8 degrees of freedom
Multiple R-squared: 0.9831, Adjusted R-squared: 0.9767 
F-statistic: 154.8 on 3 and 8 DF,  p-value: 2.013e-07 
agstudy
  • 119,832
  • 17
  • 199
  • 261
  • +1 that's very good, how about the AIC values? A method for exploring the smoothers in `ggplot` is here: http://www.ats.ucla.edu/stat/r/faq/smooths.htm – Ben Feb 23 '13 at 18:33
  • thank you very much, I have problems to install the grid package, I guess it's this one you mean: http://www.stat.auckland.ac.nz/~paul/grid/grid.html (I have a mac). – user18441 Feb 23 '13 at 18:44
  • Yes. grid of Paul murrell(bless him).No need to install it , just load it , it is distributed with R like it is mentioned in the link you give. – agstudy Feb 23 '13 at 18:48
5

Here is an example of comparing five models. Due to the form of the first two models we are able to use lm to get good starting values. (Note that models using different transforms of y should not be compared so we should not use lm1 and lm2 as comparison models but only for starting values.) Now run an nls for each of the first two. After these two models we try polynomials of various degrees in x. Fortunately lm and nls use consistent AIC definitions (although its not necessarily true that other R model fitting functions have consistent AIC definitions) so we can just use lm for the polynomials. Finally we plot the data and fits of the first two models.

The lower the AIC the better so nls1 is best followed by lm3.2 following by nls2 .

lm1 <- lm(1/y ~ x)
nls1 <- nls(y ~ 1/(a + b*x), start = setNames(coef(lm1), c("a", "b")))
AIC(nls1) # -2.390924

lm2 <- lm(1/y ~ log(x))
nls2 <- nls(y ~ 1/(a + b*log(x)), start = setNames(coef(lm2), c("a", "b")))
AIC(nls2) # -1.29101

lm3.1 <- lm(y ~ x) 
AIC(lm3.1) # 13.43161

lm3.2 <- lm(y ~ poly(x, 2))
AIC(lm3.2) # -1.525982

lm3.3 <- lm(y ~ poly(x, 3))
AIC(lm3.3) # 0.1498972

plot(y ~ x)

lines(fitted(nls1) ~ x, lty = 1) # solid line
lines(fitted(nls2) ~ x, lty = 2) # dashed line

enter image description here

ADDED a few more models and subsequently fixed them up and changed notation. Also to follow up on Ben Bolker's comment we can replace AIC everywhere above with AICc from the AICcmodavg package.

G. Grothendieck
  • 254,981
  • 17
  • 203
  • 341
0

You could start by reading the classic paper by Box and Cox on transformations. They discuss how to compare transformations and how to find meaningful transformations within a set or family of potential transforms. The log transform and linear model are special cases of the Box-Cox family.

And as @agstudy said, always plot the data as well.

Greg Snow
  • 48,497
  • 6
  • 83
  • 110