13

I have a simple dataset and I am trying to use the power trend to best fit the data. The sample data is very small and is as follows:

structure(list(Discharge = c(250, 300, 500, 700, 900), Downstream = c(0.3, 
0.3, 0.3, 0.3, 0.3), Age = c(1.32026239202165, 1.08595138888889, 
0.638899189814815, 0.455364583333333, 0.355935185185185)), .Names = c("Discharge", 
"Downstream", "Age"), row.names = c(NA, 5L), class = "data.frame")

Data looks as follows:

> new
  Discharge Downstream       Age
1       250        0.3 1.3202624
2       300        0.3 1.0859514
3       500        0.3 0.6388992
4       700        0.3 0.4553646
5       900        0.3 0.3559352

I tried to plot the above data using ggplot2

ggplot(new)+geom_point(aes(x=Discharge,y=Age))

I could add the linear line using geom_smooth(method="lm") but I am not sure what code do I need to show the power line.

The output is as follows:

enter image description here

How Can I add a power linear regression line as done in excel ? The excel figure is shown below:

enter image description here

Jd Baba
  • 5,948
  • 18
  • 62
  • 96

3 Answers3

18

While mnel's answer is correct for a nonlinear least squares fit, note that Excel isn't actually doing anything nearly that sophisticated. It's really just log-transforming the response and predictor variables, and doing an ordinary (linear) least squares fit. To reproduce this in R, you would do:

lm(log(Age) ~ log(Discharge), data=df)

Call:
lm(formula = log(Age) ~ log(Discharge), data = df)

Coefficients:
   (Intercept)  log(Discharge)  
         5.927          -1.024  

As a check, the coefficient for log(Discharge) is identical to that from Excel while exp(5.927) ~ 375.05.

While I'm not sure how to use this as a trendline in ggplot2, you can do it in base graphics thusly:

m <- lm(log(y) ~ log(x), data=df)

newdf <- data.frame(Discharge=seq(min(df$Discharge), max(df$Discharge), len=100))
plot(Age ~ Discharge, data=df)
lines(newdf$Discharge, exp(predict(m, newdf)))

text(600, .8, substitute(b0*x^b1, list(b0=exp(coef(m)[1]), b1=coef(m)[2])))
text(600, .75, substitute(plain("R-square: ") * r2, list(r2=summary(m)$r.squared)))
Hong Ooi
  • 56,353
  • 13
  • 134
  • 187
  • Indeed -- Hence my reticence posting the `r^2` value - – mnel Aug 19 '13 at 10:28
  • 1
    Using `ggplot2` this can be fitted using `stat_smooth(method = 'glm', formula = 'y~log(x)', family = gaussian(link = 'log')` – mnel Aug 19 '13 at 23:47
  • @mnel That's not quite the same though. That fits `log(E(Y)) = log(x)`, as opposed to `E(log(Y)) = log(x)`. I don't think there's a way to fit a log-transform regression as a smooth in ggplot2 -- which would make sense, since the regression is biased on the original scale. – Hong Ooi Aug 20 '13 at 04:28
  • 2
    Good point. Perhaps `scale_x_continuous(trans = 'log') + scale_y_continuous(trans = 'log') + stat_smooth(method = 'lm', se = FALSE) + coord_trans(ytrans = 'exp', xtrans = 'exp')` (this logs the data and fits the smooth (linear) to the transformed data, then inverts the transformation for plotting. – mnel Aug 20 '13 at 05:22
  • 1
    Minor change to @mnel answer for the current version of ggplot. It should generate the ggplot curve that matches excel: `ggplot(df,aes(x,y)) + geom_point() + scale_x_continuous(trans = 'log') + scale_y_continuous(trans = 'log') + stat_smooth(method = 'lm', se = FALSE) + coord_trans(y = 'exp', x = 'exp')` – Abhishek R Mar 19 '20 at 12:02
11

Use nls (nonlinear least squares) as your smoother

eg

ggplot(DD,aes(x = Discharge,y = Age)) +
  geom_point() + 
  stat_smooth(method = 'nls', formula = 'y~a*x^b', start = list(a = 1,b=1),se=FALSE)

Noting Doug Bates comments on R-squared values and non-linear models here, you could use the ideas in Adding Regression Line Equation and R2 on graph

to append the regression line equation

# note that you have to give it sensible starting values
# and I haven't worked out why the values passed to geom_smooth work!
power_eqn = function(df, start = list(a =300,b=1)){
  m = nls(Discharge ~ a*Age^b, start = start, data = df);
  eq <- substitute(italic(y) == a  ~italic(x)^b, 
               list(a = format(coef(m)[1], digits = 2), 
                    b = format(coef(m)[2], digits = 2)))
  as.character(as.expression(eq));                 
}

ggplot(DD,aes(x = Discharge,y = Age)) +
  geom_point() + 
  stat_smooth(method = 'nls', formula = 'y~a*x^b', start = list(a = 1,b=1),se=FALSE) +  
  geom_text(x = 600, y = 1, label = power_eqn(DD), parse = TRUE)
Community
  • 1
  • 1
mnel
  • 113,303
  • 27
  • 265
  • 254
  • Thank you so much for your quick answer. Would you also suggest how can I fit a non linear model outside `ggplot` to post the values of the `R2` and the equation in the graph ? – Jd Baba Aug 19 '13 at 03:24
  • I used the following formula `nlm <- nls(Age ~ a*Discharge^b,data=new,start=list(a=1,b=1))` . Would you let me know whether it is correct or not ? I cannot seem to find the `R-squared` using the formula. – Jd Baba Aug 19 '13 at 03:37
  • @Jdbaba -- you need sensible starting values (try a = 300), I haven't worked out why (a=1) doesn't throw the same error in `geom_smooth` (very strange). I've added a link regarding `R^2` and non linear least squares. – mnel Aug 19 '13 at 03:45
7

2018 Update: The call "start" now seems to be depreciated. It is not in the stat_smooth function information either.

If you want to choose starting values, you need to use "method.args" option now.

See changes below:

ggplot(DD,aes(x = Discharge,y = Age)) +
  geom_point() + 
  stat_smooth(method = 'nls', formula = 'y~a*x^b', method.args = list(start= c(a = 1,b=1)),se=FALSE) + geom_text(x = 600, y = 1, label = power_eqn(DD), parse = TRUE)