
The function below calculates binned averages, sizes the bin points on the graph relative to the number of observations in each bin, and plots a lowess line through the bin means. Instead of plotting the lowess line through the bin means, however, I would like to plot the line through the original dataset so that the error bands on the lowess line represent the uncertainty in the actual dataset, not the uncertainty in the binned averages. How do I modify geom_smooth() so that it will plot the line using df instead of dfplot?

library(fields)
library(ggplot2)

binplot <- function(df, yvar, xvar, sub = FALSE, N = 50, size = 40, xlabel = "X", ylabel = "Y"){
  if(sub != FALSE){
    df <- subset(df, eval(parse(text = sub)))

  }

  out <- stats.bin(df[,xvar], df[,yvar], N= N)
  x <- out$centers
  y <- out$stats[ c("mean"),]
  n <-  out$stats[ c("N"),] 
  dfplot <- as.data.frame(cbind(x,y,n))

  if(size != FALSE){
    sizes <- n * (size/max(n))

  }else{
    sizes = 3
  }

    ggplot(dfplot, aes(x,y)) +
      xlab(xlabel) +
      ylab(ylabel) +
      geom_point(shape=1, size = sizes) +
      geom_smooth() 
}

Here is a reproducible example that demonstrates how the function currently works:

sampleSize <- 10000
x1 <- rnorm(n=sampleSize, mean = 0, sd = 4)
y1 <-  x1 * 2 + x1^2 * .3 +  rnorm(n=sampleSize, mean = 5, sd = 10)
binplot(data.frame(x1,y1), "y1", "x1", N = 25)

[Figure: output of binplot() for the sample data]

As you can see, the error band on the lowess line reflects the uncertainty that would apply if every bin had an equal number of observations, but they do not. The bins at the extremes have far fewer observations (as illustrated by the size of the points), and the lowess line's error band should reflect that.

MrFlick
Michael
  • See `?geom_smooth`: the second argument is `data`. Have you tried specifying `data = df`? – Gregor Thomas May 13 '15 at 17:50
  • Please make a [reproducible example](http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) with sample input data so we can run the code. – MrFlick May 13 '15 at 17:56
  • @Gregor that yields `Error in eval(expr, envir, enclos) : object 'x' not found` I've looked at the ?geom_smooth and ?ggplot pages. This may be a completely noob question, seems like it should be straightforward, but I can't figure it out – Michael May 13 '15 at 18:16
  • @MrFlick just added a reproducible example of how it currently works. I tried adding one before I posted but all of R built-in datasets I tried were too small for binning. – Michael May 13 '15 at 18:17

1 Answer


You can explicitly set the data= parameter for each layer. You will also need to change the aesthetic mapping since the original data.frame had different column names. Just change your geom_smooth call to

geom_smooth(data=df, aes_string(xvar, yvar)) 

With the sample data, this returned:

[Figure: binplot() output with the smooth fitted to the original data]
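For reference, here is a minimal sketch of how the whole function might look with that change applied. It is not the asker's exact code: the name `binplot_raw` is made up for illustration, the `sub` argument is dropped for brevity, and the `.data[[...]]` pronoun is used in place of `aes_string()`, on the assumption that ggplot2 3.0.0 or later is installed (where `aes_string()` is soft-deprecated).

```r
library(fields)   # stats.bin(), as used in the question
library(ggplot2)  # assumes >= 3.0.0 for the .data pronoun inside aes()

# Hypothetical variant of binplot(): points come from the binned means,
# while the smooth is fitted to the original rows of df.
binplot_raw <- function(df, yvar, xvar, N = 50, size = 40,
                        xlabel = "X", ylabel = "Y") {
  out <- stats.bin(df[[xvar]], df[[yvar]], N = N)
  dfplot <- data.frame(
    x = as.numeric(out$centers),
    y = as.numeric(out$stats["mean", ]),
    n = as.numeric(out$stats["N", ])
  )
  # Scale point sizes by the number of observations in each bin
  sizes <- dfplot$n * (size / max(dfplot$n))

  ggplot(dfplot, aes(x, y)) +
    xlab(xlabel) +
    ylab(ylabel) +
    geom_point(shape = 1, size = sizes) +
    # Layer-level data and mapping override the plot defaults,
    # so the fit and its band are based on the raw observations
    geom_smooth(data = df, aes(x = .data[[xvar]], y = .data[[yvar]]))
}

# Sample data in the style of the question (smaller n to keep it quick)
set.seed(1)
x1 <- rnorm(n = 2000, mean = 0, sd = 4)
y1 <- x1 * 2 + x1^2 * .3 + rnorm(n = 2000, mean = 5, sd = 10)
p <- binplot_raw(data.frame(x1, y1), "y1", "x1", N = 25)
# print(p) draws the plot
```

One thing to be aware of: with 1,000 or more observations, `geom_smooth()` defaults to a GAM fit rather than loess, so if a lowess line is specifically wanted on the full dataset, passing `method = "loess"` explicitly is probably necessary (it can be slow on large data).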

MrFlick