
Edit:

x = c(324, 219, 406, 273, 406, 406, 406, 406, 406, 168, 406, 273, 168, 406, 273, 168, 219, 324, 324, 406, 406, 406, 273, 273, 324, 324, 219, 273, 219, 273, 273, 324, 273, 324, 324, 406, 219, 406, 273, 273, 406, 219, 324, 273, 324, 406, 219, 324, 219, 324, 324, 406, 406, 406, 324, 273, 273, 219, 219, 324, 273, 324, 324, 219, 324, 219, 324, 219, 219, 324, 273, 406, 406, 273, 324, 273, 273, 219, 406, 273, 273, 324, 324, 324, 324, 324, 406, 324, 273, 406, 406, 219, 219, 324, 273, 406, 324, 324, 324, 324) 
y = c(68,121,NA,87,NA,17,20,15,17,146,25,91,141,24,88,143,120,63,62,16,21,20,83,88,65,63,124,88,120,91,85,65,91,63,69,23,115,23,87,90,20,120,65,90,65,20,120,60,110,60,17,20,20,20,68,80,87,124,121,65,85,67,60,115,60,120,66,121,117,68,90,17,23,90,61,80,88,121,NA,91,88,62,60,70,60,60,27,76,96,23,20,113,118,60,91,23,60,60,65,70)

data = data.frame(x,y)

I created the following plot with ggplot2 and geom_smooth(), using this code:

library(ggplot2)

# Scatter plot with a loess smooth (default blue) and a linear fit (red)
g = ggplot(data, aes(x,y)) + 
  geom_point() + 
  geom_smooth(method="loess") + 
  geom_smooth(method="lm", col="red")

My data contain the variables x (which takes only 9 distinct values) and y (metric). Now I want to add the fitted points of the loess method, calculated with this code:

loes = loess(data$y ~ data$x)
RR = sort(unique(predict(loes)), decreasing=TRUE) # y coordinates
LL = unique(x, fromLast=TRUE) # x coordinates

Now I add these projection points to my plot.

  g + geom_point(aes(y=RR[1], x=LL[1]), col="blue", size=2, shape=18) + 
  geom_point(aes(y=RR[2], x=LL[2]), col="blue", size=2, shape=18) +
  geom_point(aes(y=RR[3], x=LL[3]), col="blue", size=2, shape=18) +
  geom_point(aes(y=RR[4], x=LL[4]), col="blue", size=2, shape=18) +
  geom_point(aes(y=RR[5], x=LL[5]), col="blue", size=2, shape=18) 

Why are the blue points not on the blue loess line in ggplot? Does geom_smooth() use different code for the loess method than the standard loess function in R?

Info: For my original data, with more than 8,000 observations, there are no pseudoinverse warnings, but the problem is the same.

Example Image

T. Beige
  • [How to make a great R reproducible example?](http://stackoverflow.com/questions/5963269) – Axeman Sep 30 '16 at 08:17
  • @Axeman: My dataset has about 8,000 observations and I'm not allowed to publish the data. If I add just a few, the blue points seem to be on the line. If you have a suggestion to make it clearer, please let me know. – T. Beige Sep 30 '16 at 08:48
  • Did you read the link? It discusses a whole bunch of strategies to give a reproducible example without sharing (all of) your data. Does this occur if you use one of the examples in `?geom_smooth`? – Axeman Sep 30 '16 at 08:52
  • -1 @Axeman: You don't have to downvote it immediately, but OK. Now it should be reproducible, even though there are singularity problems, which should not be the issue here. – T. Beige Sep 30 '16 at 09:27
  • The help of `geom_smooth` says that it uses the default settings of `loess`, so the points should be on the blue line. – T. Beige Sep 30 '16 at 09:32

1 Answer


The error is in these lines:

loes = loess(y ~ x, data = data)
RR = sort(unique(predict(loes)), decreasing=TRUE) # y coordinates
LL = unique(x, fromLast=TRUE) # x coordinates

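The prediction is made using the same loess function, but the values are out of order: sort(unique(predict(loes)), decreasing=TRUE) and unique(x, fromLast=TRUE) are ordered independently of each other, so RR[i] does not necessarily belong to LL[i]. You should use newdata to match each prediction with its predictor.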

g = ggplot(data, aes(x,y)) + 
  geom_smooth(method="loess", color = "red") 

RR <- predict(loes, newdata = data.frame(x = unique(x)))

g + annotate("point", x = unique(x), y = RR)

This shows the points lying on the smoothed line.
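To see the ordering problem directly, here is a minimal sketch in base R that uses the sample data from the question. The names `LL_old`, `RR_old`, `grid` and `y_hat` are made up for illustration and do not appear in the original code, and the `suppressWarnings()` calls are only there to hide the singularity warnings this small sample may trigger.

```r
# Sample data copied from the question.
x <- c(324, 219, 406, 273, 406, 406, 406, 406, 406, 168, 406, 273, 168, 406, 273, 168, 219, 324, 324, 406, 406, 406, 273, 273, 324, 324, 219, 273, 219, 273, 273, 324, 273, 324, 324, 406, 219, 406, 273, 273, 406, 219, 324, 273, 324, 406, 219, 324, 219, 324, 324, 406, 406, 406, 324, 273, 273, 219, 219, 324, 273, 324, 324, 219, 324, 219, 324, 219, 219, 324, 273, 406, 406, 273, 324, 273, 273, 219, 406, 273, 273, 324, 324, 324, 324, 324, 406, 324, 273, 406, 406, 219, 219, 324, 273, 406, 324, 324, 324, 324)
y <- c(68,121,NA,87,NA,17,20,15,17,146,25,91,141,24,88,143,120,63,62,16,21,20,83,88,65,63,124,88,120,91,85,65,91,63,69,23,115,23,87,90,20,120,65,90,65,20,120,60,110,60,17,20,20,20,68,80,87,124,121,65,85,67,60,115,60,120,66,121,117,68,90,17,23,90,61,80,88,121,NA,91,88,62,60,70,60,60,27,76,96,23,20,113,118,60,91,23,60,60,65,70)
data <- data.frame(x, y)

# Fit with the formula/data interface so that predict() can accept newdata.
# Rows with NA in y are dropped under R's default na.action.
loes <- suppressWarnings(loess(y ~ x, data = data))

# The question's approach: two vectors, each ordered on its own.
RR_old <- sort(unique(predict(loes)), decreasing = TRUE)  # sorted by fitted value
LL_old <- unique(x, fromLast = TRUE)                      # ordered by last occurrence in x

# Aligned approach: request predictions at explicit x values and keep
# each prediction in the same row as its x.
grid <- data.frame(x = sort(unique(data$x)))
grid$y_hat <- suppressWarnings(predict(loes, newdata = grid))

print(LL_old)  # not sorted, so it cannot be paired with RR_old by position
print(grid)    # one row per distinct x with its own fitted value
```

Keeping x and the fitted value in one data frame means the pairing cannot drift apart. Assuming ggplot2 is loaded and `g` is the plot from above, the points could then be added with something like `g + geom_point(data = grid, aes(x, y_hat), col = "blue", shape = 18)`.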

Hugh