
I was plotting a regression line using geom_smooth() from ggplot2, as below:

library(ggplot2)

X <- c(3, 7, 10, 14, 15, 11, 13, 18)
Y <- c(11, 5, 3, 9, 7, 5, 2, 1)

ggplot(data = data.frame(X, Y), aes(x = X, y = Y)) + theme_bw() +
  geom_point(col = 'red')  + geom_smooth(method = 'lm', formula = Y ~ X)

Surprisingly, I got these warning messages:

Warning messages:
1: 'newdata' had 80 rows but variables found have 8 rows 
2: In base::data.frame(x = xseq, fit, se = pred$se.fit) :
  row names were found from a short variable and have been discarded

With the following graph:

[Image: the resulting plot]

I was expecting a straight regression line, like this:

[Image: the expected plot, with a straight regression line]

which I got by changing the R code to:

library(ggplot2)

x <- c(3, 7, 10, 14, 15, 11, 13, 18)
y <- c(11, 5, 3, 9, 7, 5, 2, 1)

ggplot(data = data.frame(x, y), aes(x = x, y = y)) + theme_bw() +
  geom_point(col = 'red')  + geom_smooth(method = 'lm', formula = y ~ x)

Why did this happen? What goes wrong when the formula is written with the uppercase variable names?

  • Did some testing: the problem arises with `formula = Y ~ X`. Could be some defaults, but I'm not sure yet. – M-- Jan 19 '20 at 07:19
  • I think it is a bug and worth reporting. If you had both `x`, `y` and `X`, `Y`, then this works as expected: `ggplot(data = data.frame(X, Y), aes(x = X, y = Y)) + geom_point(col = 'red') + geom_smooth(method = 'lm', formula = y ~ x)`. So, as I said and you figured out, it's a problem with how `geom_smooth` handles `formula`. – M-- Jan 19 '20 at 07:25
  • Read here, it may help: https://stackoverflow.com/questions/27464893/getting-warning-newdata-had-1-row-but-variables-found-have-32-rows-on-pred – M-- Jan 19 '20 at 07:35
  • Not the same issue, although quite relevant. Thanks! – shubh Jan 19 '20 at 07:40
  • `geom_smooth` inherits the `aes()` setting, so it *always* maps `x` and `y` to those settings. It's not a bug, it's a design decision. – Rui Barradas Jan 19 '20 at 07:42
  • Yes, I also agree that it is not a bug. We should understand the discussion in the link that @M-- referred to. In fact, this happens when we use the vector/variable/column names themselves inside the `geom_smooth()` `formula`, instead of the `x` and `y` from `aes()`. – shubh Jan 19 '20 at 08:21
  • Although the issue is interesting, you can just leave the `formula` argument out altogether, because `y ~ x` is the default for `method = 'lm'`. Then there should be no problem at all. – tjebo Jan 19 '20 at 15:27
  • @RuiBarradas-ReinstateMonic I used `inherit.aes = F` and provided the data, yet it didn't work. I am missing something, probably simple. Care to post an answer? – M-- Jan 19 '20 at 17:11
  • NVM, got it. `formula` maps the `aes` arguments, not the variables. – M-- Jan 19 '20 at 17:13
  • Can you explain the warning messages we get with the first graph? That would help us understand this issue. – shubh Jan 19 '20 at 17:52
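
On the request above to explain the warning, here is a minimal base-R sketch of what seems to be going on. It assumes, as the comments suggest, that `geom_smooth()` hands the model function a data frame whose columns are named after the aesthetics (`x`, `y`) and predicts on a grid of 80 `x` values. `layer_data` and `grid` are hypothetical stand-ins for that internal data, not ggplot2 objects.

```r
# Same data as in the question, stored under upper-case names.
X <- c(3, 7, 10, 14, 15, 11, 13, 18)
Y <- c(11, 5, 3, 9, 7, 5, 2, 1)

# Assumption: the smoothing layer only sees columns named x and y.
layer_data <- data.frame(x = X, y = Y)

# Y and X are not columns of layer_data, so lm() silently picks up
# the vectors X and Y from the enclosing environment instead.
fit_upper <- lm(Y ~ X, data = layer_data)

# Hypothetical prediction grid: 80 points along x.
grid <- data.frame(x = seq(min(X), max(X), length.out = 80))

# grid has no column X, so predict() again falls back to the 8-element
# vector X and warns:
#   'newdata' had 80 rows but variables found have 8 rows
pred_upper <- predict(fit_upper, newdata = grid)
length(pred_upper)  # 8: fitted values at the original points, not on the grid

# With the aesthetic names, the variables are found in the data itself.
fit_lower <- lm(y ~ x, data = layer_data)
pred_lower <- predict(fit_lower, newdata = grid)
length(pred_lower)  # 80: one prediction per grid point
```

If that assumption holds, the 8 predictions are then recycled against the 80 grid values in the `base::data.frame(x = xseq, fit, se = pred$se.fit)` call quoted in the second warning, which would account for both that warning and the line not being straight.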
