2

I'm trying to display a plane of best fit within a 3D scatter plot using the library plot3D. When the code below is run everything seems fine enough, but if I replace the fit with the second fit I get strange behaviour, the plane is no longer a flat plane. I expect both versions to produce the same picture. What's going on?

enter image description here

library(plot3D)

df <- structure(list(X = 1:10, TV = c(230.1, 44.5, 17.2, 151.5, 180.8, 
8.7, 57.5, 120.2, 8.6, 199.8), radio = c(37.8, 39.3, 45.9, 41.3, 
10.8, 48.9, 32.8, 19.6, 2.1, 2.6), newspaper = c(69.2, 45.1, 
69.3, 58.5, 58.4, 75, 23.5, 11.6, 1, 21.2), sales = c(22.1, 10.4, 
9.3, 18.5, 12.9, 7.2, 11.8, 13.2, 4.8, 10.6)), .Names = c("X", 
"TV", "radio", "newspaper", "sales"), row.names = c(NA, 10L), class = "data.frame")


x<-df$TV
y<-df$radio
z<-df$sales

fit <- lm(z ~ x + y)
# fit <- lm(df$sales ~ df$TV + df$radio)

x.pred <- seq(min(x), max(x), length.out = 5)
y.pred <- seq(min(y), max(y), length.out = 5)
xy <- expand.grid( x = x.pred, y = y.pred)

z.pred <- matrix(predict(fit, newdata = xy), nrow = 5, ncol = 5)

scatter3D(x, y, z,
    surf = list(x = x.pred, y = y.pred, z = z.pred)
    )
Werner Hertzog
  • 2,002
  • 3
  • 24
  • 36
Geoff
  • 925
  • 4
  • 14
  • 36

1 Answers1

1

The short answer is: Both fits are correct. However the second predict is not finding the right column names to predict.


If you want the second fit to work use:

fit <- lm(sales ~ TV + radio, data=df)
...
xy <- expand.grid(TV = x.pred, radio = y.pred)

Why? Because predict always searches for the column name it was trained in newdata.

You may note that the first line in the code above also changed, we are no longer using the df$var format, instead we use the data argument. This happens because when using this format fit$model is equals to:

   df$sales df$TV df$radio
1      22.1 230.1     37.8
2      10.4  44.5     39.3
3       9.3  17.2     45.9
...

And we can't name column names with "$" dollar sign. In other words we can't do:

fit <- lm(df$sales ~ df$TV + df$radio)
...
xy <- expand.grid(df$TV = x.pred, df$radio = y.pred)

Because it will throw an error.


As stated above, both fits are, indeed, correct. If you run,

fit <- lm(z ~ x + y)
fit

you will get,

Coefficients: (Intercept) x y
2.08052 0.05598 0.15282

and with,

fit <- lm(df$sales ~ df$TV + df$radio)
fit

you will get,

Coefficients: (Intercept) x y
2.08052 0.05598 0.15282

as well.


Finally, note that when predict with newdata can't find the right variables names, you will get a warning message like this:

'newdata' had 25 rows but variables found have 10 rows

Which in my opinion should be an error. But it may get fixed in next versions. Some other sources about this issue are:

adelriosantiago
  • 7,762
  • 7
  • 38
  • 71