In the past I've used the lm function with both matrix-type and data.frame-type data. But I think this is the first time I've tried to use predict with a model fitted without a data.frame, and I can't figure out how to make it work.
I read some other questions (such as Getting Warning: " 'newdata' had 1 row but variables found have 32 rows" on predict.lm) and I'm pretty sure that my problem is related to the coefficient names I get after fitting the model. For some reason each coefficient name is the matrix name pasted together with the column name, and I haven't been able to find out how to fix that.
library(tidyverse)
library(MASS)
set.seed(1)
label <- sample(c(TRUE, FALSE), nrow(Boston), replace = TRUE, prob = c(.6, .4))
x.train <- Boston %>% dplyr::filter(., label) %>%
  dplyr::select(-medv) %>% as.matrix()
y.train <- Boston %>% dplyr::filter(., label) %>%
  dplyr::select(medv) %>% as.matrix()
x.test <- Boston %>% dplyr::filter(., !label) %>%
  dplyr::select(-medv) %>% as.matrix()
y.test <- Boston %>% dplyr::filter(., !label) %>%
  dplyr::select(medv) %>% as.matrix()
fit_lm <- lm(y.train ~ x.train)
fit_lm2 <- lm(medv ~ ., data = Boston, subset = label)
predict(object = fit_lm, newdata = x.test %>% as.data.frame()) %>% length()
predict(object = fit_lm2, newdata = x.test %>% as.data.frame()) %>% length()
# the two calls return different numbers of predictions
# the first one returns a number of results consistent with x.train, not x.test
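To show what I mean about the coefficient names, here is a smaller sketch of the same behaviour. It uses the built-in mtcars data instead of Boston, and the names x, y, fit_mat, fit_df and new_rows are made up for this illustration only; the outputs in the comments are what I would expect to see, not a claim about why it happens.

```r
# Minimal sketch on the built-in mtcars data (an assumption, used only for brevity)
x <- as.matrix(mtcars[, c("wt", "hp")])
y <- mtcars$mpg

fit_mat <- lm(y ~ x)                          # matrix on the right-hand side
fit_df  <- lm(mpg ~ wt + hp, data = mtcars)   # usual data.frame interface

names(coef(fit_mat))  # "(Intercept)" "xwt" "xhp"  (matrix name + column name)
names(coef(fit_df))   # "(Intercept)" "wt"  "hp"

# newdata has columns wt and hp, but no variable called x
new_rows <- as.data.frame(x[1:5, ])

length(predict(fit_mat, newdata = new_rows))  # 32, with the 'newdata' warning
length(predict(fit_df,  newdata = new_rows))  # 5
```

So the matrix-fitted model ignores the 5 rows I pass and returns one prediction per training row, which is the same thing I see with fit_lm above.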
Any help will be welcome.