I have many linear models stored in one table. Now I would like to use the model in each row to predict a single y value, given the single x value in the same row.
The difficulty lies in how both data.table and the tidyverse expose the models stored in the table: the model column is a list column, and predict.lm needs the class "lm" object contained in each list element, not the class "list" object itself.
library(data.table)
model1 <- lm( y~x, data= data.table( x=c(1,2,3,4) , y=c(1,2,1,2) ))
model2 <- lm( y~x, data= data.table( x=c(1,2,3,4) , y=c(1,2,3,3) ))
model_dt <- data.table( id = c(1,2), model = list(model1, model2), x = c(3,3))
Now model_dt contains the linear models and the required x values.
Predicting line by line works well:
predict.lm( model_dt[1]$model[[1]], model_dt[1]) # yields 1.6
predict.lm( model_dt[2]$model[[1]], model_dt[2]) # yields 2.6
But adding a column directly results in an error:
model_dt[, pred_y := predict.lm( model , x )] # ERROR
model_dt[, pred_y := predict.lm( model , x ), by=id] # ERROR
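One possible way around this in data.table, offered as a minimal sketch rather than a definitive answer, is to iterate over the list column and the x column in parallel with mapply(), so that predict() receives one "lm" object and one one-row newdata at a time. The anonymous function and its argument names (m, xi) are illustrative only; the setup is repeated so the snippet runs on its own.

```r
library(data.table)

# Same setup as above, repeated so this snippet is self-contained.
model1 <- lm(y ~ x, data = data.table(x = c(1, 2, 3, 4), y = c(1, 2, 1, 2)))
model2 <- lm(y ~ x, data = data.table(x = c(1, 2, 3, 4), y = c(1, 2, 3, 3)))
model_dt <- data.table(id = c(1, 2), model = list(model1, model2), x = c(3, 3))

# mapply() walks over the list column and the x column together:
# m is a single "lm" object, xi is the matching single x value.
# as.numeric() drops the name that predict() attaches to its result.
model_dt[, pred_y := mapply(
  function(m, xi) as.numeric(predict(m, newdata = data.frame(x = xi))),
  model, x
)]

print(model_dt$pred_y)  # 1.6 2.6
```

The newdata argument must be a data frame with a column named x, because that is the predictor name used in the model formula.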
================================================================
The same setup in the tidyverse:
library(tidyverse)
model1 <- lm( y~x, data= tibble( x=c(1,2,3,4) , y=c(1,2,1,2) ))
model2 <- lm( y~x, data= tibble( x=c(1,2,3,4) , y=c(1,2,3,3) ))
model_dt <- tibble( id = c(1,2), model = list(model1, model2), x = c(3,3))
predict.lm( model_dt[1,]$model[[1]], model_dt[1,]) # yields 1.6
predict.lm( model_dt[2,]$model[[1]], model_dt[2,]) # yields 2.6
And adding a variable with mutate fails:
model_dt %>% mutate( pred_y = predict.lm( model, x ) ) # ERROR
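For the tidyverse version, a comparable sketch (an assumption about what may work here, not a confirmed fix) uses purrr::map2_dbl() inside mutate() to pair each model with its own x value. The argument names m and xi are again illustrative, and the setup is repeated so the snippet is self-contained.

```r
library(dplyr)
library(purrr)
library(tibble)

# Same setup as above, repeated so this snippet is self-contained.
model1 <- lm(y ~ x, data = tibble(x = c(1, 2, 3, 4), y = c(1, 2, 1, 2)))
model2 <- lm(y ~ x, data = tibble(x = c(1, 2, 3, 4), y = c(1, 2, 3, 3)))
model_dt <- tibble(id = c(1, 2), model = list(model1, model2), x = c(3, 3))

# map2_dbl() iterates over the list column and the x column in parallel
# and returns a plain numeric vector, one prediction per row.
result <- model_dt %>%
  mutate(pred_y = map2_dbl(
    model, x,
    function(m, xi) as.numeric(predict(m, newdata = tibble(x = xi)))
  ))

print(result$pred_y)  # 1.6 2.6
```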
One reason seems to be that, inside the data.table call or mutate(), the "model" column is passed as a class "list" object rather than as a single class "lm" object, while using model[[1]] there always refers to the model in row 1.
class( model_dt[1,]$model ) # results in class "list"
class( model_dt[1,]$model[[1]] ) # results in class "lm"
The result should be a table as follows:
   id model x pred_y
1:  1  <lm> 3    1.6
2:  2  <lm> 3    2.6
I am sure there is a straightforward solution and would be grateful for any input. My attempts with map() and lapply() ran into the same issues. Thank you very much.
=====================================================================
Edit: Unlike the existing question "using lm in list column to predict new values using purrr", this question also asks for a solution in data.table.