I have many linear models stored in one table. I would like to use the model in each row to predict a single y value, given the single x value in the same row.

The difficulty comes from how both data.table and the tidyverse expose the models stored in the table: predict.lm needs an object of class "lm", but each model is wrapped inside an object of class "list".

library(data.table)

model1 <- lm( y~x, data= data.table( x=c(1,2,3,4) , y=c(1,2,1,2) ))
model2 <- lm( y~x, data= data.table( x=c(1,2,3,4) , y=c(1,2,3,3) ))

model_dt <- data.table( id = c(1,2), model = list(model1, model2), x = c(3,3))

Now model_dt contains the linear models and the required x values.

Predicting line by line works well:

predict.lm( model_dt[1]$model[[1]], model_dt[1])  # yields 1.6
predict.lm( model_dt[2]$model[[1]], model_dt[2])  # yields 2.6

But adding a column directly results in an error:

model_dt[, pred_y := predict.lm( model , x )]         # ERROR
model_dt[, pred_y := predict.lm( model , x ), by=id]  # ERROR

================================================================

The same setup in the tidyverse:

library(tidyverse)

model1 <- lm( y~x, data= tibble( x=c(1,2,3,4) , y=c(1,2,1,2) ))
model2 <- lm( y~x, data= tibble( x=c(1,2,3,4) , y=c(1,2,3,3) ))

model_dt <- tibble( id = c(1,2), model = list(model1, model2), x = c(3,3))

predict.lm( model_dt[1,]$model[[1]], model_dt[1,])  # yields 1.6
predict.lm( model_dt[2,]$model[[1]], model_dt[2,])  # yields 2.6

And adding a variable with mutate fails:

model_dt %>% mutate( pred_y = predict.lm( model, x ) )  # ERROR

One reason seems to be that the models in the "model" column cannot be extracted as class "lm" objects directly: the column itself is a list, and using model[[1]] inside data.table's j expression or inside mutate() always refers to the model in row 1.

class( model_dt[1,]$model )      # results in class "list"
class( model_dt[1,]$model[[1]] ) # results in class "lm"

The result should be a table as follows:

   id model x pred_y
1:  1  <lm> 3    1.6
2:  2  <lm> 3    2.6

I am sure there is a straightforward solution and would be very happy about any input. My attempts with map() and lapply() ran into the same issues. Thank you very much.

=====================================================================

Edit: Unlike the related question "using lm in list column to predict new values using purrr", this question also asks for a solution in data.table.

1 Answer

With the tidyverse, we can use map2_dbl() to loop over the 'model' column and the corresponding 'x' values in parallel, passing the new data to predict.lm() as a one-row data.frame or tibble:

library(tidyverse)
model_dt %>% 
   mutate(pred_y = map2_dbl(model, x, ~ predict.lm(.x, tibble(x = .y))))
# A tibble: 2 x 4
#     id model      x pred_y
#   <dbl> <list> <dbl>  <dbl>
#1     1 <lm>       3   1.6 
#2     2 <lm>       3   2.60
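
To see why the element-wise loop is needed, here is a minimal base R sketch of the same pairing, with no extra packages. The names `models`, `new_x` and `x_val` are only illustrative and do not appear in the question. The point is that a list column hands over the whole list, while predict() wants one "lm" object and a data frame of new data at a time.

```r
# Same two models as in the question, built from plain data frames
model1 <- lm(y ~ x, data = data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 1, 2)))
model2 <- lm(y ~ x, data = data.frame(x = c(1, 2, 3, 4), y = c(1, 2, 3, 3)))

models <- list(model1, model2)  # what the list column holds (illustrative name)
new_x  <- c(3, 3)               # one x value per model (illustrative name)

class(models)       # "list": not something predict.lm() can use directly
class(models[[1]])  # "lm"

# Walk both vectors in parallel: one model and one x value per call.
# unname() drops the row name that predict() attaches to each result.
pred_y <- unname(mapply(
  function(mod, x_val) predict(mod, newdata = data.frame(x = x_val)),
  models, new_x
))
pred_y  # 1.6 2.6
```

map2_dbl() and Map() do the same pairing; they differ mainly in how the results are collected (a double vector versus a list).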

The same idea works for the data.table object, using Map():

model_dt[,  pred_y := unlist(Map(function(mod, y) 
          predict.lm(mod, data.frame(x = y)), model, x)), id][]
#   id model x pred_y
#1:  1  <lm> 3    1.6
#2:  2  <lm> 3    2.6
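
As a possible variant, the grouping by id is not strictly required, because the loop already pairs each model with its own x. The sketch below assumes the model_dt from the question and uses mapply(), which simplifies the result to a vector so that unlist() is not needed; the argument name `new_x` is only illustrative.

```r
library(data.table)

model1 <- lm(y ~ x, data = data.table(x = c(1, 2, 3, 4), y = c(1, 2, 1, 2)))
model2 <- lm(y ~ x, data = data.table(x = c(1, 2, 3, 4), y = c(1, 2, 3, 3)))
model_dt <- data.table(id = c(1, 2), model = list(model1, model2), x = c(3, 3))

# Inside j, `model` is the whole list column, which is why
# predict.lm(model, x) cannot work as written in the question
model_dt[, class(model)]       # "list"
model_dt[, class(model[[1]])]  # "lm"

# One predict() call per row; unname() drops the names predict() attaches
model_dt[, pred_y := unname(mapply(
  function(mod, new_x) predict(mod, newdata = data.frame(x = new_x)),
  model, x
))]
model_dt$pred_y  # 1.6 2.6
```

Keeping by = id, as in the Map() version, is harmless here; it just evaluates the loop once per group.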
akrun
    Fantastic! This exactly solves the problem, also in the larger code not provided here. Thanks for the very fast answer and also big thanks for providing a solution in both data.table and the tidyverse!!! – user9938203 Jul 04 '19 at 07:39