Use list of models to calculate new values in a data.frame

Question

Using dlply (from this post; code below), I am able to generate a list of linear models on subsets of my data.frame. Now that I have this list, I would like to use the models to generate values in another data.frame.

The list contains one model for each DAY and variable subset. I would like to apply each model to the same subset in another data.frame. For example, for DAY == 5 and variable == Var.1 the model (y = mx + b) is value = -4.521869 * Location + 21.315. Using the model for the appropriate subset, I would calculate values for Var.1 in another data.frame (e.g. dat_rec, which already has entries for DAY and Location).
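As a check on those numbers, fitting the DAY 5 / Var.1 subset of `dat` directly reproduces that line. Because Location only takes the values 0 and 1 in this subset, the least-squares line passes through the two group means: the intercept is the mean at Location 0 and the slope is the mean at Location 1 minus the mean at Location 0. The `day5` data frame below is an illustrative name for those four rows copied from `dat`.

```r
# The four DAY 5 / Var.1 rows of `dat` (Sites 32 and 10 at Locations 0 and 1)
day5 <- data.frame(Location = c(0, 0, 1, 1),
                   value    = c(20.9, 21.73, 21.73, 11.85626285))

fit <- lm(value ~ Location, data = day5)
coef(fit)  # intercept = mean(c(20.9, 21.73)) = 21.315, slope about -4.521869

# y = m*x + b evaluated at the Locations used in dat_rec
pred <- predict(fit, newdata = data.frame(Location = c(0.1, 0.2, 0.3, 0.4)))
pred  # about 20.86281, 20.41063, 19.95844, 19.50625
```

These agree with the DAY 5 Var.1 values in `dat_rec` to about six decimal places (the values in `dat_rec` were computed with the rounded slope).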

Is there a way to use the models from the list on the same subsets of another data.frame? For example, use the model for DAY == 5 and variable == Var.1 to populate values everywhere in the data.frame (e.g. at different Sites) where DAY == 5 and variable == Var.1. Is there a similar list-based method to populate a data.frame with the values calculated from the models in the list? The desired end product (i.e. dat_rec below) is a data.frame.

# Data
dat <- structure(list(Site = c(32L, 32L, 32L, 32L, 10L, 10L, 10L, 10L, 
32L, 32L, 32L, 32L, 10L, 10L, 10L, 10L), Location = c(0L, 0L, 
0L, 0L, 0L, 0L, 0L, 0L, 1L, 1L, 1L, 1L, 1L, 1L, 1L, 1L), DAY = c(5L, 
55L, 555L, 5555L, 5L, 55L, 555L, 5555L, 5L, 55L, 555L, 5555L, 
5L, 55L, 555L, 5555L), Var.1 = c(20.9, 20.8, 21.03, 21.36, 21.73, 
21.18, 20.73, 21.98, 21.73, 12.48702448, 12.19642662, 12.33218874, 
11.85626285, 11.88812108, 12.70549981, 11.89587521), Var.2 = c(100L, 
100L, 100L, 100L, 100L, 100L, 100L, 100L, 90L, 90L, 90L, 91L, 
92L, 88L, 89L, 90L), Var.3 = c(14.47, 14.4, 14.3, 14.14, 14.72, 
14.62, 14.14, 14.49, 10.27287765, 10.27287765, 10.41763527, 10.51725376, 
11.12918753, 10.81166867, 10.80656509, 11.00093898), Var.4 = c(890.19, 
888.9, 889.14, 888.15, 889.57, 888.41, 887.48, 886.87, 688.15, 
698.23, 650.99, 700.01, 699, 689.6, 658.7, 689.99)), .Names = c("Site", 
"Location", "DAY", "Var.1", "Var.2", "Var.3", "Var.4"), class = "data.frame", row.names = c(NA, 
-16L))

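# packages: plyr for dlply(), mdply() and mutate(); reshape2 for melt()
library(plyr)
library(reshape2)

# melt data for use with dlply
mdat <- melt(dat, id.vars = c("DAY", "Site", "Location"))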

# this dlply solution was built from here https://stackoverflow.com/a/1214432/1670053
models_mdat <- dlply(mdat, c("DAY","variable"), function(df) 
                lm(value ~ Location, data = df))

# desired (partial) result: only Var.1 is filled in, using the models
# from the list for DAY 5 and DAY 55
# wide (not melted) form
dat_rec <- structure(list(Site = c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L), Location = c(0.1, 
0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4), DAY = c(5L, 5L, 5L, 5L, 55L, 
55L, 55L, 55L), Var.1 = c(20.8628131, 20.4106262, 19.9584393, 
19.5062524, 20.1097573, 19.2295146, 18.3492719, 17.4690292), 
    Var.2 = c(NA, NA, NA, NA, NA, NA, NA, NA), Var.3 = c(NA, 
    NA, NA, NA, NA, NA, NA, NA), Var.4 = c(NA, NA, NA, NA, NA, 
    NA, NA, NA)), .Names = c("Site", "Location", "DAY", "Var.1", 
"Var.2", "Var.3", "Var.4"), class = "data.frame", row.names = c(NA, 
-8L))
# melted
dat_rec_melt <- structure(list(Site = c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 
1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 
1L, 1L, 3L, 3L, 3L, 3L), Location = c(0.1, 0.2, 0.3, 0.4, 0.1, 
0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 
0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 
0.4), DAY = c(5L, 5L, 5L, 5L, 55L, 55L, 55L, 55L, 5L, 5L, 5L, 
5L, 55L, 55L, 55L, 55L, 5L, 5L, 5L, 5L, 55L, 55L, 55L, 55L, 5L, 
5L, 5L, 5L, 55L, 55L, 55L, 55L), variable = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Var.1", 
"Var.2", "Var.3", "Var.4"), class = "factor"), value = c(20.8628131, 
20.4106262, 19.9584393, 19.5062524, 20.1097573, 19.2295146, 18.3492719, 
17.4690292, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Site", 
"Location", "DAY", "variable", "value"), row.names = c(NA, -32L
), class = "data.frame")
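For readers without plyr or reshape2, the melt and dlply steps above can be sketched in base R. This swaps in `split()` + `lapply()` for `dlply()` and a hand-written stack for `melt()`; `dat_small` is a hypothetical cut-down copy of `dat` (DAY 5 and 55, Var.1 and Var.2 only) to keep the example short. The resulting list names have the form DAY.variable, which is what makes it possible to look up the model for a given subset later.

```r
# Hypothetical cut-down copy of `dat` (DAY 5 and 55; Var.1 and Var.2 only)
dat_small <- data.frame(
  Site     = c(32, 32, 10, 10, 32, 32, 10, 10),
  Location = c(0, 0, 0, 0, 1, 1, 1, 1),
  DAY      = c(5, 55, 5, 55, 5, 55, 5, 55),
  Var.1    = c(20.9, 20.8, 21.73, 21.18, 21.73, 12.48702448, 11.85626285, 11.88812108),
  Var.2    = c(100, 100, 100, 100, 90, 90, 92, 88)
)

# Wide -> long by stacking one block per measured variable
# (plays the role of melt(dat, id.vars = c("DAY", "Site", "Location")))
id_cols <- c("Site", "Location", "DAY")
mdat_small <- do.call(rbind, lapply(c("Var.1", "Var.2"), function(v)
  data.frame(dat_small[id_cols], variable = v, value = dat_small[[v]])))

# One lm per DAY/variable subset (plays the role of dlply(mdat, c("DAY", "variable"), ...))
groups <- split(mdat_small, list(mdat_small$DAY, mdat_small$variable), drop = TRUE)
models_small <- lapply(groups, function(df) lm(value ~ Location, data = df))

names(models_small)  # "5.Var.1" "55.Var.1" "5.Var.2" "55.Var.2"
```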

agstudy · Accepted Answer · score 2 · 2013-12-16T21:42:18.197

I think you are looking for predict:

sapply(models_mdat, predict, newdata = dat_rec)

EDIT: to get the results aligned with the new data:

lapply(models_mdat, function(x)
       cbind(dat_rec, fit = predict(x, newdata = dat_rec)))
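To get one data frame instead of a list of them, the per-model results can be stacked with `do.call(rbind, ...)`, adding each model's name as a column so every prediction stays attached to its `dat_rec` row. A base-R sketch, assuming a hypothetical compact long data set `mdat_small` (a subset of `dat`, Site omitted) in place of the full melted data, and a `dat_rec` rebuilt with only its Site, Location and DAY columns:

```r
# Hypothetical compact long data: subset of `dat` (DAY 5 and 55, Var.1 and Var.2)
mdat_small <- data.frame(
  DAY      = rep(rep(c(5, 55), each = 4), times = 2),
  variable = rep(c("Var.1", "Var.2"), each = 8),
  Location = rep(c(0, 0, 1, 1), times = 4),
  value    = c(20.9, 21.73, 21.73, 11.85626285,        # DAY 5,  Var.1
               20.8, 21.18, 12.48702448, 11.88812108,  # DAY 55, Var.1
               100, 100, 90, 92,                       # DAY 5,  Var.2
               100, 100, 90, 88)                       # DAY 55, Var.2
)
models <- lapply(split(mdat_small, list(mdat_small$DAY, mdat_small$variable), drop = TRUE),
                 function(df) lm(value ~ Location, data = df))

# The new data: Site, Location and DAY columns of dat_rec
dat_rec <- data.frame(Site = rep(c(1, 3), each = 4),
                      Location = rep(c(0.1, 0.2, 0.3, 0.4), 2),
                      DAY = rep(c(5, 55), each = 4))

# Each model predicts every row; cbind keeps the predictions next to their rows
fits <- lapply(models, function(m) cbind(dat_rec, fit = predict(m, newdata = dat_rec)))

# Stack into one data frame, recording which model produced each block
all_fits <- do.call(rbind, Map(function(df, nm) cbind(model = nm, df), fits, names(fits)))
rownames(all_fits) <- NULL
```

Note that every model is applied to every row here (4 models × 8 rows = 32 rows), so rows where the model's DAY differs from the row's DAY are not meaningful; restricting each model to its own subset is the remaining step.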

edited Dec 16 '13 at 21:42

answered Dec 16 '13 at 21:12


`predict` seems to be what I want. However, is it possible to use one of the `apply` functions without losing the rest of the information in `dat_rec`? Using `sapply` as above returns only the predictions, not the other columns of `dat_rec`. How do I keep the output of `sapply` aligned with the rows of `dat_rec`? – nofunsally Dec 16 '13 at 21:32
Thanks. This is closer to the result I was looking for above. I will keep reading about the different `apply` functions to see if I can arrive at a solution that approximates `dat_rec` above: a result where the list of models is used to build a single data.frame (e.g. `dat_rec`) or a single list. I am not trying to build a separate list/data.frame for each model in the model list, but rather to use every model to fill in one. – nofunsally Dec 16 '13 at 21:58
I am not sure I understand what you are looking for, but maybe you can combine the results of `res <- lapply(models_mdat, ..)` using `do.call(cbind, res)` – agstudy Dec 16 '13 at 22:07
I will explore the suggestions in your latest comment. To be clearer: my desired end product is a data.frame that has been updated with values calculated from the models in the list. The list of models was created from subsets, and the same subsets should be used when updating the data.frame, e.g. the model for `DAY` == 5 & `variable` == Var.1 should be used to update the data.frame where its `DAY` == 5 & `variable` == Var.1, etc. – nofunsally Dec 17 '13 at 04:13

score 0 · Answer 2 · edited May 23 '17 at 11:45

Using the information from agstudy, it appears that predict is the tool I was looking for to calculate values from the models. Knowing that I wanted to use the model list generated with dlply to update a data.frame with predictions, I had a much better idea of what to search for.

I found a solution in this post. To achieve the result I was looking for, I needed the new data as a list too, split into the same DAY/variable subsets as the model list. Then predict can be used with mdply to arrive at an updated data.frame.

# melted
dat_rec_melt <- structure(list(Site = c(1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 
1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 1L, 1L, 3L, 3L, 3L, 3L, 1L, 1L, 
1L, 1L, 3L, 3L, 3L, 3L), Location = c(0.1, 0.2, 0.3, 0.4, 0.1, 
0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 
0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 0.4, 0.1, 0.2, 0.3, 
0.4), DAY = c(5L, 5L, 5L, 5L, 55L, 55L, 55L, 55L, 5L, 5L, 5L, 
5L, 55L, 55L, 55L, 55L, 5L, 5L, 5L, 5L, 55L, 55L, 55L, 55L, 5L, 
5L, 5L, 5L, 55L, 55L, 55L, 55L), variable = structure(c(1L, 1L, 
1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 3L, 3L, 
3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 4L), .Label = c("Var.1", 
"Var.2", "Var.3", "Var.4"), class = "factor"), value = c(20.8628131, 
20.4106262, 19.9584393, 19.5062524, 20.1097573, 19.2295146, 18.3492719, 
17.4690292, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, 
NA, NA, NA, NA, NA, NA, NA, NA, NA, NA, NA)), .Names = c("Site", 
"Location", "DAY", "variable", "value"), row.names = c(NA, -32L
), class = "data.frame")

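# split the new data into the same DAY/variable subsets as the models
dat_rec_list <- dlply(dat_rec_melt, c("DAY", "variable"))

# models_mdat also holds models for DAY 555 and 5555, which dat_rec_melt lacks;
# pair models and data by name so cbind() does not silently recycle the shorter list
models_sub <- models_mdat[names(dat_rec_list)]

# predict each subset with its own model and bind everything into one data.frame
predictions <- mdply(cbind(mod = models_sub, df = dat_rec_list), function(mod, df) {
  mutate(df, pred = predict(mod, newdata = df))
})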
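The same subset-to-model pairing can be sketched in base R without plyr: build a DAY.variable key for each row of the new data, look up the model with that name, and predict only those rows. Models with no matching rows are never used, and rows with no matching model stay NA. The training data below is a hypothetical compact subset of `dat`, and `new_long` is an illustrative stand-in for `dat_rec_melt` covering Var.1 and Var.2 only; `reshape()` then turns it back into the wide layout of `dat_rec`.

```r
# Hypothetical compact long data: subset of `dat` (DAY 5 and 55, Var.1 and Var.2)
mdat_small <- data.frame(
  DAY      = rep(rep(c(5, 55), each = 4), times = 2),
  variable = rep(c("Var.1", "Var.2"), each = 8),
  Location = rep(c(0, 0, 1, 1), times = 4),
  value    = c(20.9, 21.73, 21.73, 11.85626285,        # DAY 5,  Var.1
               20.8, 21.18, 12.48702448, 11.88812108,  # DAY 55, Var.1
               100, 100, 90, 92,                       # DAY 5,  Var.2
               100, 100, 90, 88)                       # DAY 55, Var.2
)

# One model per DAY/variable subset; names are "DAY.variable", e.g. "5.Var.1"
models <- lapply(split(mdat_small, list(mdat_small$DAY, mdat_small$variable), drop = TRUE),
                 function(df) lm(value ~ Location, data = df))

# New data in long form, laid out like dat_rec_melt (Site 1 on DAY 5, Site 3 on DAY 55)
new_long <- expand.grid(Location = c(0.1, 0.2, 0.3, 0.4), DAY = c(5, 55),
                        variable = c("Var.1", "Var.2"), stringsAsFactors = FALSE)
new_long$Site <- ifelse(new_long$DAY == 5, 1, 3)

# Fill each subset using the model with the same DAY.variable key
key <- paste(new_long$DAY, new_long$variable, sep = ".")
new_long$value <- NA_real_
for (k in unique(key)) {
  rows <- key == k
  if (!is.null(models[[k]])) {  # leave NA where no model exists for the subset
    new_long$value[rows] <- predict(models[[k]], newdata = new_long[rows, ])
  }
}

# Back to the wide layout of dat_rec: one value.<variable> column per variable
dat_rec_wide <- reshape(new_long, idvar = c("Site", "Location", "DAY"),
                        timevar = "variable", direction = "wide")
```

Matching by name rather than by position is the design choice that matters here: it stays correct even when the model list contains subsets (such as extra DAYs) that the new data does not.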
