I have 18 dates (i.e. unique combinations of DAY, MONTH, YEAR) and 10 variables. I have an lm model (y = mx + b, where y = value and x = pLength) for each date and variable (180 models). These are stored in a list (i.e. models).
I want to use these models to predict values. I have a data.frame (the data.frame to receive the values) with the columns DAY, MONTH, YEAR and pLength. Here I want to predict the value of each variable for each date/pLength combination.
For example, if the model was for the date and variable combination 7.8.2013.Var1, there should be a prediction for Var1 for every pLength that occurs on 7.8.2013.
To this end, I attempted to use another list created from the receiving data.frame. This list (i.e. rec_List) is 152 splits of the receiving data.frame. These splits are the unique rows of DAY, MONTH, YEAR and pLength. They are the same 18 dates as above, grouped with different pLength values. The value and number of pLength values vary by date.
In my current approach I tried to adapt code from another post that used lists with predict (but for only one variable). This is not working for me. Instead of getting a prediction for each variable for each pLength by date, I end up with some haphazard predictions totaling 180, the same as the number of models.
# Current Output 'preds'
X1 DAY MONTH YEAR pLength value
7.8.2013.Var1 7 8 2013 0.00 0.00
7.8.2013.Var2 7 8 2013 0.25 1.07
7.8.2013.Var3 7 8 2013 0.33 6.25
etc
# Desired Output
X1 DAY MONTH YEAR pLength value
7.8.2013.Var1 7 8 2013 0.00 0.00
7.8.2013.Var2 7 8 2013 0.00 1.10
7.8.2013.Var3 7 8 2013 0.00 6.55
...
7.8.2013.Var10 7 8 2013 0.00 100.10
7.9.2013.Var1 7 9 2013 0.25 0.00
7.9.2013.Var2 7 9 2013 0.25 1.15
etc
When I saw the current output above, I thought that I could perhaps duplicate the rows in the receiving data.frame list so that each DAY, MONTH, YEAR and pLength combination was replicated 10 times. This did not work, but resulted in this.
# with duplicated rows
X1 DAY MONTH YEAR pLength value
1 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
2 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
3 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
4 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
5 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
6 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
7 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
8 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
9 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
10 7.8.2013.Var1 7 8 2013 0.0000000 0.000000e+00
11 7.8.2013.Var2 7 8 2013 0.2500000 1.072500e+00
where I was hoping rows 1 - 10 would be Var1-10 for 7.8.2013 and pLength = 0.00.
I know the problem may lie in the fact that I am cbinding lists of unequal length to create the predictions, but I am unsure how else to use a list of models with predict. I thought the row duplication would have helped with that.
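To illustrate what I suspect is happening, here is a small sketch with made-up lists (a and b are illustrative names, not my real objects). As far as I understand it, cbind() on two lists builds a list matrix and recycles the shorter list to the length of the longer one, so the pairing depends entirely on the order of the elements rather than on any matching by date.

```r
# Hypothetical toy lists: 4 "models" and 2 "date splits"
a <- list("model1", "model2", "model3", "model4")
b <- list("x", "y")

# cbind() returns a 4 x 2 list matrix; b is recycled to fill 4 rows
m <- cbind(a = a, b = b)

dim(m)        # number of rows equals the length of the longer list
m[[3, "b"]]   # the third row reuses the first element of b
```

If the longer length is not a multiple of the shorter one, R also emits a warning, which may be relevant with 180 models and 152 splits.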
In brief: I would like to have a prediction for every date and pLength combination for each of the 10 variables. I am attempting to do this with lists because this is the only way I can currently think of after asking and reading other posts.
# code with abbreviated data
require(plyr)
require(reshape2)
mdata2 <- structure(list(DAY = c(7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L,
7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L,
7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L), MONTH = c(8L,
6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L,
6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L,
6L, 8L, 6L, 8L, 6L, 8L, 6L), YEAR = c(2013L, 2012L, 2013L, 2012L,
2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L,
2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L,
2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L,
2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L
), pLength = c(1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L,
1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L), variable = structure(c(1L,
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L,
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L,
9L, 9L, 9L, 10L, 10L, 10L, 10L), .Label = c("Rain", "Wind", "WindD",
"TempA", "TempF", "RH", "FuelM", "WindMax", "PAR", "VPD"), class = "factor"),
value = c(0, 0, 0, 0, 0.51, 1.096, 1.26, 1.472, 67.59440741,
0.153388889, 67.59440741, 0.153388889, 30.17, 31.73, 31.06,
31.78, 33.52, 46.9, 40.06, 43.66, 55.62, 27.81, 50.75, 27.82,
13.33, 0.842, 10.39, 5.783, 0.727, 1.58, 2.247, 2.234, 1105,
1740, 1767, 1969, 1.90257357, 3.351394626, 2.17506063, 3.373580125
)), .Names = c("DAY", "MONTH", "YEAR", "pLength", "variable",
"value"), row.names = c(1L, 2L, 19L, 20L, 37L, 38L, 55L, 56L,
73L, 74L, 91L, 92L, 109L, 110L, 127L, 128L, 145L, 146L, 163L,
164L, 181L, 182L, 199L, 200L, 217L, 218L, 235L, 236L, 253L, 254L,
271L, 272L, 289L, 290L, 307L, 308L, 325L, 326L, 343L, 344L), class = "data.frame")
vs2 <- structure(list(DAY = c(8L, 8L, 8L, 8L, 8L, 8L, 7L, 7L, 7L, 7L,
7L, 7L, 7L, 7L, 7L), MONTH = c(6L, 6L, 6L, 6L, 6L, 6L, 8L, 8L,
8L, 8L, 8L, 8L, 8L, 8L, 8L), YEAR = c(2012L, 2012L, 2012L, 2012L,
2012L, 2012L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L,
2013L, 2013L), pLength = c(0, 0.222222222, 0.444444444, 0.666666667,
0.888888889, 1, 0, 0, 0.25, 0.333333333, 0.5, 0.75, 0.666666667,
1, 1)), .Names = c("DAY", "MONTH", "YEAR", "pLength"), row.names = c("1:89",
"1:90", "1:91", "1:92", "1:93", "1:94", "2:6", "2:23", "2:31",
"2:39", "2:49", "2:69", "2:71", "2:87", "2:96"), class = "data.frame")
# ** code edited to reflect answer below **
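# Fit one lm per variable/date combination
models <- dlply(mdata2, c("variable", "DAY", "MONTH", "YEAR"), function(df)
  lm(value ~ pLength, data = df))
# Split the receiving data.frame by date
rec_List <- dlply(unique(vs2), c("DAY", "MONTH", "YEAR"))
# Pair models with date splits and add the predictions to each split
preds <- mdply(cbind(mod = models, df = rec_List), function(mod, df) {
  mutate(df, value = predict(mod, newdata = df))
})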
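
For comparison, here is a minimal base-R sketch of the pairing I am after, written so that it does not rely on two lists lining up by position. It uses made-up toy data: train, recv, keys and the single date column are illustrative names and values, not my real objects. Each model is fitted for one date/variable combination and then applied only to the receiving rows that share its date.

```r
# Hypothetical training data: 2 dates x 2 variables, observed at pLength 0 and 1
train <- expand.grid(pLength = c(0, 1),
                     variable = c("Var1", "Var2"),
                     date = c("7.8.2013", "8.6.2012"),
                     stringsAsFactors = FALSE)
train$value <- c(0, 2, 10, 14, 1, 4, 20, 26)

# Hypothetical receiving data: each date has its own set of pLength values
recv <- data.frame(date = c("7.8.2013", "7.8.2013", "7.8.2013",
                            "8.6.2012", "8.6.2012"),
                   pLength = c(0, 0.25, 1, 0, 0.5),
                   stringsAsFactors = FALSE)

# One row per date/variable combination that needs a model
keys <- unique(train[c("date", "variable")])

pred_list <- lapply(seq_len(nrow(keys)), function(i) {
  d <- keys$date[i]
  v <- keys$variable[i]
  # Fit the model for this date/variable combination
  fit <- lm(value ~ pLength, data = train[train$date == d & train$variable == v, ])
  # Select only the receiving rows that belong to the same date
  new <- recv[recv$date == d, , drop = FALSE]
  new$variable <- v
  new$value <- unname(predict(fit, newdata = new))
  new
})

# Stack the per-model results into one data.frame
preds <- do.call(rbind, pred_list)
rownames(preds) <- NULL
```

With this layout the result has one row per receiving row per variable, which is the shape of the desired output above. The same idea should carry over to DAY, MONTH and YEAR as separate columns by matching on all three.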