I have 18 dates (i.e. unique DAY, MONTH, YEAR combinations) and 10 variables. I have an lm model (y = mx + b, where y = value and x = pLength) for each date and variable (180 models). These are stored in a list (models).

I want to use these models to predict values. I have a data.frame (the one that will receive the values) with the columns DAY, MONTH, YEAR and pLength. I want to predict the value of each variable for each date/pLength combination.

For example, if the model was for date and variable combination = 7.8.2013.Var1, there should be a prediction for Var1 for every pLength that occurs on 7.8.2013.

To this end, I attempted to use another list created from the receiving data.frame. This list (rec_List) consists of 152 splits of the receiving data.frame. The splits are the unique rows of DAY, MONTH, YEAR and pLength. They cover the same 18 dates as above, grouped with different pLengths. The values and number of pLengths vary by date.

In my current approach I tried to adapt code from another post that uses lists with predict (but for only one variable). This is not working for me. Instead of getting a prediction for each variable for each pLength by date, I end up with haphazard predictions totaling 180, the same as the number of models.

# Current Output  'preds'
X1            DAY MONTH YEAR pLength value
7.8.2013.Var1 7   8     2013 0.00    0.00
7.8.2013.Var2 7   8     2013 0.25    1.07  
7.8.2013.Var3 7   8     2013 0.33    6.25
etc 

# Desired Output
X1             DAY MONTH YEAR pLength value
7.8.2013.Var1  7   8     2013 0.00    0.00
7.8.2013.Var2  7   8     2013 0.00    1.10
7.8.2013.Var3  7   8     2013 0.00    6.55
...
7.8.2013.Var10 7   8     2013 0.00    100.10
7.9.2013.Var1  7   9     2013 0.25    0.00
7.9.2013.Var2  7   9     2013 0.25    1.15 
etc

When I saw the current output above, I thought I could duplicate the rows in the receiving data.frame list so that each DAY, MONTH, YEAR and pLength combination was replicated 10 times. This did not work; it resulted in this:

# with duplicated rows
    X1              DAY     MONTH   YEAR    pLength         value
1   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
2   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
3   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
4   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
5   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
6   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
7   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
8   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
9   7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
10  7.8.2013.Var1   7   8   2013    0.0000000   0.000000e+00
11  7.8.2013.Var2   7   8   2013    0.2500000   1.072500e+00

I was hoping rows 1-10 would be Var1 through Var10 for 7.8.2013 and pLength = 0.00.

I know the problem may lie in the fact that I am cbinding lists of unequal length to create the predictions, but I am unsure how else to use a list of models with predict. I thought the row duplication would help with that.

In brief: I would like to have a prediction for every date and pLength combination for each of the 10 variables. I am attempting to do this with lists because it is the only way I can currently think of after asking and reading other posts.

# code with abbreviated data
require(plyr)
require(reshape2)
mdata2 <- structure(list(DAY = c(7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 
7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 
7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L, 7L, 8L), MONTH = c(8L, 
6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 
6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 6L, 8L, 
6L, 8L, 6L, 8L, 6L, 8L, 6L), YEAR = c(2013L, 2012L, 2013L, 2012L, 
2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 
2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 
2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 
2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L, 2013L, 2012L
), pLength = c(1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 
1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 
1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L, 1L, 1L, 0L, 0L), variable = structure(c(1L, 
1L, 1L, 1L, 2L, 2L, 2L, 2L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 5L, 
5L, 5L, 5L, 6L, 6L, 6L, 6L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 9L, 
9L, 9L, 9L, 10L, 10L, 10L, 10L), .Label = c("Rain", "Wind", "WindD", 
"TempA", "TempF", "RH", "FuelM", "WindMax", "PAR", "VPD"), class = "factor"), 
    value = c(0, 0, 0, 0, 0.51, 1.096, 1.26, 1.472, 67.59440741, 
    0.153388889, 67.59440741, 0.153388889, 30.17, 31.73, 31.06, 
    31.78, 33.52, 46.9, 40.06, 43.66, 55.62, 27.81, 50.75, 27.82, 
    13.33, 0.842, 10.39, 5.783, 0.727, 1.58, 2.247, 2.234, 1105, 
    1740, 1767, 1969, 1.90257357, 3.351394626, 2.17506063, 3.373580125
    )), .Names = c("DAY", "MONTH", "YEAR", "pLength", "variable", 
"value"), row.names = c(1L, 2L, 19L, 20L, 37L, 38L, 55L, 56L, 
73L, 74L, 91L, 92L, 109L, 110L, 127L, 128L, 145L, 146L, 163L, 
164L, 181L, 182L, 199L, 200L, 217L, 218L, 235L, 236L, 253L, 254L, 
271L, 272L, 289L, 290L, 307L, 308L, 325L, 326L, 343L, 344L), class = "data.frame")

vs2 <- structure(list(DAY = c(8L, 8L, 8L, 8L, 8L, 8L, 7L, 7L, 7L, 7L, 
7L, 7L, 7L, 7L, 7L), MONTH = c(6L, 6L, 6L, 6L, 6L, 6L, 8L, 8L, 
8L, 8L, 8L, 8L, 8L, 8L, 8L), YEAR = c(2012L, 2012L, 2012L, 2012L, 
2012L, 2012L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 2013L, 
2013L, 2013L), pLength = c(0, 0.222222222, 0.444444444, 0.666666667, 
0.888888889, 1, 0, 0, 0.25, 0.333333333, 0.5, 0.75, 0.666666667, 
1, 1)), .Names = c("DAY", "MONTH", "YEAR", "pLength"), row.names = c("1:89", 
"1:90", "1:91", "1:92", "1:93", "1:94", "2:6", "2:23", "2:31", 
"2:39", "2:49", "2:69", "2:71", "2:87", "2:96"), class = "data.frame")

# ** code edited to reflect answer below **
models <- dlply(mdata2, c("variable", "DAY", "MONTH", "YEAR"), function(df) 
  lm(value ~ pLength, data = df))

rec_List <- dlply(unique(vs2), c("DAY", "MONTH", "YEAR"))

preds <- mdply(cbind(mod = models, df = rec_List), function(mod, df) {
  mutate(df, value = predict(mod, newdata = df))
})
nofunsally

1 Answer

The problem was that the two lists did not match when they were cbinded: their orders were different. The model list was created by DAY, MONTH, YEAR, variable, whereas rec_List was split by DAY, MONTH, YEAR. As a result, rec_List would cycle through dates while models would cycle through variables within each date, creating the mismatch described above.
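
The recycling behaviour can be seen in a toy example. This is only an illustration: the character keys below are hypothetical stand-ins for the names of the model list and of rec_List, not the real objects, and dates_agree is a helper made up for the check.

```r
# Hypothetical stand-ins: names of the date splits and of the models.
date_keys <- c("7.8.2013", "8.6.2012")

# Models ordered with the date first, so the variable changes fastest.
date_first <- c("7.8.2013.Var1", "7.8.2013.Var2",
                "8.6.2012.Var1", "8.6.2012.Var2")

# Models ordered with the variable first, so the date changes fastest.
variable_first <- c("Var1.7.8.2013", "Var1.8.6.2012",
                    "Var2.7.8.2013", "Var2.8.6.2012")

# cbind() recycles the shorter vector to fill the longer one.
mismatched <- cbind(mod = date_first, df = date_keys)
aligned <- cbind(mod = variable_first, df = date_keys)

# TRUE where the date paired with a model is the model's own date.
dates_agree <- function(pairs) {
  mapply(grepl, pairs[, "df"], pairs[, "mod"], MoreArgs = list(fixed = TRUE))
}

print(mismatched)
print(dates_agree(mismatched))
print(dates_agree(aligned))
```

Only the variable-first ordering pairs every model with its own date, because the recycled date keys then repeat in step with the models.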

By changing this line:

models <- dlply(mdata2, c("DAY", "MONTH", "YEAR", "variable"), function(df) 
  lm(value ~ pLength, data = df))

to this, with variable first:

models <- dlply(mdata2, c("variable", "DAY", "MONTH", "YEAR"), function(df) 
  lm(value ~ pLength, data = df))

This produces the desired output: a prediction for each variable for each date/pLength combination.
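
Since this fix depends on the two lists happening to line up, a more defensive option may be to select the rows for each model by date rather than by list position. The following is a minimal base R sketch of that idea, not the plyr code above: train and new_rows are cut-down stand-ins for mdata2 and vs2, and date_key and predict_group are hypothetical helper names.

```r
# Cut-down stand-in for mdata2: two variables on two dates.
train <- data.frame(
  DAY = rep(c(7L, 8L), each = 4),
  MONTH = rep(c(8L, 6L), each = 4),
  YEAR = rep(c(2013L, 2012L), each = 4),
  pLength = rep(c(0, 1), times = 4),
  variable = rep(rep(c("Wind", "TempA"), each = 2), times = 2),
  value = c(1.26, 0.51, 31.06, 30.17, 1.472, 1.096, 31.78, 31.73),
  stringsAsFactors = FALSE
)

# Cut-down stand-in for vs2: the rows that should receive predictions.
new_rows <- data.frame(
  DAY = c(7L, 7L, 8L),
  MONTH = c(8L, 8L, 6L),
  YEAR = c(2013L, 2013L, 2012L),
  pLength = c(0, 0.5, 1)
)

# Hypothetical helper: one string per date, e.g. "7.8.2013".
date_key <- function(df) paste(df$DAY, df$MONTH, df$YEAR, sep = ".")
train$key <- date_key(train)
new_rows$key <- date_key(new_rows)

# One group per variable/date combination that actually occurs.
groups <- split(train, list(train$variable, train$key), drop = TRUE)

# Fit the model for one group and predict only for rows of the same date.
predict_group <- function(df) {
  fit <- lm(value ~ pLength, data = df)
  same_date <- new_rows$key == df$key[1]
  target <- new_rows[same_date, c("DAY", "MONTH", "YEAR", "pLength")]
  target$variable <- df$variable[1]
  target$value <- as.numeric(predict(fit, newdata = target))
  target
}

preds <- do.call(rbind, lapply(groups, predict_group))
rownames(preds) <- NULL
print(preds)
```

Because each model picks its rows by date key, the result does not change if the order of the grouping variables changes.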

nofunsally