using predict with a list of lm() objects

Question

I have data which I regularly run regressions on. Each "chunk" of data gets fit a different regression. Each state, for example, might have a different function that explains the dependent value. This seems like a typical "split-apply-combine" type of problem so I'm using the plyr package. I can easily create a list of lm() objects which works well. However I can't quite wrap my head around how I use those objects later to predict values in a separate data.frame.

Here's a totally contrived example illustrating what I'm trying to do:

# setting up some fake data
set.seed(1)
funct <- function(myState, myYear){
   rnorm(1, 100, 500) +  myState + (100 * myYear) 
}
state <- 50:60
year <- 10:40
myData <- expand.grid( year, state)
names(myData) <- c("year","state")
myData$value <- apply(myData, 1, function(x) funct(x[2], x[1]))
## ok, done with the fake data generation. 

require(plyr)

modelList <- dlply(myData, "state", function(x) lm(value ~ year, data=x))
## if you want to see the summaries of the lm() do this:  
    # lapply(modelList, summary)

state <- 50:60
year <- 50:60
newData <- expand.grid( year, state)
names(newData) <- c("year","state") 
## now how do I predict the values for newData$value 
   # using the regressions in modelList?

So how do I use the lm() objects contained in modelList to predict values using the year and state independent values from newData?

score 9 · Accepted Answer · edited Dec 13 '11 at 23:54

9

Here's my attempt:

predNaughty <- ddply(newData, "state", transform,
  value=predict(modelList[[paste(piece$state[1])]], newdata=piece))
head(predNaughty)
#   year state    value
# 1   50    50 5176.326
# 2   51    50 5274.907
# 3   52    50 5373.487
# 4   53    50 5472.068
# 5   54    50 5570.649
# 6   55    50 5669.229
predDiggsApproved <- ddply(newData, "state", function(x)
  transform(x, value=predict(modelList[[paste(x$state[1])]], newdata=x)))
head(predDiggsApproved)
#   year state    value
# 1   50    50 5176.326
# 2   51    50 5274.907
# 3   52    50 5373.487
# 4   53    50 5472.068
# 5   54    50 5570.649
# 6   55    50 5669.229

JD Long edit

I was inspired enough to work out an adply() option:

pred3 <- adply(newData, 1,  function(x)
    predict(modelList[[paste(x$state)]], newdata=x))
head(pred3)
#   year state        1
# 1   50    50 5176.326
# 2   51    50 5274.907
# 3   52    50 5373.487
# 4   53    50 5472.068
# 5   54    50 5570.649
# 6   55    50 5669.229

edited Dec 13 '11 at 23:54

JD Long

59,675
58
202
294

answered Dec 13 '11 at 23:10

Joshua Ulrich

173,410
32
338
418

that totally nails it! Thank you mucho. Can you explain where the data.frame `piece` comes from? Is it autogenerated by ddply? – JD Long Dec 13 '11 at 23:21
@JDLong: `.fun` is ultimately called on a data frame named `piece`. But, as @BrianDiggs pointed out in chat, this shouldn't be relied upon. Better to wrap in an anonymous function (see my update). – Joshua Ulrich Dec 13 '11 at 23:30
hi, if you could have a look at my question it would be great http://stackoverflow.com/questions/43427392/apply-predict-between-data-frames-within-two-lists . thanks! – aaaaa Apr 15 '17 at 15:05
@JDLong Am I able to get standard errors using this approach? – juliamm2011 Oct 17 '19 at 18:02
@juliamm2011 I think all you have to do is turn `se.fit=TRUE` per this question: https://stackoverflow.com/a/33660779/37751 note that we're now 8 years after this question was answered and I would no longer be using `adply` any longer but rather would probably use `broom` – JD Long Oct 18 '19 at 02:51

score 8 · Answer 2 · answered Dec 16 '11 at 12:56

8

You need to use mdply to supply both the model and the data to each function call:

dataList <- dlply(newData, "state")

preds <- mdply(cbind(mod = modelList, df = dataList), function(mod, df) {
  mutate(df, pred = predict(mod, newdata = df))
})

answered Dec 16 '11 at 12:56

hadley

102,019
32
183
245

score 6 · Answer 3 · answered Dec 14 '11 at 04:43

6

A solution with just base R. The format of the output is different, but all the values are right there.

models <- lapply(split(myData, myData$state), 'lm', formula = value ~ year)
pred4  <- mapply('predict', models, split(newData, newData$state))

answered Dec 14 '11 at 04:43

Ramnath

54,439
16
125
152

thanks @ramnath. I really like comparing base R solutions to those done with packages. It helps me both improve my base R understanding as well as understand the compromises I am making when using abstractions like plyr. – JD Long Dec 14 '11 at 04:48
And this is how I normally solve the problem - but with `dlply` and `mdply` – hadley Dec 14 '11 at 12:30
@hadley Could you show a worked example for this case? I tried building one with `mdply` and could not figure out how to do it because `.data` has to be a matrix or data.frame, and the two arguments to `predict` are an `lm` object and a `data.frame`. I couldn't stuff a list of `lm` objects as a column in a `data.frame`. The other approach I tried, making `.data` a list of lists, (`.data=list(object=modelList, newData=newDataList)` where `newDataList <- dlply(newData, .(state), identity)`) didn't work because `.data` wasn't a matrix or data.frame (as per the documentation). – Brian Diggs Dec 14 '11 at 19:04
In brief, cbind the two lists together – hadley Dec 16 '11 at 03:12

Brian Diggs · Answer 4 · 2011-12-13T23:15:05.260

What is wrong with

lapply(modelList, predict, newData)

?

EDIT:

Thanks for explaining what is wrong with that. How about:

newData <- data.frame(year)
ldply(modelList, function(model) {
  data.frame(newData, predict=predict(model, newData))
})

Iterate over the models, and apply the new data (which is the same for each state since you just did an expand.grid to create it).

EDIT 2:

If newData does not have the same values for year for every state as in the example, a more general approach can be used. Note that this uses the original definition of newData, not the one in the first edit.

ldply(state, function(s) {
  nd <- newData[newData$state==s,]
  data.frame(nd, predict=predict(modelList[[as.character(s)]], nd))
})

First 15 lines of this output:

   year state  predict
1    50    50 5176.326
2    51    50 5274.907
3    52    50 5373.487
4    53    50 5472.068
5    54    50 5570.649
6    55    50 5669.229
7    56    50 5767.810
8    57    50 5866.390
9    58    50 5964.971
10   59    50 6063.551
11   60    50 6162.132
12   50    51 5514.825
13   51    51 5626.160
14   52    51 5737.496
15   53    51 5848.832

that's exactly the sort of thing I keep cooking up, but it's not really what I'm after. That applies every model to every state. I only want the model where state==50 to be applied to the data where state==50 — JD Long, Dec 13 '11 at 22:49

Tommy · Answer 5 · 2011-12-13T23:11:41.987

I take it the hard part is matching each state in newData to the corresponding model.

Something like this perhaps?

predList <- dlply(newData, "state", function(x) {
  predict(modelList[[as.character(min(x$state))]], x) 
})

Here I used a "hacky" way of extracting the corresponding state model: as.character(min(x$state))

...There is probably a better way?

Output:

> predList[1:2]
$`50`
       1        2        3        4        5        6        7        8        9       10       11 
5176.326 5274.907 5373.487 5472.068 5570.649 5669.229 5767.810 5866.390 5964.971 6063.551 6162.132 

$`51`
      12       13       14       15       16       17       18       19       20       21       22 
5514.825 5626.160 5737.496 5848.832 5960.167 6071.503 6182.838 6294.174 6405.510 6516.845 6628.181

Or, if you want a data.frame as output:

predData <- ddply(newData, "state", function(x) {
  y <-predict(modelList[[as.character(min(x$state))]], x)
  data.frame(id=names(y), value=c(y))
})

Output:

head(predData)
  state id    value
1    50  1 5176.326
2    50  2 5274.907
3    50  3 5373.487
4    50  4 5472.068
5    50  5 5570.649
6    50  6 5669.229

score 1 · Answer 6 · edited Jul 23 '14 at 12:59

1

Maybe I'm missing something, but I believe lmList is the ideal tool here,

library(nlme)
ll = lmList(value ~ year | state, data=myData)
predict(ll, newData)


## Or, to show that it produces the same results as the other proposed methods...
newData[["value"]] <- predict(ll, newData)
head(newData)
#   year state    value
# 1   50    50 5176.326
# 2   51    50 5274.907
# 3   52    50 5373.487
# 4   53    50 5472.068
# 5   54    50 5570.649
# 6   55    50 5669.229

edited Jul 23 '14 at 12:59

Josh O'Brien

159,210
26
366
455

answered Jul 23 '14 at 09:53

baptiste

75,767
19
198
294

Uh, yeah, that does seem best! Really nice that `lmList` has its own `predict()` method. – Josh O'Brien Jul 23 '14 at 12:55

using predict with a list of lm() objects

6 Answers6

Linked