1

My question is similar to this one, but now I am trying to use a model with multiple predictors and I can't figure out how to get the newdata into the predict function.

library(dplyr)
library(lubridate)
library(purrr)
library(tidyr)
library(broom)

set.seed(1234)

First I create a seq of weeks

wks = seq(as.Date("2010-01-01"), Sys.Date(), by="1 week")

Then I grab the current year

cur_year <- year(Sys.Date())

Here I create a data frame with dummy data

my_data <- data.frame(
  week_ending = wks
) %>% 
  mutate(
    ref_period = week(week_ending),
    yr = year(week_ending),
    PCT.EXCELLENT = round(runif(length(wks), 0, 100),0),
    PCT.GOOD = round(runif(length(wks), 0, 100),0),
    PCT.FAIR = round(runif(length(wks), 0, 100),0),
    PCT.POOR = round(runif(length(wks), 0, 100),0),
    PCT.VERY.POOR = round(runif(length(wks), 0, 100),0),
    pct_trend = round(runif(length(wks), 75, 125),0)
  )

Next I create a nested dataframe that has the data for each week of the year as one group.

cond_model <- my_data %>% 
  filter(yr != cur_year) %>% 
  group_by(ref_period) %>% 
  nest(.key=cond_data) 

Here I join this year's data back into the previous years' data by week of the year.

cond_model <- left_join(
  cond_model,
  my_data %>% 
    filter(yr==cur_year) %>% 
    select(week_ending,
           ref_period,
           PCT.EXCELLENT,
           PCT.FAIR,
           PCT.GOOD,
           PCT.POOR,
           PCT.VERY.POOR),
  by = c("ref_period")
) 

And this adds the linear model to the data frame for each week of the year

cond_model <- 
  cond_model %>% 
  mutate(model = map(cond_data,
                     ~lm(pct_trend ~ PCT.EXCELLENT + PCT.GOOD + PCT.FAIR + PCT.POOR + PCT.VERY.POOR, data = .x)))

now I would like to use the model for each week to predict using this year's data. I tried the following:

cond_model <- 
  cond_model %>% 
  mutate(
    pred_pct_trend = map2_dbl(model, PCT.EXCELLENT + PCT.GOOD + PCT.FAIR + PCT.POOR + PCT.VERY.POOR,
                              ~predict(.x, newdata = data.frame(.y)))
  )

That gives the following error:

Error in mutate_impl(.data, dots) : object 'PCT.EXCELLENT' not found

I then tried nesting my predictors in my data frame...

create a data frame with just this year's data and nest the predictors

cur_cond <- my_data %>% 
  filter(yr==cur_year) %>% 
  select(week_ending, PCT.EXCELLENT,
         PCT.GOOD, PCT.FAIR, PCT.POOR, PCT.VERY.POOR) %>% 
  group_by(week_ending) %>% 
  nest(.key=new_data) %>% 
  mutate(new_data=map(new_data, ~data.frame(.x)))

join this into my main data frame

cond_model <- left_join(cond_model, cur_cond)

Now I try the prediction again:

cond_model <- 
  cond_model %>% 
  mutate(
    pred_pct_trend = map2_dbl(model, new_data,
                              ~predict(.x, newdata = data.frame(.y)))
  )

I get the same error as before:

Error in mutate_impl(.data, dots) : object 'PCT.EXCELLENT' not found

I think that the answer could involve performing a flatten() on the predictors, but I can't figure out where that goes in my workflow.

cond_model$new_data[1]

vs.

flatten_df(cond_model$new_data[1])

and at this point I have run out of ideas.

jkgrain
  • 769
  • 5
  • 20
  • Weeks 31-52 don't have data from 2017. What do you want your predictions to look like for those weeks? If you get rid of those rows, your second method using nesting of the prediction dataset should work fine. – aosmith Jul 28 '17 at 19:00
  • weeks 31-52 should be NA because I don't have my predictors yet. In the earlier question that I referenced at the very top I ended up with the same situation in my real workflow and the predictions for the weeks without predictors just returned NA. – jkgrain Jul 28 '17 at 19:03
  • I tried filtering out rows 31-52 and I still have the same error. – jkgrain Jul 28 '17 at 19:09

1 Answers1

2

Once you get your prediction dataset added in, the main issue is how to deal with the weeks that don't have prediction data (weeks 31-53).

You'll see when you join the two datasets, the rows without prediction dataset will be filled with NULL. You can use an ifelse statement to give predictions of NA for these rows.

# Modeling data
cond_model = my_data %>%
    filter(yr != cur_year) %>%
    group_by(ref_period) %>%
    nest(.key = cond_data)

# Create prediction data
cur_cond = my_data %>%
    filter(yr == cur_year) %>% 
    group_by(ref_period) %>% 
    nest( .key = new_data )

# Join these together
cond_model = left_join(cond_model, cur_cond)

# Models
cond_model = cond_model %>% 
    mutate(model = map(cond_data,
                       ~lm(pct_trend ~ PCT.EXCELLENT + PCT.GOOD + 
                               PCT.FAIR + PCT.POOR + PCT.VERY.POOR, data = .x) ) )

Put an ifelse in to return NA when there is no prediction data.

# Predictions
cond_model %>% 
    mutate(pred_pct_trend = map2_dbl(model, new_data,
                                     ~ifelse(is.null(.y), NA, 
                                             predict(.x, newdata = .y) ) ) )

# A tibble: 53 x 5
   ref_period        cond_data         new_data    model pred_pct_trend
        <dbl>           <list>           <list>   <list>          <dbl>
 1          1 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>       83.08899
 2          2 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      114.39089
 3          3 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      215.02055
 4          4 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      130.24556
 5          5 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      112.86516
 6          6 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      107.29866
 7          7 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>       52.11526
 8          8 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      106.22482
 9          9 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      128.40858
10         10 <tibble [7 x 8]> <tibble [1 x 8]> <S3: lm>      108.10306
aosmith
  • 34,856
  • 9
  • 84
  • 118