I am using tidyverse, broom, and purrr to fit a model to some data by group, and I am then trying to use this model to predict on some new data, again by group. broom's 'augment' function conveniently adds not only the predictions but also other values such as the standard error. However, I cannot get 'augment' to use the new data instead of the old data, so my two sets of predictions come out exactly the same. The question is: how can I make 'augment' use the new data instead of the old data that was used to fit the model?

Here's a reproducible example:

library(tidyverse)
library(broom)
library(purrr)

# nest the iris dataset by Species and fit a linear model
iris.nest <- nest(iris, data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)) %>% 
  mutate(model = map(data, function(df) lm(Sepal.Width ~ Sepal.Length, data=df)))

# create a new dataset where the Sepal.Length is 5x as big
newdata <- iris %>% 
  mutate(Sepal.Length = Sepal.Length*5) %>% 
  nest(data = c(Sepal.Length, Sepal.Width, Petal.Length, Petal.Width)) %>% 
  rename("newdata"="data")

# join these two nested datasets together
iris.nest.new <- left_join(iris.nest, newdata, by = "Species")

# now form two new columns of predictions -- one using the "old" data that the model was
# initially fit on, and the second using the new data where the Sepal.Length has been increased
iris.nest.new <- iris.nest.new %>% 
  mutate(preds = map(model, broom::augment),
         preds.new = map2(model, newdata, broom::augment))  # THIS LINE DOESN'T WORK ****
                             
# unnest the predictions on the "old" data
preds <- select(iris.nest.new, preds) %>% 
  unnest(cols = c(preds)) %>% 
  # prefix the augment() columns (.fitted, .resid, ...) prior to merging
  rename_with(~ paste0("old", .x), starts_with("."))

# now unnest the predictions on the "new" data
preds.new <- select(iris.nest.new, preds.new) %>% 
  unnest(cols = c(preds.new)) %>% 
  #... and also prefix these columns prior to merging
  rename_with(~ paste0("new", .x), starts_with("."))

# merge the two sets of predictions and compare
compare <- bind_cols(preds, preds.new) 

# compare
select(compare, old.fitted, new.fitted) %>% View(.) # EXACTLY THE SAME!!!!
MH765

1 Answer

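When calling broom::augment on an lm model, note that newdata= is the third parameter; the second one is data=, which is meant for the original data the model was fitted on. When you use purrr::map2, the values you iterate over are passed positionally into the first two parameters by default, and it doesn't matter what you've named the lists you pass in. So your new data lands in data=, and augment returns the fitted values from the original fit. You need to explicitly place the new data in the newdata= parameter.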
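To see the positional matching in isolation, here is a minimal sketch that does not need broom. `which_slot()` is a hypothetical stand-in that only mimics the argument order of augment for lm models (x, then data, then newdata) and reports which slot received the second list. `models` and `new_sets` are dummy inputs.

```r
library(purrr)

# Hypothetical stand-in with the same argument order as broom's augment for lm:
# x first, then data, then newdata. It only reports which slot got a value.
which_slot <- function(x, data = NULL, newdata = NULL) {
  if (!is.null(newdata)) "newdata" else if (!is.null(data)) "data" else "none"
}

models   <- list("m1", "m2")
new_sets <- list(data.frame(a = 1), data.frame(a = 2))

# map2() passes .x and .y positionally, so the second list lands in `data`
positional <- map2_chr(models, new_sets, which_slot)

# A lambda names the argument explicitly, so the second list lands in `newdata`
named <- map2_chr(models, new_sets, ~ which_slot(.x, newdata = .y))

positional  # "data" "data"
named       # "newdata" "newdata"
```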

iris.nest.new <- iris.nest.new %>% 
  mutate(preds = map(model, broom::augment),
         preds.new = map2(model, newdata, ~broom::augment(.x, newdata=.y)))
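Putting it together, here is a compact sketch of the corrected pipeline. It assumes tidyr >= 1.0 (for `nest(data = -Species)`) and the same tidyverse/broom setup as the question; `fits` and `compare` are just illustrative names. Instead of renaming columns by position, it pulls out `.fitted` directly, so it does not depend on how many columns augment returns in a given broom version.

```r
library(tidyverse)
library(broom)

# One model per species, plus a per-species copy of the data with Sepal.Length x5
fits <- iris %>%
  nest(data = -Species) %>%
  mutate(model     = map(data, ~ lm(Sepal.Width ~ Sepal.Length, data = .x)),
         newdata   = map(data, ~ mutate(.x, Sepal.Length = Sepal.Length * 5)),
         preds     = map(model, augment),
         preds.new = map2(model, newdata, ~ augment(.x, newdata = .y)))

# Line up the fitted values from both calls, row by row within each species
compare <- fits %>%
  transmute(Species,
            old.fitted = map(preds, ".fitted"),
            new.fitted = map(preds.new, ".fitted")) %>%
  unnest(cols = c(old.fitted, new.fitted))

# With newdata= named explicitly, the two columns should no longer match
any(compare$old.fitted != compare$new.fitted)
```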

The difference can be seen by running these two commands:

broom::augment(iris.nest.new$model[[1]], iris.nest.new$newdata[[1]])
broom::augment(iris.nest.new$model[[1]], newdata=iris.nest.new$newdata[[1]])
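For a single model you can also check both calls against the original fit and against predict(). This sketch uses only the setosa rows and the same 5x scaling as the question; `setosa`, `fit`, and `scaled` are illustrative names. The positional call should reproduce the original fitted values, while the named call should agree with predict() on the new rows.

```r
library(broom)

# One model, fitted on the setosa rows only
setosa <- subset(iris, Species == "setosa")
fit    <- lm(Sepal.Width ~ Sepal.Length, data = setosa)
scaled <- transform(setosa, Sepal.Length = Sepal.Length * 5)

orig       <- augment(fit)                   # fitted values on the training data
positional <- augment(fit, scaled)           # `scaled` lands in data=, not newdata=
named      <- augment(fit, newdata = scaled) # predictions on the scaled rows

# Positional call: .fitted is unchanged from the original fit
all.equal(unname(positional$.fitted), unname(orig$.fitted))
# Named call: .fitted matches predict() on the new rows
all.equal(unname(named$.fitted), unname(predict(fit, newdata = scaled)))
```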
MrFlick
  • Thanks for the explanation -- this seems to do the trick. I see I didn't consider the order of the arguments carefully enough. Is there a simple way to understand what the tilde before the call to broom::augment is actually doing? – MH765 Aug 01 '20 at 04:48
  • It's how you quickly define an anonymous function with the tidyverse functions. See here: https://stackoverflow.com/questions/44834446/what-is-meaning-of-first-tilde-in-purrrmap – MrFlick Aug 01 '20 at 04:50
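To illustrate the formula shorthand from the comment above: a one-sided formula passed to map2 is converted into a function in which .x is the first argument and .y the second. So `~broom::augment(.x, newdata=.y)` behaves like `function(model, nd) broom::augment(model, newdata = nd)`. A small sketch with plain numbers, using purrr's `as_mapper()` to show the conversion:

```r
library(purrr)

# A one-sided formula is purrr shorthand for an anonymous function:
# .x is the first argument, .y the second
with_formula <- map2_dbl(1:3, 4:6, ~ .x * 10 + .y)

# The same thing written as an ordinary anonymous function
with_function <- map2_dbl(1:3, 4:6, function(x, y) x * 10 + y)

# as_mapper() performs the formula-to-function conversion that map2() applies to .f
f <- as_mapper(~ .x * 10 + .y)
f(2, 5)        # 25

with_formula   # 14 25 36
```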