Predict with linear models from a table using a new dataframe

Question

I've seen some examples about using tidy, dplyr and purrr to use linear regressions from tables to predict one value. Instead of predicting only one value, I would like to predict a whole new dataframe. So I have the next data:

library(tidyverse)
y   <- rep(seq(0, 240, by = 40), each = 7) 
x   <- rep(1:7, times = 7)
vol <- c(300, 380, 430, 460, 480, 485, 489,
         350, 445, 505, 540, 565, 580, 585,
         380, 490, 560, 605, 635, 650, 655,
         400, 525, 605, 655, 690, 710, 715,
         415, 555, 655, 710, 740, 760, 765,
         420, 570, 680, 740, 775, 800, 805,
         422, 580, 695, 765, 805, 830, 835) 
df  <- as.data.frame(cbind(y, x, vol))

Which I used to create models like this:

df.1 <- df %>%
  group_by(y) %>%
  do(mod = lm(vol ~ poly(x, 5), data = .))

df.1 looks like this:

# A tibble: 7 x 2
      y mod     
* <int> <list>  
1     0 <S3: lm>
2    40 <S3: lm>
3    80 <S3: lm>
4   120 <S3: lm>
5   160 <S3: lm>
6   200 <S3: lm>
7   240 <S3: lm>

Now I would like to use a new data frame and use the models above to predict the new values of vol

newx <- data.frame(x = seq(1, 7, 0.001))

Update: The answer is supposed to be 7 tables with dimensions 6001x2 with x values from 1 to 7 by 0.001 and 'vol' values with the prediction from the 'x's.

Possible duplicate of [using lm in list column to predict new values using purrr](https://stackoverflow.com/questions/44709870/using-lm-in-list-column-to-predict-new-values-using-purrr) — RLave, Sep 05 '18 at 06:49
Don't do `as.data.frame(cbind(...))`, just use `data.frame(...)`. The former converts to a matrix first, so you'll lose type data to coercion if you have variables of differing types. — alistaire, Sep 05 '18 at 07:07

score 1 · Accepted Answer · answered Sep 05 '18 at 07:14

To use list columns, iterate over them with purrr::map (or lapply) or variants. Expand columns with tidyr::unnest when you want.

library(tidyverse)
df <- data_frame(y = rep(seq(0, 240, by = 40), each = 7), 
                 x = rep(1:7, times = 7), 
                 vol = c(300, 380, 430, 460, 480, 485, 489,
                         350, 445, 505, 540, 565, 580, 585,
                         380, 490, 560, 605, 635, 650, 655,
                         400, 525, 605, 655, 690, 710, 715,
                         415, 555, 655, 710, 740, 760, 765,
                         420, 570, 680, 740, 775, 800, 805,
                         422, 580, 695, 765, 805, 830, 835))

df.1 <- df %>%
    nest(-y) %>%
    mutate(mods = map(data, ~lm(vol ~ poly(x, 5), data = .x)), 
           preds = map(mods, predict, newdata = data.frame(x = seq(1, 7, 0.001))))

df.1
#> # A tibble: 7 x 4
#>       y data             mods     preds        
#>   <dbl> <list>           <list>   <list>       
#> 1     0 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 2    40 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 3    80 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 4   120 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 5   160 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 6   200 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>
#> 7   240 <tibble [7 × 2]> <S3: lm> <dbl [6,001]>

How to make a dataframe using the prediction for each y as a column? i.e. a data frame with 6001 columns and 7 columns. — Rolando Montenegro Muro, Sep 05 '18 at 07:24
`df.1 %>% select(y, preds) %>% mutate(y = paste0("preds_", y)) %>% spread(y, preds) %>% unnest()` or maybe `df.1 %>% mutate(preds = set_names(preds, paste0('preds_', y))) %>% pull(preds) %>% as_tibble()`, though long form is usually more useful — alistaire, Sep 05 '18 at 07:31

score 1 · Answer 2 · answered Sep 05 '18 at 08:10

Another option is to use the augment-function from broom:

library(tidyverse)
library(broom)

tibble(y = df.1$y, 
       predictions = map(df.1$mod, augment, newdata = newx)) %>% 
  unnest() %>% 
  select(y, x, vol = .fitted) %>% 
  spread(y, vol)


#  A tibble: 6,001 x 8
#         x   `0`  `40`  `80` `120` `160` `200` `240`
#     <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl>
#  1  1     300.  350.  380.  400.  415.  420.  422.
#  2  1.00  300.  350.  380.  400.  415.  420.  422.
#  3  1.00  300.  350.  380.  400.  415.  420.  422.
#  4  1.00  300.  350.  380.  400.  415.  420.  423.
#  5  1.00  300.  350.  381.  401.  416.  421.  423.
#  6  1.00  300.  351.  381.  401.  416.  421.  423.
#  7  1.01  301.  351.  381.  401.  416.  421.  423.
#  8  1.01  301.  351.  381.  401.  416.  421.  423.
#  9  1.01  301.  351.  381.  401.  416.  421.  423.
# 10  1.01  301.  351.  381.  401.  416.  421.  424.
# ... with 5,991 more rows

How can I find, for example, 'for which x, columns '0','40','80'... is more similar to 350'. I used filter(abs(vol-350)==min(abs(vol-350))) but the problem is that for column 80 and so on no value is supossed to be taken but as the min is the first value, x is saved as 1. — Rolando Montenegro Muro, Sep 21 '18 at 05:36
I'm not exactly sure, what your trying to accommplish.... Maybe you'd like to ask a new question to clarify? — kath, Sep 21 '18 at 06:26

Predict with linear models from a table using a new dataframe

2 Answers2