
I understand how to use map to iterate over arguments in a df and create a new list column.

For example,

params <- expand.grid(param_a = c(2, 4, 6)
                  ,param_b = c(3, 6, 9)
                  ,param_c = c(50, 100)
                  ,param_d = c(1, 0)
                  )

df.preprocessed <- dplyr::as.tbl(params) %>%
  dplyr::mutate(test_var = purrr::map(param_a, function(x){
      rep(5, x)
      }
    ))

However, how do I use the analogous syntax with pmap in the event that I want to specify more than 2 parameters?

df.preprocessed <- dplyr::as.tbl(params) %>%
  dplyr::mutate(test_var = purrr::pmap(list(x = param_a
                                     ,y = param_b
                                     ,z = param_c
                                     ,u = param_d), function(x, y){
                                        rep(5,x)*y
                                     }
  )
  )

Error output:

Error in mutate_impl(.data, dots) : Evaluation error: unused arguments (z = .l[[c(3, i)]], u = .l[[c(4, i)]]).

matsuo_basho

4 Answers

With pmap(), the first argument is a list, and a data frame is a list of columns, so you can pass your data frame directly. Name the arguments of your function after the columns you need, and add ... to absorb the columns you don't use. You'll need unnest() to unpack the list elements returned by pmap():

df.preprocessed <- dplyr::as.tbl(params) %>%
    dplyr::mutate(test_var = purrr::pmap(., function(param_a, param_b, ...){
                                        rep(5, param_a) * param_b
                                     })) %>%
    tidyr::unnest(test_var)


> df.preprocessed
# A tibble: 144 x 5
   param_a param_b param_c param_d test_var
     <dbl>   <dbl>   <dbl>   <dbl>    <dbl>
 1       2       3      50       1       15
 2       2       3      50       1       15
 3       4       3      50       1       15
 4       4       3      50       1       15
 5       4       3      50       1       15
 6       4       3      50       1       15
 7       6       3      50       1       15
 8       6       3      50       1       15
 9       6       3      50       1       15
10       6       3      50       1       15
# ... with 134 more rows
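
As a minimal sketch of where the 144 rows come from, the same call can be made outside mutate(). It assumes the params grid from the question; res is a hypothetical name used only for illustration.

```r
library(purrr)

# Same grid as in the question
params <- expand.grid(param_a = c(2, 4, 6),
                      param_b = c(3, 6, 9),
                      param_c = c(50, 100),
                      param_d = c(1, 0))

# A data frame is a list of columns, so pmap() walks it row by row and
# passes every column as a named argument; `...` absorbs the unused ones.
res <- pmap(params, function(param_a, param_b, ...) rep(5, param_a) * param_b)

length(res)        # one list element per row of params
sum(lengths(res))  # total number of rows once the list column is unnested
```

Each element has length param_a, so the unnested row count is the sum of param_a over all rows of the grid.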
fmic_

How about using rowwise() and mutate() directly, without map:

my_fun <- function(param_a, param_b){
  rep(5, param_a) * param_b
}
df.preprocessed <- dplyr::as.tbl(params) %>%
  dplyr::rowwise() %>%
  dplyr::mutate(test_var = list(my_fun(param_a, param_b))) %>%
  tidyr::unnest(test_var)
danilinares
  • That’s a great answer. `rowwise` is different from `group_by`, as it unwraps list columns: `tibble(test = list(1, 2)) %>% rowwise() %>% mutate(cls = class(test))` shows that `cls` is numeric, not `list` – flying sheep Jul 29 '19 at 14:43

We could add ... to the function so that it accepts the unused arguments z and u:

f1 <- function(x, y, ...) rep(5, x)*y

df.preprocessed <- dplyr::as.tbl(params) %>%
        dplyr::mutate(test_var = purrr::pmap(list(x = param_a
                                 ,y = param_b
                                 ,z = param_c
                                 ,u = param_d),f1
    )
   )
df.preprocessed
# A tibble: 36 x 5
#   param_a param_b param_c param_d  test_var
#     <dbl>   <dbl>   <dbl>   <dbl>    <list>
# 1       2       3      50       1 <dbl [2]>
# 2       4       3      50       1 <dbl [4]>
# 3       6       3      50       1 <dbl [6]>
# 4       2       6      50       1 <dbl [2]>
# 5       4       6      50       1 <dbl [4]>
# 6       6       6      50       1 <dbl [6]>
# 7       2       9      50       1 <dbl [2]>
# 8       4       9      50       1 <dbl [4]>
# 9       6       9      50       1 <dbl [6]>
#10       2       3     100       1 <dbl [2]>
# ... with 26 more rows
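
A small sketch of why the `...` matters, using a shortened argument list rather than the full grid. The names args, strict, lenient, failed and ok are made up for this illustration.

```r
library(purrr)

args <- list(x = c(2, 4), y = c(3, 6), z = c(50, 100), u = c(1, 0))

# Without `...`, the extra named arguments z and u have nowhere to go,
# which is the "unused arguments" error from the question.
strict <- function(x, y) rep(5, x) * y
failed <- tryCatch({ pmap(args, strict); FALSE }, error = function(e) TRUE)

# With `...`, z and u are accepted and silently ignored.
lenient <- function(x, y, ...) rep(5, x) * y
ok <- pmap(args, lenient)
```

Here failed is TRUE, and ok holds one numeric vector per position of the input vectors.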
akrun

You can do this:

df.preprocessed <- dplyr::as.tbl(params) %>%
  dplyr::mutate(test_var = purrr::pmap(list(x = param_a
                                            ,y = param_b
                                            ,z = param_c
                                            ,u = param_d),
                                              ~ rep(5,.x)*.y                                                
  )
  )

or

df.preprocessed <- dplyr::as.tbl(params) %>%
  dplyr::mutate(test_var = purrr::pmap(list(x = param_a
                                            ,y = param_b
                                            ,z = param_c
                                            ,u = param_d),
                                       ~ rep(5,..1)*..2                                       
  )
  )

The second form is more general, as you can also refer to ..3, ..4, etc.
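
As a hedged sketch, the formula shortcut also accepts a multi-line body when it is wrapped in braces. The inputs and the names base, offset and res below are invented for illustration and are not the asker's actual formula.

```r
library(purrr)

res <- pmap(
  list(c(2, 4), c(3, 6), c(50, 100), c(1, 0)),
  ~ {
    base <- rep(5, ..1) * ..2   # first and second inputs
    offset <- ..3 + ..4         # third and fourth inputs
    base + offset
  }
)
```

The value of the last expression in the braces is what pmap() collects for each position.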

moodymudskipper
  • Thanks. I like the ..3 ..4 syntax but it appears it only works when we're using the shortcut way of writing a formula. I actually have a multi-line formula, and so need to use function(x, y, z, u) where you use the tilde. – matsuo_basho Oct 02 '17 at 19:06
  • I don't understand, can you share your formula ? – moodymudskipper Oct 02 '17 at 19:09
  • `df.test <- dplyr::as.tbl(params) %>% dplyr::mutate(test_var = purrr::pmap(list(x = param_a ,y = param_b ,z = param_c ,u = param_d), function(x,y,z,u){ rep(5,..1)*..2 } ) )` – matsuo_basho Oct 02 '17 at 19:44
  • This won't work indeed, but I don't understand in which circumstances you wouldn't be able to use the short form I proposed. – moodymudskipper Oct 02 '17 at 19:54
  • I've never had luck with using purrr's tilde shortcut whenever the formula has more than 1 line. In my case, the formula has about 10. Am I wrong on this? – matsuo_basho Oct 02 '17 at 19:57
  • That's why I asked you to share your formula :). On my side I have no problem inserting new lines in my formula. Maybe double check. It could be related to versions; if so, try wrapping the formula in `{}` – moodymudskipper Oct 02 '17 at 20:03
  • `df.test <- dplyr::as.tbl(params) %>% dplyr::mutate(test_var = purrr::pmap(list(x = param_a ,y = param_b ,z = param_c ,u = param_d), ~ rep(5,..1)*..2 %>% runif(min=..3, max=..4) ) )` – matsuo_basho Oct 02 '17 at 20:12
  • With `function(x, y, z, u)` I get the same result as with `~` and `..1` etc., with some warnings in both cases, but no error – moodymudskipper Oct 02 '17 at 20:17
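
One plausible source of the warnings mentioned above, offered as an assumption rather than a confirmed diagnosis: %>% binds more tightly than *, so `rep(5,..1)*..2 %>% runif(min=..3, max=..4)` is read as `rep(5, ..1) * runif(..2, min = ..3, max = ..4)`, and in this grid min (param_c) is always larger than max (param_d). A minimal sketch of both effects, with illustrative names a, b and warned:

```r
library(magrittr)

# %>% has higher precedence than *, so the pipe applies to 2 only
a <- 10 * 2 %>% sum(1)     # parsed as 10 * sum(2, 1)
b <- (10 * 2) %>% sum(1)   # parsed as sum(20, 1)

# runif() with min > max returns NaN and raises a warning
warned <- tryCatch({ runif(3, min = 50, max = 1); FALSE },
                   warning = function(w) TRUE)
```

Adding parentheses around the part to be piped, or splitting the expression into separate steps, makes the intended order explicit.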