
I am trying to understand how pmap() works. The tibble below contains a list-column, values. I would like to create a new column, New, whose value depends on whether the corresponding element of values is NULL. Since is.null() is not vectorised, I initially thought to use rowwise() before coming across pmap().

Using rowwise() prior to mutate() gives me the desired result as shown below:

tbl = as.data.frame(do.call(rbind, pars)) %>%
  rowwise() %>%
  mutate(New = ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", ")))

> tbl
Source: local data frame [2 x 6]
Groups: <by row>

# A tibble: 2 x 6
  id        lower     upper     values     default   New        
  <list>    <list>    <list>    <list>     <list>    <chr>        
1 <chr [1]> <dbl [1]> <dbl [1]> <NULL>     <dbl [1]> a 5          
2 <chr [1]> <NULL>    <NULL>    <list [3]> <chr [1]> b 0, b 1, b 2

However, pmap() does not:

tbl = as.data.frame(do.call(rbind, pars)) %>%
      mutate(New = pmap(., ~ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))

> tbl
  id lower upper  values default                         New
1  a     1    10    NULL       5 a NULL, b list("0", "1", "2")
2  b  NULL  NULL 0, 1, 2       1 a NULL, b list("0", "1", "2")

It seems to work if I use an anonymous function in place of the tilde:

tbl = as.data.frame(do.call(rbind, pars)) %>%
  mutate(Value = pmap(., function(values, default, id, ...) ifelse(is.null(values), paste(id, default), paste(id, values, collapse=", "))))

> tbl
  id lower upper  values default         Value
1  a     1    10    NULL       5           a 5
2  b  NULL  NULL 0, 1, 2       1 b 0, b 1, b 2

But I don't understand why the tilde version fails. I would prefer not to have to spell out the arguments in full, as I need to map the function over multiple columns. Where am I going wrong?
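A minimal sketch of what seems to be happening, using a made-up stand-in for `pars` (the original object is not shown, so `tbl` below is an assumption). Inside `mutate()`, a bare column name in a formula is looked up in the data mask, so it refers to the whole column rather than the current row's element. Binding the row's elements by name, here with `with(list(...), ...)`, is one way around that.

```r
library(dplyr)
library(purrr)

# Hypothetical stand-in for the data built from `pars` (not the original object).
tbl <- tibble(
  id      = c("a", "b"),
  values  = list(NULL, c(0, 1, 2)),
  default = c(5, 1)
)

# `values` in the formula is not a pmap() argument; it is found in the
# data mask, so every call sees the full list-column (length 2).
whole_column <- tbl %>%
  mutate(n_seen = pmap_int(list(id), ~ length(values)))

# Passing a named list and evaluating inside with(list(...), ...) binds
# `id`, `values` and `default` to the current row's elements instead.
per_row <- tbl %>%
  mutate(New = pmap_chr(
    list(id = id, values = values, default = default),
    ~ with(list(...), {
      if (is.null(values)) {
        paste(id, default)
      } else {
        paste(id, values, collapse = ", ")
      }
    })
  ))
```

Here `whole_column$n_seen` reports the length of the entire column for every row, which is consistent with the tilde version pasting both rows together.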

user51462
  • Could you add a reproducible example for the above so that it is easy to help? We do not have `pars` objects to begin with. – Ronak Shah Apr 29 '19 at 11:24

1 Answer


I was about to ask a very similar question to this. Basically, asking how to use pmap within mutate without having to use the variable names more than once. Instead, I'll post it as an 'answer' here as it includes a reprex and a number of options that I've found, none of which are completely satisfactory to me. Hopefully somebody else might be able to answer how to do it as required.

I often want to use purrr::pmap inside dplyr::mutate when working with a data.frame with list-columns. Occasionally this involves a lot of repetition of variable names. I'd like to be able to do this more succinctly, using an anonymous function so that the variables are only used once, when passed to pmap's .f argument.

Take this small dataset as an example:

library('dplyr')
library('purrr')

df <- tribble(
  ~x,   ~y,      ~z,         
  c(1), c(1,10), c(1, 10, 100),
  c(2), c(2,20), c(2, 20, 200),
)

Say the function I want to apply to each row is

func <- function(x, y, z){c(sum(x), sum(y), sum(z))}

In practice the function will be more complex, with lots of variables. The function is only needed once, so I'd prefer not to have to name it explicitly and clog up my script and my working environment.

Here are the options. Each creates exactly the same data.frame but in a different way. The reason for including avg will become clear. Note I'm not considering position matching using ..1, ..2, etc. as this is easy to mess up.

# Explicitly create a function for `.f`.
# This requires using the variable names (x, y, z) three times.
# It's completely clear what it's doing, but needs a lot of typing.
# It might sometimes fail - see https://github.com/tidyverse/purrr/issues/280

df_explicit <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x, y, z), .f = function(x, y, z){ c(sum(x), sum(y), sum(z)) })
  )

# Pass the whole of `df` to `.l` and add `...` in an explicit function to deal with any unused columns.
# Variable names are used twice.
# `df` will have to be passed explicitly if not using pipes (eg, `mutate(.data = df, a = pmap(.l = df, ...`).
# This is probably inefficient for large datasets.

df_dots <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = ., .f = function(x, y, z, ...){ c(sum(x), sum(y), sum(z)) })
  )

# Use `pryr::f` (as discussed in https://stackoverflow.com/a/51123520/4269699).
# Variable names are used twice.
# Potentially unexpected behaviour.
# Not obvious to the casual reader why the extra `pryr::f` is needed and what it's doing

df_pryrf <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(.l = list(x,y,z), .f = pryr::f({c(sum(x), sum(y), sum(z))} ))
  )

# Use `rowwise()` similar to this: https://stackoverflow.com/a/47734073/4269699
# Variable names are used once.
# It will mess up any vectorised functions used elsewhere in mutate, hence the two `mutate()`s

df_rowwise <- df %>%
  mutate( avg = x - mean(x) ) %>%
  rowwise() %>%
  mutate( a = list( {c(sum(x), sum(y), sum(z))} ) ) %>%
  ungroup()

# Use Romain Francois' neat {rap} package.
# Variable names used once.
# Like `rowwise()` it will mess up any vectorised functions so it needs two `mutate()`s for this particular problem
#

library('rap') #devtools::install_github("romainfrancois/rap")
df_rap <- df %>%
  mutate( avg = x - mean(x) ) %>%
  rap( a = ~ c(sum(x), sum(y), sum(z)) )

# Another solution discussed here https://stackoverflow.com/a/51123520/4269699 doesn't seem to work inside `mutate()`, but maybe could be tweaked?
# Like the `pryr::f` solution, it's not immediately obvious what the purpose of the `with(list(...` bit is.

df_with <- df %>%
  mutate(
    avg = x-mean(x),
    a = pmap(.l = list(x,y,z), .f = ~with(list(...), { c(sum(x), sum(y), sum(z))} ))
  )
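One possible tweak, sketched under the assumption that the failure comes from `.l` being an unnamed list: with no names, `list(...)` has nothing for `with()` to bind, so `x`, `y` and `z` fall through to the whole columns. Naming the elements of `.l` appears to fix it, at the cost of repeating the variable names. `df_with_named` is just an illustrative name.

```r
library(dplyr)
library(purrr)

df <- tribble(
  ~x,   ~y,       ~z,
  c(1), c(1, 10), c(1, 10, 100),
  c(2), c(2, 20), c(2, 20, 200)
)

# Naming the elements of `.l` means `...` carries x, y and z by name,
# so with(list(...), ...) evaluates the body against the current row.
df_with_named <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap(
      .l = list(x = x, y = y, z = z),
      .f = ~ with(list(...), c(sum(x), sum(y), sum(z)))
    )
  )
```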

As far as I know these are the options, excluding position matching.

Ideally, something like the following would be possible, where the function qmap knows to find (rowwise) variables x, y, and z from the object passed to mutate's .data argument.

df_new <- df %>%
  mutate(
    avg = x-mean(x),
    a = qmap( ~c(sum(x), sum(y), sum(z)) )
  )

But I don't know how to do this, so consider this only a partial answer.
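For illustration only, here is a rough sketch that gets part of the way there. `pmap_eval` is a hypothetical helper, not a function in purrr or dplyr; it captures the expression with rlang and evaluates it once per row with that row's elements bound by name. It assumes dplyr 1.1.0 or later for `pick()`, and the columns still have to be named once in `pick()` and once in the expression.

```r
library(dplyr)
library(purrr)

df <- tribble(
  ~x,   ~y,       ~z,
  c(1), c(1, 10), c(1, 10, 100),
  c(2), c(2, 20), c(2, 20, 200)
)

# Hypothetical helper (an assumption, not an existing API): evaluate `expr`
# once per row of `.l`, with the row's elements available by name.
pmap_eval <- function(.l, expr) {
  expr <- rlang::enquo(expr)
  pmap(.l, function(...) rlang::eval_tidy(expr, data = list(...)))
}

df_pick <- df %>%
  mutate(
    avg = x - mean(x),
    a = pmap_eval(pick(x, y, z), c(sum(x), sum(y), sum(z)))
  )
```

Because the vectorised `avg` and the row-wise `a` sit in the same `mutate()`, this avoids the two-step pattern needed with `rowwise()`, though it has not been tested on grouped data.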


bretauv
wjchulme