Using pmap to iterate over rows of a tibble

Question

I have a very simple tibble and would like to iterate over its rows, applying a function with `pmap()`. I think I have misunderstood some aspects of `pmap()`, and I mostly have difficulty selecting its arguments. I would like to know whether I should combine `rowwise()` with `pmap()` in this case; I haven't seen an example that does. The other problem is how to select the variables to iterate over, using `list()` or `select()`:

# Here is my tibble.
# Suppose I want to apply `n_distinct` to every row with pmap.

library(dplyr)  # tibble(), mutate(), rowwise(), c_across(), n_distinct()
library(purrr)  # pmap_dbl()

df <- tibble(
  id = c("01", "02", "03", "04", "05", "06"),
  A  = c("Jan", "Mar", "Jan", "Jan", "Jan", "Mar"),
  B  = c("Feb", "Mar", "Jan", "Jan", "Mar", "Mar"),
  C  = c("Feb", "Mar", "Feb", "Jan", "Feb", "Feb")
)

# It is perfectly achievable with `rowwise` and `mutate` and results in my desired output

df %>%
  rowwise() %>%
  mutate(overal = n_distinct(c_across(A:C)))

# A tibble: 6 x 5
# Rowwise: 
  id    A     B     C     overal
  <chr> <chr> <chr> <chr>  <int>
1 01    Jan   Feb   Feb        2
2 02    Mar   Mar   Mar        1
3 03    Jan   Jan   Feb        2
4 04    Jan   Jan   Jan        1
5 05    Jan   Mar   Feb        3
6 06    Mar   Mar   Feb        2

# But with `pmap` it doesn't work: every row comes out as 1.


df %>%
  select(-id) %>%
  mutate(overal = pmap_dbl(list(A, B, C), n_distinct))


# A tibble: 6 x 4
  A     B     C     overal
  <chr> <chr> <chr>  <dbl>
1 Jan   Feb   Feb        1
2 Mar   Mar   Mar        1
3 Jan   Jan   Feb        1
4 Jan   Jan   Jan        1
5 Jan   Mar   Feb        1
6 Mar   Mar   Feb        1

I just need a little bit of explanation on the application of pmap for rowwise iteration on tibbles, so I highly appreciate any help in advance, thank you.

If you want to use `pmap()` then you will need to vectorise each row beforehand. — Johnny, Mar 27 '21 at 16:51
I do not think that the input to n_distinct looks like this inside `pmap` actually. You can check it with `debugonce(n_distinct)` — mnist, Mar 27 '21 at 17:01
I think you're right. I've removed my initial answer so as not to mislead anybody. Thanks. — Johnny, Mar 27 '21 at 17:13

score 5 · Accepted Answer · answered Mar 27 '21 at 16:48

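I was able to track down the issue, though I cannot say whether it's a bug or a feature. `pmap()` calls `n_distinct()` with one row's values as three separate arguments, and `n_distinct()` treats multiple arguments like the columns of a data frame: it counts the number of distinct rows, not the number of distinct values. Each call sees exactly one row, hence the 1 in each row. The same row counting is visible when a data frame is passed directly: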

n_distinct(tibble(a = c(1, 2, 2),
                  b = 3))
#> [1] 2
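The same effect can be reproduced with the values from the first row of `df`. This is a minimal sketch, assuming only that dplyr is loaded; the variable names `separate` and `combined` are made up for illustration.

```r
library(dplyr)

# Three separate length-1 arguments: they are treated like the columns
# of a one-row data frame, so there is exactly one distinct combination.
separate <- n_distinct("Jan", "Feb", "Feb")
separate
#> [1] 1

# One length-3 vector: the distinct values themselves are counted.
combined <- n_distinct(c("Jan", "Feb", "Feb"))
combined
#> [1] 2
```

The first call is what `pmap()` produces when `n_distinct` is passed as the function without a wrapper; the second is what the row-wise count needs.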

The trick is to combine the row's values into a single vector with `c(...)` first and then pass that vector to `n_distinct()`:

df %>%
  select(-id) %>%
  mutate(overal = pmap_dbl(list(A, B, C), ~ n_distinct(c(...))))
#> # A tibble: 6 x 4
#>   A     B     C     overal
#>   <chr> <chr> <chr>  <dbl>
#> 1 Jan   Feb   Feb        2
#> 2 Mar   Mar   Mar        1
#> 3 Jan   Jan   Feb        2
#> 4 Jan   Jan   Jan        1
#> 5 Jan   Mar   Feb        3
#> 6 Mar   Mar   Feb        2
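Since a data frame is itself a list of columns, the selected columns can probably also be handed to `pmap()` directly instead of being wrapped in `list(A, B, C)`. The sketch below assumes the same `df` as in the question; using `pmap_int()` (because `n_distinct()` returns an integer) and the name `row_counts` are choices made for this illustration, not part of the original answer.

```r
library(dplyr)
library(purrr)

df <- tibble(
  id = c("01", "02", "03", "04", "05", "06"),
  A  = c("Jan", "Mar", "Jan", "Jan", "Jan", "Mar"),
  B  = c("Feb", "Mar", "Jan", "Jan", "Mar", "Mar"),
  C  = c("Feb", "Mar", "Feb", "Jan", "Feb", "Feb")
)

# select() returns a data frame; pmap_int() walks over it row by row,
# passing each row's values as the arguments A, B and C.
# c(...) collects them into one vector before counting distinct values.
row_counts <- pmap_int(select(df, A:C), ~ n_distinct(c(...)))
row_counts
#> [1] 2 1 2 1 3 2
```

No `rowwise()` is involved here: the row iteration comes from `pmap_int()` itself, and the result can be assigned to a new column if needed.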

answered Mar 27 '21 at 16:48

mnist


Thank you very much! It was a very subtle point, and I was so frustrated that I couldn't make it work! I once came across the `c(...)` trick but didn't take it seriously; now I think it applies to some other functions as well. Thank you very much. – Anoushiravan R Mar 27 '21 at 16:56
I have a question, dear @mnist: when we apply a function to every row of a data frame using `pmap`, do we need to use `rowwise`, or does it automatically apply it to every row? I ask because I have never seen an example combining `rowwise` with `pmap`; they mostly combine `rowwise` with `mutate`. – Anoushiravan R Mar 27 '21 at 17:08
The [manual](https://purrr.tidyverse.org/reference/map2.html) answers this. *Note that a data frame is a very important special case, in which case pmap() and pwalk() apply the function .f to each row.* – Sirius Mar 27 '21 at 17:35
No, you don't need to combine `pmap` with `rowwise`. – mnist Mar 27 '21 at 17:52
Thank you very much indeed, I really appreciate your help. I guess I have to reread the manuals again. – Anoushiravan R Mar 27 '21 at 19:41
