I have a tibble tib as follows:

  A     B     C     D    
  <chr> <chr> <chr> <chr>
1 X123  X456  K234  V333 
2 X456  Z000  L888  B323 
3 X789  ZZZZ  D345  O999 
4 M111  M111  M111  M111 
.
.
.
(5000 rows)

I also have another vector as follows:

> vec <- c("X123","X456")
> vec
[1] "X123" "X456"

I am looking for a way to add a logical column (5000 rows in this example) to the right of the tibble that is TRUE or FALSE depending on whether any of the values in that row of tib appear in vec. My goal output is the following:

  A     B     C     D      lgl
  <chr> <chr> <chr> <chr>  <lgl>
1 X123  X456  K234  V333   TRUE
2 X456  Z000  L888  B323   TRUE
3 X789  ZZZZ  D345  O999   FALSE
4 M111  M111  M111  M111   FALSE

I have the following:

> tib %>% 
+   pmap_lgl(~any(..1 %in% vec))
[1]  TRUE  TRUE FALSE FALSE

This gets the results that I am seeking, but I'm a bit confused about the syntax.

Why does the above work (i.e. using ..1), instead of having to use ..1, ..2, ..3, and ..4? My understanding is that pmap generates a vector based on the inputs rowwise, so I assume that ..1 in the above means the vector c("X123","X456","K234","V333") for row #1, c("X456","Z000","L888","B323") for row #2, etc.

In the end, I have two questions:

  1. How do I append this new logical vector to the above tib? I haven't had any luck with:

tib %>% mutate(lgl = pmap_lgl(~any(..1 %in% vec)))

Error in mutate_impl(.data, dots): Evaluation error: argument ".f" is missing, with no default.

  2. If I wanted to access each column within each row (e.g. "X123" for the first row in pmap), how would I do that within the syntax of purrr?
Will Pike

3 Answers

Keep it simple: you could use the base function apply together with any:

df$lgl <- apply(df, 1, function(x) any(x %in% vec))
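For reference, here is a self-contained sketch of the same idea using the four example rows from the question. The data frame is rebuilt by hand here (an assumption, since the original tib is not shown in full), and the object is named tib rather than df.

```r
# Example data mirroring the first four rows shown in the question
tib <- data.frame(
  A = c("X123", "X456", "X789", "M111"),
  B = c("X456", "Z000", "ZZZZ", "M111"),
  C = c("K234", "L888", "D345", "M111"),
  D = c("V333", "B323", "O999", "M111"),
  stringsAsFactors = FALSE
)
vec <- c("X123", "X456")

# apply() converts the data frame to a character matrix and calls
# the function once per row (MARGIN = 1); x is that row's values
tib$lgl <- apply(tib, 1, function(x) any(x %in% vec))

tib$lgl
```

Because apply works on a matrix, all columns are coerced to a common type first, which is harmless here since every column is already character.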
YOLO

You can use add_column and pmap_lgl along with a helper function to get a tidyverse one-liner similar to the base apply solution from @YOLO.

library(tidyverse)

df <- tibble(A = c('X123', 'X456','X789', 'M111'),
             B = c('X456', 'Z000', 'ZZZZ', 'M111'),
             C = c('K234', 'L888', 'D345', 'M111'),
             D = c('V333', 'B323', '0999', 'M111'))


vec <- c('V333', '0999')

check <- function(...) {

  any(c(...) %in% vec)

}

add_column(df, row_check = pmap_lgl(df, check))

# A tibble: 4 x 5
  A     B     C     D     row_check
  <chr> <chr> <chr> <chr> <lgl>    
1 X123  X456  K234  V333  TRUE     
2 X456  Z000  L888  B323  FALSE    
3 X789  ZZZZ  D345  0999  TRUE     
4 M111  M111  M111  M111  FALSE    

The caveat of using ... in the function is that it operates over ALL columns of the provided tibble or data frame. If you have additional columns, you'll need to either specify the function arguments or limit the data passed to pmap_lgl.
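As a rough sketch of limiting the data, assume the tibble carries two extra columns, id and note, that should not be searched (both are hypothetical and only there for illustration). Selecting the columns of interest before calling pmap_lgl keeps the helper unchanged:

```r
library(dplyr)
library(purrr)

df <- tibble(
  id   = 1:4,                             # hypothetical extra column
  A    = c("X123", "X456", "X789", "M111"),
  B    = c("X456", "Z000", "ZZZZ", "M111"),
  note = c("x", "y", "V333", "z")         # hypothetical extra column
)
vec <- c("V333", "X456")

check <- function(...) any(c(...) %in% vec)

# Every column is searched, so the "V333" in note makes row 3 match
unrestricted <- pmap_lgl(df, check)

# Only A and B are passed on, so note and id are ignored
restricted <- pmap_lgl(select(df, A, B), check)

df$row_check <- restricted
```

Here unrestricted flags row 3 because of the note column, while restricted does not.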

Jake Kaupp

..1, ..2, and so on refer to the arguments by position: ..1 is the first argument passed to the function, ..2 the second, etc. We can use these along with the mutate and rowwise functions to get the desired result:

tib %>%
    mutate(lgl = pmap(., ~c(..1, ..2, ..3, ..4) %in% vec)) %>%
    rowwise() %>%
    mutate(lgl = any(unlist(lgl)))

  V1    V2    V3    V4    lgl  
  <chr> <chr> <chr> <chr> <lgl>
1 X123  X456  K234  V333  TRUE 
2 X456  Z000  L888  B323  TRUE 
3 X789  ZZZZ  D345  O999  FALSE
4 M111  M111  M111  M111  FALSE

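The call to pmap uses . as its first argument, which is the tibble being piped in (the list of columns to iterate over); the formula is the function applied to each row. Inside it, we build a vector of the row's values across the columns with c(..1, ..2, ..3, ..4). pmap returns a list column of logical vectors, so we then use rowwise with any to reduce each one to a single logical value per row.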
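To make the positional arguments concrete, here is a small sketch on the question's four example rows (the tibble is rebuilt by hand, which is an assumption about the real data). It shows that ..1 alone only looks at the first column, that c(...) collects every column for the current row, and that a single pmap_lgl call inside mutate may be enough to append the column without rowwise.

```r
library(dplyr)
library(purrr)

tib <- tibble(
  A = c("X123", "X456", "X789", "M111"),
  B = c("X456", "Z000", "ZZZZ", "M111"),
  C = c("K234", "L888", "D345", "M111"),
  D = c("V333", "B323", "O999", "M111")
)

# ..1 is only column A's value for the current row, so "K234"
# (which sits in column C) is never found
first_only <- pmap_lgl(tib, ~ any(..1 %in% "K234"))

# c(...) gathers the values of all columns for the current row
all_cols <- pmap_lgl(tib, ~ any(c(...) %in% "K234"))

# A single cell is reachable by position: ..3 is column C here
third_col <- pmap_chr(tib, ~ ..3)

# Appending the column in one step; `.` is the tibble piped in by %>%
vec <- c("X123", "X456")
result <- tib %>%
  mutate(lgl = pmap_lgl(., ~ any(c(...) %in% vec)))
```

The original pmap_lgl(~any(..1 %in% vec)) call in the question appears to work only because both values in vec happen to occur in column A.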

The previous iteration of my answer would have returned an incorrect result for vec = c('M111'); it now handles that case correctly:

tib %>%
    mutate(lgl = pmap(., ~c(..1, ..2, ..3, ..4) %in% c('M111'))) %>%
    rowwise() %>%
    mutate(lgl = any(unlist(lgl)))

  V1    V2    V3    V4    lgl  
  <chr> <chr> <chr> <chr> <lgl>
1 X123  X456  K234  V333  FALSE
2 X456  Z000  L888  B323  FALSE
3 X789  ZZZZ  D345  O999  FALSE
4 M111  M111  M111  M111  TRUE 

Here's a link to the documentation for the function, which might be useful too.

bouncyball