There are plenty of posts on using dplyr's select_if with multiple conditions. However, I have not yet managed to combine is.factor with a condition on the variable names in a single select_if call.

Ultimately, I would like to select all factors in a df/tibble and exclude certain variables by name.

Example:

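library(dplyr)    # tibble(), select_if(), %>%
library(stringr)  # str_detect()
library(purrr)    # negate(), used in Attempt 2
df <- tibble(A = factor(c(0,1,0,1)), 
             B = factor(c("Yes","No","Yes","No")), 
             C = c(1,2,3,4))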

Various attempts:

Attempt 1

df %>%
  select_if(function(col) is.factor(col) & !str_detect(names(col), "A"))

Error in selected[[i]] <- .p(.tbl[[tibble_vars[[i]]]], ...) : replacement has length zero

Attempt 2

df %>%
      select_if(function(col) is.factor(col) & negate(str_detect(names(col)), "A"))

Error: Can't convert a logical vector to function Call `rlang::last_error()` to see a backtrace

Attempt 3

df %>%
  select_if(function(col) is.factor(col) && !str_detect(names(col), "A"))

Error: Only strings can be converted to symbols Call `rlang::last_error()` to see a backtrace

Attempt 4

df %>%
  select_if(is.factor(.) && !str_detect(names(.), "A"))

Error in tbl_if_vars(.tbl, .predicate, caller_env(), .include_group_vars = TRUE) : length(.p) == length(tibble_vars) is not TRUE

Meanwhile, each condition works fine on its own:

> df %>%
+     select_if(is.factor)
# A tibble: 4 x 2
  A     B    
  <fct> <fct>
1 0     Yes  
2 1     No   
3 0     Yes  
4 1     No   

> df %>%
+     select_if(!str_detect(names(.), "A"))
# A tibble: 4 x 2
  B         C
  <fct> <dbl>
1 Yes       1
2 No        2
3 Yes       3
4 No        4

The problem probably lies here:

df %>%
  select_if(function(col) !str_detect(names(col), "A"))

Error in selected[[i]] <- .p(.tbl[[tibble_vars[[i]]]], ...) : replacement has length zero

However, I have little clue how to fix this.
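
One possible reading of that error, sketched in base R so that it does not depend on any package: a column-wise predicate is handed each column as a bare vector (the error message itself shows the call `.p(.tbl[[tibble_vars[[i]]]], ...)`), and such a vector does not carry its column name. `names(col)` is then `NULL`, and the name test yields a zero-length logical. Using `grepl` in place of `str_detect`, and a plain `data.frame` in place of the tibble, are assumptions made only to keep the sketch self-contained.

```r
# Same columns as the example, as a plain data.frame (assumption: base R only).
df <- data.frame(A = factor(c(0, 1, 0, 1)),
                 B = factor(c("Yes", "No", "Yes", "No")),
                 C = c(1, 2, 3, 4))

# What a column-wise predicate is handed: the column itself, without its name.
col <- df[["A"]]
col_names <- names(col)                # NULL for an unnamed factor
name_test <- !grepl("A", col_names)    # zero-length logical, not TRUE/FALSE

# Applied to the whole data frame, the same test gives one value per column.
whole_test <- !grepl("A", names(df))
```

This would also be consistent with `select_if(!str_detect(names(.), "A"))` working: there `.` is the whole tibble, so the argument is a logical vector with one entry per column rather than a predicate function.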

Svencken
  • Use %>% operator: df %>% dplyr::select_if(is.factor) %>% dplyr::select(-one_of('A')) – Ika8 Dec 18 '18 at 16:50
  • I don't think things have changed since [this question/answer](https://stackoverflow.com/questions/48032969/dplyrselect-if-can-use-colnames-and-their-values-at-the-same-time). Also see [this](https://stackoverflow.com/questions/39592879/r-dpylr-select-if-with-multiple-conditions). – aosmith Dec 18 '18 at 16:58
  • I tried the solutions in the posts you mention, but they don't seem to work for this particular problem. I don't know why. – Svencken Dec 18 '18 at 17:08

2 Answers

Perhaps I'm missing something, but is there any reason you couldn't do the following:

df <- tibble(A = factor(c(0,1,0,1)), 
             B = factor(c("Yes","No","Yes","No")), 
             C = c(1,2,3,4))

df %>%
  select_if(function(col) is.factor(col)) %>%
  select_if(!str_detect(names(.), "A"))

# A tibble: 4 x 1
  B    
  <fct>
1 Yes  
2 No   
3 Yes  
4 No   
DaveArmstrong
  • Yes, this is indeed an obvious solution. However, I was hoping to compress it into a single `select_if` statement, because knowing how to do that would also be useful for `summarise_if`, where your proposed solution may be more awkward. – Svencken Dec 18 '18 at 17:07
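
A minimal sketch of a single-step selection, not taken from the thread. It assumes dplyr 1.0.0 or later, where `where()` can be combined with name-based helpers inside `select()` and `across()`. The middle pipeline assumes `select_if` still accepts a logical vector in place of a predicate, as in the working name-only example in the question. The names `res_where`, `keep`, `res_if`, and `res_sum` are illustrative only.

```r
library(dplyr)

df <- tibble(A = factor(c(0, 1, 0, 1)),
             B = factor(c("Yes", "No", "Yes", "No")),
             C = c(1, 2, 3, 4))

# Factors only, minus columns whose name contains "A" (case-sensitive).
res_where <- df %>%
  select(where(is.factor) & !contains("A", ignore.case = FALSE))

# Same idea with select_if: build one logical value per column up front.
keep <- unname(sapply(df, is.factor)) & !grepl("A", names(df))
res_if <- df %>% select_if(keep)

# The same selection inside across(), as a stand-in for summarise_if.
res_sum <- df %>%
  summarise(across(where(is.factor) & !contains("A", ignore.case = FALSE),
                   nlevels))
```

With the example data, both selections should keep only column `B`. The `across()` form carries the same column specification into `summarise()` without a second step.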

Just for completeness: I'm not sure whether this is acceptable for you, but base R may save you some pain here (a first, very quick shot):

df[, sapply(names(df), 
  function(coln, df) !grepl("A", coln) && is.factor(df[[coln]]), df = df),
  drop = FALSE]
Jozef