Purpose

Can I use dplyr to select only those columns whose names appear in an external vector? I have found posts that explain how to subset a data frame with a vector of names, but none that cover the case where some of the names in the vector do not exist in the data frame.

Example dataset

  library(tidyverse)   # attaches dplyr and tibble
  library(data.table)
  
  # Names to select; 'e' does not exist in the data frame
  col_names <- c('a', 'b', 'e')
  
  # Pin the dplyr versions in case another package masks them
  rename <- dplyr::rename
  select <- dplyr::select
  
  set.seed(10002)
  a <- sample(1:20, 1000, replace = TRUE)
  
  set.seed(10003)
  b <- sample(letters, 1000, replace = TRUE)
  
  set.seed(10004)
  c <- sample(letters, 1000, replace = TRUE)
  
  data <- data.frame(a, b, c)
  
  # I would like to select a and b, the columns that are named in col_names.
J.K.

3 Answers

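We could use any_of() with select(). It selects the columns named in col_names that exist in the data and silently ignores the names that do not: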

library(dplyr)
data %>%
     select(any_of(col_names))

Output:

 a b
1  1 e
2  4 e
3 13 f
4  8 m
5 10 z
6  3 y
...
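
To make the behaviour concrete, here is a minimal sketch that contrasts any_of() with its strict counterpart all_of(). The small data frame `df` is a hypothetical stand-in for `data` from the question, not the original dataset.

```r
library(dplyr)

# Hypothetical stand-in for `data`: columns a, b and c
df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c("p", "q", "r"))
col_names <- c("a", "b", "e")

# any_of() keeps the names that exist and skips "e" without complaint
kept <- df %>% select(any_of(col_names))
names(kept)

# all_of() is strict: it raises an error because "e" is missing,
# which tryCatch() turns into the string "error" here
strict <- tryCatch(
  df %>% select(all_of(col_names)),
  error = function(e) "error"
)
strict
```

So there is no need to know in advance which names are missing; any_of() handles that itself.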
akrun
  • Thank you @akrun ! Given my understanding, this requires knowing what columns are not in the `col_names`. Correct? Is there any way we automatically detect it such as select_if? – J.K. Aug 11 '21 at 22:21
  • Also, I am not sure if I understood the solution. My goal is to take out a, b, without c as an output. Your output shows all columns of original data frame. Am I missing something here? – J.K. Aug 11 '21 at 22:24
  • @J.K. sorry, now I understand what you meant. I updated. I was thinking that you need to create a data.frame construct from the vectors – akrun Aug 11 '21 at 22:26
  • https://github.com/r-lib/tidyselect/issues/269 I used to use this, but I think it is not currently working – Andrés Parada Jan 26 '23 at 17:55
  • @AndrésParada it is working fine for me. I used devel version of dplyr – akrun Jan 26 '23 at 17:57

Here is one way to solve your problem:

data[names(data) %in% col_names]

#    a b
# 1  1 e
# 2  4 e
# 3 13 f
# 4  8 m
# 5 10 z
# 6  3 y
# ...
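
If it is also useful to see which requested names are absent, a base R sketch along these lines may help. The data frame `df` and the variable names `present` and `absent` are hypothetical, chosen only for illustration.

```r
# Hypothetical stand-in for `data`: columns a, b and c
df <- data.frame(a = 1:3, b = c("x", "y", "z"), c = c("p", "q", "r"))
col_names <- c("a", "b", "e")

# Names that exist in the data frame, in the order given by col_names
present <- intersect(col_names, names(df))

# Names that were requested but are not in the data frame
absent <- setdiff(col_names, names(df))

# Subset by name; only existing columns are requested, so this cannot fail
subset_df <- df[present]
```

Unlike the logical index above, which returns columns in the data frame's order, indexing by `present` returns them in the order of `col_names`.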

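We may also use matches(). Note that it treats each name as a regular expression, so 'a' would also match a column such as 'ab':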

library(dplyr)
data %>% 
  select(matches(col_names))

Output:

       a b    
   <int> <chr>
 1     1 e    
 2     4 e    
 3    13 f    
 4     8 m    
 5    10 z    
 6     3 y    
 7    19 g    
 8     7 f    
 9    12 f    
10    15 k    
# … with 990 more rows
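
Because matches() works on regular expressions, partial matches can slip in when column names overlap. The sketch below shows one way to force exact matches by anchoring each pattern. It uses a hypothetical data frame `df` with an extra column `ab`, and it assumes the names contain no regex metacharacters.

```r
library(dplyr)

# Hypothetical data frame with a column name that contains "a" and "b"
df <- data.frame(a = 1:3, b = 4:6, ab = 7:9)
col_names <- c("a", "b", "e")

# Unanchored patterns: "ab" is selected too, because it contains "a"
loose <- df %>% select(matches(col_names))
names(loose)

# Anchored patterns "^a$", "^b$", "^e$" only match whole names
exact <- df %>% select(matches(paste0("^", col_names, "$")))
names(exact)
```

When exact names are all that is needed, any_of() is usually the simpler choice.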
TarJae