
I want to select multiple columns based on their names with a regular expression, using the piping syntax of the dplyr package. I checked other topics but only found answers for a single string.

With base R:

library(dplyr)    
mtcars[grepl('m|ar', names(mtcars))]
###                      mpg am gear carb
### Mazda RX4           21.0  1    4    4
### Mazda RX4 Wag       21.0  1    4    4

However, it doesn't work with the `select()`/`contains()` approach:

mtcars %>% select(contains('m|ar'))
### data frame with 0 columns and 32 rows

What's wrong?

agenis

4 Answers


You can use `matches()`, which takes a regular expression:

 mtcars %>%
        select(matches('m|ar')) %>%
        head(2)
 #              mpg am gear carb
 #Mazda RX4      21  1    4    4
 #Mazda RX4 Wag  21  1    4    4

According to the `?select` documentation:

‘matches(x, ignore.case = TRUE)’: selects all variables whose name matches the regular expression ‘x’

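`contains()`, on the other hand, matches a literal string rather than a regular expression (so `contains('m|ar')` looks for the characters "m|ar" themselves). It works with a plain string: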

mtcars %>% 
       select(contains('m'))
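A minimal sketch of that difference, assuming a reasonably recent dplyr. The one-row data frame `df`, with a column literally named `m|ar`, is hypothetical and exists only to show that `contains()` treats `|` as an ordinary character while `matches()` treats it as regex alternation. The last two lines illustrate the `ignore.case = TRUE` default quoted from `?select`.

```r
library(dplyr)

# Hypothetical data frame: one column literally named "m|ar", plus "mpg"
df <- data.frame(`m|ar` = 1, mpg = 2, check.names = FALSE)

# contains() searches for the literal characters "m|ar"
lit <- names(select(df, contains("m|ar")))
lit  # "m|ar"

# matches() reads "m|ar" as the regex "m or ar", so both columns match
rx <- names(select(df, matches("m|ar")))
rx   # "m|ar" "mpg"

# Both helpers ignore case by default (ignore.case = TRUE)
ci <- names(select(mtcars, matches("MPG")))
ci   # "mpg"
cs <- names(select(mtcars, matches("MPG", ignore.case = FALSE)))
cs   # character(0)
```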
akrun
  • Thank you @akrun, I feel stupid now :-). But one question, still: given that, why should we even use contains(), if matches() does the same and even better? – agenis Mar 12 '15 at 19:15
  • @agenis There are several options in `?select` for flexibility of use, I guess. `contains` takes a single string, but when you do this regex-type matching, it is better to use `matches`... – akrun Mar 12 '15 at 19:17
  • @agenis Because you might want to match "." and not have to think about how to escape it in a regular expression – hadley Mar 12 '15 at 20:42
  • Is there a way to avoid piping the matches? Suppose I have a character vector of 30 different patterns I am looking for; how can I read that in? – Michael Bellhouse Mar 30 '17 at 18:09
  • @MichaelBellhouse In that case, use `paste`, i.e. `paste(yourvec, collapse="|")`, and pass the result to `matches` – akrun Mar 30 '17 at 18:10
  • akrun, thank you so much. I've been doing a lot of digging and experimenting for this. All the best. – Michael Bellhouse Mar 30 '17 at 18:15
  • Equivalent for filter: `df %>% filter(!grepl(paste(exclude_filter, collapse = "|"), variable))` – Michael Bellhouse Apr 23 '17 at 00:24
  • Use `matches('m.*ar')` for an "AND" (names containing an "m" followed later by "ar") – Ömer An Oct 24 '18 at 06:10
  • @titeuf It is a regular expression that matches either 'm' or 'ar'. If you want names containing both, use the pattern suggested by Ömer An. – akrun Sep 14 '20 at 16:17
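The two suggestions from the comments (building the pattern from a vector with `paste(..., collapse = "|")`, and expressing an "AND") can be sketched as below. `patterns` is a hypothetical vector. Since no `mtcars` column contains both "m" and "ar", the AND part uses "a" and "r" instead so that something actually matches. Note that `paste()` does not escape regex metacharacters such as `.`, which is one reason to prefer `contains()` for purely literal fragments.

```r
library(dplyr)

# Hypothetical vector of name fragments to search for
patterns <- c("m", "ar")

# OR: collapse the fragments into one regex alternation, "m|ar"
rx <- paste(patterns, collapse = "|")
either <- names(select(mtcars, matches(rx)))
either     # "mpg" "am" "gear" "carb"

# AND, fixed order: an "a" followed somewhere later by an "r"
ordered <- names(select(mtcars, matches("a.*r")))
ordered    # "gear" "carb"

# AND, any order: spell out both orders as alternatives
any_order <- names(select(mtcars, matches("a.*r|r.*a")))
any_order  # "drat" "gear" "carb"
```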

You can also use `contains()` from dplyr if you give it a character vector of strings, like this:

mtcars %>% 
       select(contains(c("m", "ar")))
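A quick check, assuming a dplyr version whose `contains()` accepts a character vector (current releases take the union of the matches for each element). For these plain fragments it should select the same set of columns as the regex `m|ar`; `setequal()` is used so the comparison does not depend on column order.

```r
library(dplyr)

# contains() with a vector: union of literal-substring matches
by_contains <- names(select(mtcars, contains(c("m", "ar"))))

# matches() with the equivalent regex alternation
by_matches <- names(select(mtcars, matches("m|ar")))

setequal(by_contains, by_matches)  # TRUE
```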
Vinícius Félix
Nicki Norris
  • `contains()` with a vector of as many elements as you want works just fine. Actually, `matches()` should be reserved for cases where you need complex matching with regular expressions. – Faustin Gashakamba Sep 25 '22 at 07:38

You could still use grepl() from base R.

df <- mtcars[ , grepl('m|ar', names(mtcars))]

This returns a subset data frame, `df`, containing only the columns whose names contain "m" or "ar".
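One caveat with the `mtcars[ , logical]` form: if only one column matches, base R simplifies the result to a plain vector. A minimal sketch of that behavior and the `drop = FALSE` fix; the pattern `"qsec"` is just an example that happens to match a single column. The list-style `mtcars[grepl(...)]` used in the question always returns a data frame.

```r
# Only one column name contains "qsec", so the result is simplified to a vector
one <- mtcars[ , grepl("qsec", names(mtcars))]
is.data.frame(one)   # FALSE

# drop = FALSE keeps it as a one-column data frame
kept <- mtcars[ , grepl("qsec", names(mtcars)), drop = FALSE]
is.data.frame(kept)  # TRUE
```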


Here's an alternative that combines two selection helpers with `|` (this needs dplyr 1.0.0 or later):

mtcars %>% 
    select(contains('m') | contains('ar')) %>% 
    head(2)

#             mpg am gear carb
# Mazda RX4      21  1    4    4
# Mazda RX4 Wag  21  1    4    4
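The same idea extends to the other boolean operators in selections, `&` and `!`, which as far as I know arrived together with `|` in dplyr 1.0.0. A sketch picking names that contain "a" but not "r", and then every column the question's pattern leaves out:

```r
library(dplyr)

# Names containing "a" AND NOT containing "r": only "am" in mtcars
a_not_r <- names(select(mtcars, contains("a") & !contains("r")))
a_not_r  # "am"

# Negation: every column that contains neither "m" nor "ar"
rest <- names(select(mtcars, !(contains("m") | contains("ar"))))
rest     # "cyl" "disp" "hp" "drat" "wt" "qsec" "vs"
```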
redantman