select column names containing string programmatically

Question

Given a data frame like:

df <- data.frame(z_a = 1:2,
                 z_b = 1:2,
                 y_a = 3:4,
                 y_b = 3:4)

I can select columns whose names contain a character with:

library(dplyr)
df %>% select(contains("a"), contains("b"))

  z_a y_a z_b y_b
1   1   3   1   3
2   2   4   2   4

Note that the column order has changed: columns containing a come before columns containing b.

I'd like to select columns whose names contain the characters in a vector, in a way that also reorders the columns to follow that vector.

searchfor <- letters[1:2]

Using searchfor, I'd like to make the following expression and use it in a select statement:

E <- quote(contains(searchfor[1]), contains(searchfor[2]))
df %>% select_(E)

This is a slightly different question than https://stackoverflow.com/questions/29018292/select-columns-based-on-multiple-strings-with-dplyr, but it has the same solution. – wibeasley, Jul 09 '17 at 13:58
Here's a more direct comparison: https://stackoverflow.com/questions/25923392/select-columns-based-on-string-match-dplyrselect/25923466#25923466 – wibeasley, Jul 09 '17 at 14:15
@wibeasley Given the clarification to my original post, the answers below address my question more closely than the other posts do. Thanks! – CPak, Jul 09 '17 at 17:10

akrun · Accepted Answer · 2017-07-09T14:27:29.847

4

We can do

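df %>% 
   # keep the columns whose names match any element of searchfor
   select_at(vars(matches(paste(searchfor, collapse="|")))) %>%
   # reorder alphabetically by the part of each name after the last "_"
   select(order(sub(".*_", "", names(.))))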

edited Jul 09 '17 at 14:27

answered Jul 09 '17 at 14:16

akrun


Not quite the behavior I was looking for. `df %>% select(contains("a"), contains("b"))` changes the order of the columns, which is the output I wanted. I'll make it clear in my post. – CPak Jul 09 '17 at 14:20
Thanks. Now I need to figure out what you did. – CPak Jul 09 '17 at 14:51
@ChiPak The first `select` uses a regex to extract those columns; the second removes the prefix from each name, orders by what is left, and selects the columns in that order. Thanks for your note – akrun Jul 09 '17 at 14:53
So the second step only works if I want alphabetical ordering, is that right? If I wanted arbitrary ordering (determined by the order of `searchfor`), it would not work? – CPak Jul 09 '17 at 14:56
@ChiPak You can add a `factor` with `levels` for a general case – akrun Jul 09 '17 at 14:57
just wanting to make sure I understand what's going on...Thanks – CPak Jul 09 '17 at 14:58
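
For the general case mentioned in the comments, a minimal sketch of the `factor` idea might look like the following. It assumes the column names have the form prefix_suffix and that each element of `searchfor` is exactly one of those suffixes; the names `kept`, `suffix` and `res` are only illustrative, and `searchfor` is deliberately given in non-alphabetical order.

```r
library(dplyr)

df <- data.frame(z_a = 1:2, z_b = 1:2, y_a = 3:4, y_b = 3:4)
searchfor <- c("b", "a")  # arbitrary order, chosen for illustration

# Step 1: keep the columns whose names match any element of searchfor
kept <- select(df, matches(paste(searchfor, collapse = "|")))

# Step 2: turn each suffix into a factor whose levels follow searchfor,
# so that order() sorts by position in searchfor, not alphabetically
suffix <- factor(sub(".*_", "", names(kept)), levels = searchfor)
res <- kept[order(suffix)]

names(res)
```

Because the factor levels follow `searchfor`, the columns ending in b come first here, then the columns ending in a.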

F. Privé · Answer 2 · 2017-07-09T14:49:15.623

2

purrr solution:

library(purrr)
ind_lgl <- map(letters[1:2], ~ grepl(.x, names(df), fixed = TRUE)) %>%
  pmap_lgl(`|`)

df[ind_lgl]

With the pipe:

df %>%
  `[`(map(letters[1:2], ~ grepl(.x, names(df), fixed = TRUE)) %>%
        pmap_lgl(`|`))

If you want to get the right order:

rank <- map(letters[1:2], ~ grepl(.x, names(df), fixed = TRUE)) %>%
  pmap(c) %>%
  map(which)


ind_chr <- tibble(colnames = names(df), rank) %>%  # tibble() replaces the deprecated data_frame(); needs library(dplyr)
  mutate(l = lengths(rank)) %>%
  filter(l > 0) %>%
  mutate(rank = unlist(map(rank, ~ .x[[1]]))) %>%
  arrange(rank) %>%
  pull(colnames)


df[ind_chr]

But it is not pretty...
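
A shorter way to get the same ordering, sketched here in base R only: collect the matching column positions for each search string in turn, then drop repeats. The name `idx` is illustrative, and fixed-string matching is assumed, as in the `grepl()` calls above.

```r
df <- data.frame(z_a = 1:2, z_b = 1:2, y_a = 3:4, y_b = 3:4)
searchfor <- letters[1:2]

# For each search string, in order, find the positions of matching column names
hits <- lapply(searchfor, function(s) grep(s, names(df), fixed = TRUE))

# Flatten; unique() keeps the first occurrence, so a column matching
# several strings is placed with the earliest one
idx <- unique(unlist(hits))

df[idx]
```

With this `df` and `searchfor`, `idx` is `c(1, 3, 2, 4)`, so the columns come out as z_a, y_a, z_b, y_b.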

edited Jul 09 '17 at 14:49

answered Jul 09 '17 at 14:10

F. Privé


Not quite the behavior I was looking for. `df %>% select(contains("a"), contains("b"))` changes the order of the columns, which is the output I wanted. Should have made that more clear in my post – CPak Jul 09 '17 at 14:18
Not pretty...but useful for me to study anyways. You've earned my upvote... – CPak Jul 09 '17 at 14:57

score 1 · Answer 3 · answered Jul 09 '17 at 15:53


I don't fully understand the exact requirement, but is this the solution?

select(df, matches("a|b"))


PIG


Close...two things I wanted. First, use a vector of character elements `searchfor` as arguments to `contains` in `select`. You have not used `searchfor` in your statement. Second, the statements should reorder the columns based on the match, such that the order of `searchfor` should determine the column order of the output. – CPak Jul 09 '17 at 17:08

score 0 · Answer 4 · answered Jul 09 '17 at 15:39

Self answer: here's a solution with select_ that still uses contains, in case anyone else is interested:

library(iterators)
library(dplyr)
s <- paste0("c(", paste0(sapply(iter(searchfor), function(x) paste0("contains(\"", x, "\")")), collapse=","), ")")
df %>% select_(., s)

  z_a y_a z_b y_b
1   1   3   1   3
2   2   4   2   4
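
`select_()` is deprecated in current dplyr, so a sketch of the same idea without building a string might look like this. It assumes a dplyr version with tidy evaluation (0.7.0 or later) and uses `rlang::expr()` with `!!` to build one `contains()` call per element of `searchfor`; the names `calls` and `res` are illustrative.

```r
library(dplyr)

df <- data.frame(z_a = 1:2, z_b = 1:2, y_a = 3:4, y_b = 3:4)
searchfor <- letters[1:2]

# Build a list of unevaluated calls: contains("a"), contains("b")
calls <- lapply(searchfor, function(s) rlang::expr(contains(!!s)))

# Splice the calls into select(); this is equivalent to writing
# select(df, contains("a"), contains("b")) by hand
res <- df %>% select(!!!calls)

res
```

Since the calls are spliced in the order of `searchfor`, the column order follows that vector, as in the hand-written `select(contains("a"), contains("b"))`.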
