I have a dataframe with different variables. For example:

 x10 <- c(1, 2, 3)
 x11 <- c(3, 2, 1)
 x12 <- c(1, 2, 3)

 y05_p <- c(5, 6, 7)
 y06_p <- c(4, 5, 6)
 y07_p <- c(3, 4, 5)

 dat <- data.frame(x10, x11, x12, y05_p, y06_p, y07_p)

 > dat
   x10 x11 x12 y05_p y06_p y07_p
 1   1   3   1     5     4     3
 2   2   2   2     6     5     4
 3   3   1   3     7     6     5

Now I would like to drop some variables based on a name pattern: for example, all variables whose names start with "x", no matter what number follows. In other words, I want to use a "placeholder" to drop every variable that includes "x" in its name.

Using subset, this could look like:

 dat <- subset(dat, select = -c(x*))

Here, the "*" is the placeholder.

Or just with "select":

 dat <- select(dat, -x*)

The result should look like:

 dat <- select(dat, -x*)

 > dat
   y05_p y06_p y07_p
 1     5     4     3
 2     6     5     4
 3     7     6     5

Or to work with another example:

 dat <- select(dat, -y*_p)

 > dat
   x10 x11 x12
 1   1   3   1
 2   2   2   2
 3   3   1   3

I am grateful for any help.

C.F.

2 Answers

Use grep() with its argument invert = TRUE to get the indices of the columns that do not match:

placeholder <- "x"
idx <- grep(pattern = placeholder, names(dat), invert = TRUE)
dat[idx]
  y05_p y06_p y07_p
1     5     4     3
2     6     5     4
3     7     6     5

If you want to exclude only the columns whose names start with "x", use startsWith() instead:

idx <- !startsWith(names(dat), prefix = placeholder)
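
The two approaches differ when "x" appears somewhere other than the start of a name. Here is a minimal sketch of that difference; the extra column max_y is hypothetical and not part of the question's data:

```r
# Rebuild the example data from the question.
dat <- data.frame(x10 = c(1, 2, 3), x11 = c(3, 2, 1), x12 = c(1, 2, 3),
                  y05_p = c(5, 6, 7), y06_p = c(4, 5, 6), y07_p = c(3, 4, 5))

# Hypothetical column whose name contains "x" but does not start with it.
dat$max_y <- c(7, 6, 5)

placeholder <- "x"

# grep() matches "x" anywhere in the name, so max_y is dropped as well.
kept_grep <- names(dat)[grep(placeholder, names(dat), invert = TRUE)]

# startsWith() only checks the beginning of the name, so max_y is kept.
kept_prefix <- names(dat)[!startsWith(names(dat), prefix = placeholder)]
```

Which one is appropriate depends on whether "x" is meant as a prefix or as a substring.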
markus

Use starts_with().

library(dplyr)
dat %>% select(-starts_with("x"))

There are other selection helpers like this (ends_with, matches, contains, one_of). And if all else fails, you can always use regular expressions and base R:
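
As a sketch of how some of those helpers might be applied to the second example in the question (dropping the y*_p columns), assuming dplyr is installed and using the question's data:

```r
library(dplyr)

dat <- data.frame(x10 = c(1, 2, 3), x11 = c(3, 2, 1), x12 = c(1, 2, 3),
                  y05_p = c(5, 6, 7), y06_p = c(4, 5, 6), y07_p = c(3, 4, 5))

# Drop columns whose names end in "_p".
res_ends <- dat %>% select(-ends_with("_p"))

# Drop columns matching a regular expression: "y", then anything, then "_p".
res_regex <- dat %>% select(-matches("^y.*_p$"))

# Drop columns whose names contain the literal string "_p".
res_contains <- dat %>% select(-contains("_p"))
```

On this data all three calls leave only x10, x11 and x12; they would differ on names such as "y_p_total", which contains "_p" but does not end with it.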

dat <- dat[ , !grepl("^x", colnames(dat)) ]

Explanation: grepl returns a logical vector. The regular expression "^x" matches anything that starts with an x. This is matched against the column names of dat. We negate the logical vector with the bang (!) and thus select everything that does not match our regex.
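
The same idea can be sketched for the y*_p example from the question. The use of glob2rx() (from the utils package) to translate the "*" placeholder and the drop = FALSE argument are additions here, not part of the original answer:

```r
dat <- data.frame(x10 = c(1, 2, 3), x11 = c(3, 2, 1), x12 = c(1, 2, 3),
                  y05_p = c(5, 6, 7), y06_p = c(4, 5, 6), y07_p = c(3, 4, 5))

# "^y" anchors the start, ".*" plays the role of the "*" placeholder,
# and "_p$" anchors the suffix at the end of the name.
keep <- !grepl("^y.*_p$", colnames(dat))

# drop = FALSE keeps the result a data frame even if one column remains.
res <- dat[, keep, drop = FALSE]

# Alternative: let glob2rx() convert the wildcard pattern into a regex.
keep_glob <- !grepl(glob2rx("y*_p"), colnames(dat))
res_glob <- dat[, keep_glob, drop = FALSE]
```

Without drop = FALSE, a selection that leaves a single column would be simplified to a plain vector.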

January
  • For those without all of `tidyverse` installed, `library(dplyr)` is all that is needed for the first code block. – r2evans Jul 15 '19 at 19:09
  • Not really, AFAIK the pipe operator is exported from `magrittr`. – January Jul 15 '19 at 19:10
  • `magrittr::%>%` is re-exported by `dplyr`, so `library(dplyr)` alone (without `library(magrittr)`) suffices. (Clarification: I said `library(dplyr)` is needed ... I did not say `dplyr` is the only package needed, though technically you cannot have `dplyr` without `magrittr`.) – r2evans Jul 15 '19 at 19:15
  • I think I see your point, but `tidyverse` not being installed has nothing to do with `dplyr` availability. Mostly it's a "style" thing, so it's just my suggestion. (It is *just a little bit* "mwe" thing, something we encourage in our questions. But mostly style, most definitely not a requirement.) – r2evans Jul 15 '19 at 22:19