
I'm familiar with being able to extract columns from an R data frame (or matrix) like so:

df.2 <- df[, c("name1", "name2", "name3")]

But can one use a ! or other tool to select all but those listed columns?

For background, I have a data frame with quite a few column vectors and I'd like to avoid:

  • Typing out the majority of the names when I could just remove a minority
  • Using the much shorter df.2 <- df[, c(1,3,5)] because when my .csv file changes, my code goes to heck since the numbering isn't the same anymore. I'm new to R and think I've learned the hard way not to use number vectors for larger df's that might change.

I tried:

df.2 <- df[, !c("name1", "name2", "name3")]
df.2 <- df[, !=c("name1", "name2", "name3")]

And just as I was typing this, found out that this works:

df.2 <- df[, !names(df) %in% c("name1", "name2", "name3")]

Is there a better way than this last one?

Hendy
  • I generally shorten your last example using a custom infix operator: `'%ni%' <- Negate('%in%')` (see the sketch after these comments). – joran Aug 31 '12 at 02:20
  • @joran Doesn't that only shorten it by a `!`? Or am I missing something? – Hendy Aug 31 '12 at 05:07
  • Yes, though most people use () so they'd be saving another 2. It's more about readability for me. – joran Aug 31 '12 at 11:44
  • The last one does not seem to work with data.table's fread. – hhh Jun 30 '17 at 23:50
  • The last one does not work with data.table's fread. For data.table, I got it working with the approach [here](https://stackoverflow.com/questions/28094645/select-subset-of-columns-in-data-table-r) (with=FALSE). *I wish there was some package-independent solution.* – hhh Jul 01 '17 at 00:04
  • Use `-c` instead of `c`: `df.2 <- subset(df, select = -c(name1, name2, name3))` – Aug 23 '22 at 05:34
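A minimal sketch of the `%ni%` ("not in") operator joran suggests above, reusing the question's df and column names (illustrative only, not tested on the asker's data):

'%ni%' <- Negate('%in%')                                   # define the infix operator once
df.2 <- df[, names(df) %ni% c("name1", "name2", "name3")]  # then reuse it for readable filtering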

6 Answers


An alternative to grep is which:

df.2 <- df[, -which(names(df) %in% c("name1", "name2", "name3"))]
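One caveat, sketched below with hypothetical columns: when none of the names match, which() returns integer(0), and negative indexing with an empty vector keeps zero columns instead of all of them, so it can be worth guarding the empty case:

df <- data.frame(a = 1:3, b = 4:6)                  # hypothetical stand-in data frame
idx <- which(names(df) %in% c("name1", "name2"))    # integer(0): no matching names
# df[, -idx] would return a zero-column data frame here, so keep df when idx is empty
df.2 <- if (length(idx)) df[, -idx] else df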
harkmug
  • The technique here is otherwise the same in data.table, just with one more comma: `myData[,,!removeCols]`. – hhh Jul 01 '17 at 00:12

You can make a shorter call that is also more generalizable with a negative grep():

df.2 <- df[, -grep("^name[1-3]$", names(df))]

Since grep() returns numeric positions, you can use negative vector indexing to remove those columns. You can add further names or more complex patterns.
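For instance (a small sketch with hypothetical column names), grep() returns the positions that the negative index then drops:

df <- data.frame(name1 = 1, keep1 = 2, name2 = 3, name3 = 4, keep2 = 5)
grep("^name[1-3]$", names(df))                 # 1 3 4: positions of the columns to drop
df.2 <- df[, -grep("^name[1-3]$", names(df))]
names(df.2)                                    # "keep1" "keep2"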

IRTFM

dplyr::select() has several options for dropping specific columns:

library(dplyr)

drop_columns <- c('cyl','disp','hp')
mtcars %>% 
  select(-one_of(drop_columns)) %>% 
  head(2)

              mpg drat    wt  qsec vs am gear carb
Mazda RX4      21  3.9 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21  3.9 2.875 17.02  0  1    4    4

Negating specific column names, the following drops the column "hp" and the columns from "qsec" through "gear":

mtcars %>% 
  select(-hp, -(qsec:gear)) %>% 
  head(2)

              mpg cyl disp drat    wt carb
Mazda RX4      21   6  160  3.9 2.620    4
Mazda RX4 Wag  21   6  160  3.9 2.875    4

You could also negate contains(), starts_with(), ends_with(), or matches():

mtcars %>% 
  select(-contains('t')) %>%
  select(-starts_with('a')) %>% 
  select(-ends_with('b')) %>% 
  select(-matches('^m.+g$')) %>% 
  head(2)

              cyl disp  hp  qsec vs gear
Mazda RX4       6  160 110 16.46  0    4
Mazda RX4 Wag   6  160 110 17.02  0    4
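In dplyr 1.0+, one_of() is superseded; if you are on a newer version, the equivalent call (assuming the same drop_columns vector) uses any_of(), which also tolerates names that are absent from the data:

mtcars %>% 
  select(-any_of(drop_columns)) %>% 
  head(2)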
sbha
  • Not sure this is clearer/superior than others, but I appreciate updating with new[ish] tools when they come out! Thanks for adding this to keep things fresh. – Hendy Mar 10 '18 at 21:52

Old thread, but here's another solution:

df.2 <- subset(df, select=-c(name1, name2, name3))

This was posted in another similar thread (though I can't find it right now). Should be sustainable code in the situation you describe, and is probably easier to read and edit than some of the other options.
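A quick illustration on a built-in data set (mtcars standing in for the question's df, with arbitrary columns chosen for the example):

df.2 <- subset(mtcars, select = -c(hp, cyl, disp))
head(df.2, 2)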

mflo-ByeSE
  • The data.frame approach is otherwise the same as with data.table, such that `subset(myData,,!names(myData) %in% removeCols)`, with a one-comma difference, irritatingly similar. But this approach with `select=-c(..)` does not work. Ideas why? – hhh Jul 01 '17 at 00:16
  • Hmm, no idea! I don't use data.table's – mflo-ByeSE Jul 02 '17 at 05:37

You could write a custom function to do this if you're manipulating your own data. I might do something like this:

rm.col <- function(df, ...) {
    # Capture the unquoted column names passed through ...
    x <- substitute(...())
    # Convert the captured symbols to a character vector (base trimws() stands in for qdap::Trim())
    z <- trimws(unlist(lapply(x, as.character)))
    df[, !names(df) %in% z]
}

rm.col(mtcars, hp, mpg)

The first argument is the data frame; the `...` that follow are the names of any columns you wish to remove.

Tyler Rinker

The easiest way that comes to my mind:

filtered_df <- df[, setdiff(names(df), c("name1", "name2"))]

Essentially, you are computing the set difference between the full list of column names and the subset you want to filter out ("name1" and "name2" above).
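One nice property of this approach, sketched here with hypothetical columns: setdiff() silently ignores names that aren't in the data frame, so a stale name in the drop list doesn't remove anything by accident:

df <- data.frame(name1 = 1, keep = 2)
setdiff(names(df), c("name1", "name2", "name3"))   # "keep": missing names are simply ignored
# drop = FALSE keeps the result a data frame even if only one column remains
filtered_df <- df[, setdiff(names(df), c("name1", "name2", "name3")), drop = FALSE]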