
I'm familiar with being able to extract columns from an R data frame (or matrix) like so:

df.2 <- df[, c("name1", "name2", "name3")]

But can one use a ! or other tool to select all but those listed columns?

For background, I have a data frame with quite a few column vectors and I'd like to avoid:

  • Typing out the majority of the names when I could just remove a minority
  • Using the much shorter df.2 <- df[, c(1,3,5)] because when my .csv file changes, my code goes to heck since the numbering isn't the same anymore. I'm new to R and think I've learned the hard way not to use number vectors for larger df's that might change.

I tried:

df.2 <- df[, !c("name1", "name2", "name3")]
df.2 <- df[, !=c("name1", "name2", "name3")]

And just as I was typing this, found out that this works:

df.2 <- df[, !names(df) %in% c("name1", "name2", "name3")]

Is there a better way than this last one?

Hendy
  • I generally shorten your last example using a custom infix operator: `'%ni%' <- Negate('%in%')` (see the sketch after these comments). – joran Aug 31 '12 at 02:20
  • @joran Doesn't that only shorten it by a `!`? Or am I missing something? – Hendy Aug 31 '12 at 05:07
  • Yes, though most people use () so they'd be saving another 2. It's more about readability for me. – joran Aug 31 '12 at 11:44
  • The last one does not seem to work with data.table's fread. – hhh Jun 30 '17 at 23:50
  • The last one does not work with data.table's fread. For data.table, I got it working with the approach [here](https://stackoverflow.com/questions/28094645/select-subset-of-columns-in-data-table-r) (with=FALSE). *I wish there was some package-independent solution.* – hhh Jul 01 '17 at 00:04
  • Use `-c` instead of `c`: `df.2 <- subset(df, select = -c(name1, name2, name3))` – Aug 23 '22 at 05:34
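A minimal sketch of the `%ni%` ("not in") operator joran suggests above, reusing the question's df and column names (illustrative only, not tested on the asker's data):

'%ni%' <- Negate('%in%')                                   # define the infix operator once
df.2 <- df[, names(df) %ni% c("name1", "name2", "name3")]  # then reuse it for readable filtering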

6 Answers


An alternative to grep is which:

df.2 <- df[, -which(names(df) %in% c("name1", "name2", "name3"))]
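One caveat, sketched below with hypothetical columns: when none of the names match, which() returns integer(0), and negative indexing with an empty vector keeps zero columns instead of all of them, so it can be worth guarding the empty case:

df <- data.frame(a = 1:3, b = 4:6)                  # hypothetical stand-in data frame
idx <- which(names(df) %in% c("name1", "name2"))    # integer(0): no matching names
# df[, -idx] would return a zero-column data frame here, so keep df when idx is empty
df.2 <- if (length(idx)) df[, -idx] else df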
harkmug
  • The technique here is otherwise the same in data.table, just with one more comma: `myData[,,!removeCols]`. – hhh Jul 01 '17 at 00:12

You can make a shorter call that is also more generalizable with a negative grep():

df.2 <- df[, -grep("^name[1-3]$", names(df))]

Since grep() returns numeric positions, you can use negative vector indexing to remove those columns. You can add further names or more complex patterns.
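For instance (a small sketch with hypothetical column names), grep() returns the positions that the negative index then drops:

df <- data.frame(name1 = 1, keep1 = 2, name2 = 3, name3 = 4, keep2 = 5)
grep("^name[1-3]$", names(df))                 # 1 3 4: positions of the columns to drop
df.2 <- df[, -grep("^name[1-3]$", names(df))]
names(df.2)                                    # "keep1" "keep2"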

IRTFM

dplyr::select() has several options for dropping specific columns:

library(dplyr)

drop_columns <- c('cyl','disp','hp')
mtcars %>% 
  select(-one_of(drop_columns)) %>% 
  head(2)

              mpg drat    wt  qsec vs am gear carb
Mazda RX4      21  3.9 2.620 16.46  0  1    4    4
Mazda RX4 Wag  21  3.9 2.875 17.02  0  1    4    4

Negating specific column names, the following drops the column "hp" and the columns from "qsec" through "gear":

mtcars %>% 
  select(-hp, -(qsec:gear)) %>% 
  head(2)

              mpg cyl disp drat    wt carb
Mazda RX4      21   6  160  3.9 2.620    4
Mazda RX4 Wag  21   6  160  3.9 2.875    4

You could also negate contains(), starts_with(), ends_with(), or matches():

mtcars %>% 
  select(-contains('t')) %>%
  select(-starts_with('a')) %>% 
  select(-ends_with('b')) %>% 
  select(-matches('^m.+g$')) %>% 
  head(2)

              cyl disp  hp  qsec vs gear
Mazda RX4       6  160 110 16.46  0    4
Mazda RX4 Wag   6  160 110 17.02  0    4
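In dplyr 1.0+, one_of() is superseded; if you are on a newer version, the equivalent call (assuming the same drop_columns vector) uses any_of(), which also tolerates names that are absent from the data:

mtcars %>% 
  select(-any_of(drop_columns)) %>% 
  head(2)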
sbha
  • Not sure this is clearer/superior than others, but I appreciate updating with new[ish] tools when they come out! Thanks for adding this to keep things fresh. – Hendy Mar 10 '18 at 21:52

Old thread, but here's another solution:

df.2 <- subset(df, select=-c(name1, name2, name3))

This was posted in another similar thread (though I can't find it right now). Should be sustainable code in the situation you describe, and is probably easier to read and edit than some of the other options.
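A quick illustration on a built-in data set (mtcars standing in for the question's df, with arbitrary columns chosen for the example):

df.2 <- subset(mtcars, select = -c(hp, cyl, disp))
head(df.2, 2)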

mflo-ByeSE
  • The data.frame approach is otherwise the same as with data.table, such that `subset(myData,,!names(myData) %in% removeCols)`, with a one-comma difference, irritatingly similar. But this approach with `select=-c(..)` does not work. Ideas why? – hhh Jul 01 '17 at 00:16
  • Hmm, no idea! I don't use data.table's – mflo-ByeSE Jul 02 '17 at 05:37

You could write a custom function to do this if you're manipulating your own data. I might do something like this:

rm.col <- function(df, ...) {
    # Capture the unquoted column names passed through ...
    x <- substitute(...())
    # Convert the captured symbols to a character vector (base trimws() stands in for qdap::Trim())
    z <- trimws(unlist(lapply(x, as.character)))
    df[, !names(df) %in% z]
}

rm.col(mtcars, hp, mpg)

The first argument is the data frame; the `...` that follow are the names of any columns you wish to remove.

Tyler Rinker

The easiest way that comes to my mind:

filtered_df <- df[, setdiff(names(df), c("name1", "name2"))]

Essentially, you are computing the set difference between the full list of column names and the subset you want to filter out ("name1" and "name2" above).
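One nice property of this approach, sketched here with hypothetical columns: setdiff() silently ignores names that aren't in the data frame, so a stale name in the drop list doesn't remove anything by accident:

df <- data.frame(name1 = 1, keep = 2)
setdiff(names(df), c("name1", "name2", "name3"))   # "keep": missing names are simply ignored
# drop = FALSE keeps the result a data frame even if only one column remains
filtered_df <- df[, setdiff(names(df), c("name1", "name2", "name3")), drop = FALSE]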