Subset with vector specifying columns to drop

Question

Let's say we have a simple data frame like this:

df <- read.table(text="
colA colB colC colD
1    2    3    4
5    6    7    8
", header=TRUE, sep="")

It has often been explained that one can store the names of the columns to keep in a vector:

rows_to_select <- c("colA", "colB")

Subsetting with subset(df, select=rows_to_select) yields the expected outcome.

But why can't I simply invert the keep-sign by putting a minus in front, i.e. subset(df, select=-rows_to_select)? It gives the error Error in -keep : invalid argument to unary operator Calls: subset -> subset.data.frame -> eval -> eval.

However, subset(df, select=-c(colA, colB)) works. Do I always have to employ setdiff, e.g. keep <- setdiff(names(df), rows_to_select) so that I can subset(df, select=keep)?

@rawr That doesn't seem to work with `rows_to_select <- c("colB", "colC")` — MrFlick, Aug 30 '14 at 00:01
ya.. one would need to `rows_to_select <- factor(c("colB", "colC"), levels = colnames(df))` that if one so desired @MrFlick — rawr, Aug 30 '14 at 02:07

Rich Scriven · Answer 1 · 2014-08-30T00:06:41.087

2

You won't be able to use a minus sign with a character vector, because unary minus is only defined for numeric values. You can use one with a numeric index vector, though. Furthermore, you'd be better off using [-style subsetting instead of subset().

To get an index, we can use which.

> rows <- c("colA", "colB")
> df[, -which(names(df) %in% rows)]
#   colC colD
# 1    3    4
# 2    7    8
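One caveat with the -which() form, shown here as a minimal sketch rather than a definitive recipe: if none of the names match, which() returns an empty vector, and a negated empty index selects no columns instead of all of them. A logical mask avoids this. The name colX and the use of drop = FALSE are additions for illustration and are not part of the answer above.

```r
# Same data frame as in the question
df <- read.table(text = "
colA colB colC colD
1    2    3    4
5    6    7    8
", header = TRUE)

no_match <- c("colX")  # hypothetical name that is not a column of df

# -which() with no matches: negating integer(0) still gives integer(0),
# so every column is dropped instead of none
ncol(df[, -which(names(df) %in% no_match), drop = FALSE])  # 0

# Logical negation keeps all columns when nothing matches
ncol(df[, !(names(df) %in% no_match), drop = FALSE])       # 4

# drop = FALSE keeps the result a data frame even when one column is left
rows <- c("colA", "colB", "colC")
remaining <- df[, !(names(df) %in% rows), drop = FALSE]
names(remaining)                                           # "colD"
```

Without drop = FALSE, the last call would return a plain vector rather than a one-column data frame.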

edited Aug 30 '14 at 00:06

answered Aug 29 '14 at 21:48


Why is this better? Is it faster? – MERose Aug 30 '14 at 21:14
@user3621464 - It's not better, not worse. They time about the same. I was just trying to point out why you were having problems with negative indexing on a character vector. – Rich Scriven Aug 30 '14 at 21:25
I'm sorry, I wasn't precise enough. I referred to your part "you'd be better-off using [-type subsetting." Why am I better off using `[` instead of `$`? – MERose Sep 13 '14 at 09:24
I meant as opposed to using `subset` – Rich Scriven Sep 13 '14 at 16:38
Ah okay. But why is `[` better than `subset()`? – MERose Sep 18 '14 at 20:47
@user3621464 - I will refer you to [this question](http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset) for an answer to that question. – Rich Scriven Sep 18 '14 at 20:49

score 1 · Answer 2 · answered Aug 30 '14 at 01:20

The dplyr package supports the style of subsetting you are after: its select() function accepts a minus sign in front of the columns to drop.

library(dplyr)

v1 <- 1:10
v2 <- 11:20
v3 <- rep(c("ana", "bob"), each = 5)
v4 <- letters[1:10]

foo <- data.frame(v1, v2, v3, v4, stringsAsFactors = FALSE)

# Remove columns v2 and v3
select(foo, -c(v2:v3))

#   v1 v4
#1   1  a
#2   2  b
#3   3  c
#4   4  d
#5   5  e
#6   6  f
#7   7  g
#8   8  h
#9   9  i
#10 10  j
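The example above uses bare column names. When the names are stored in a character vector, as in the question, a sketch along the following lines should work, assuming a reasonably recent dplyr in which the all_of() helper is available. The variable name cols_to_drop is made up for illustration.

```r
library(dplyr)

foo <- data.frame(v1 = 1:10, v2 = 11:20,
                  v3 = rep(c("ana", "bob"), each = 5),
                  v4 = letters[1:10],
                  stringsAsFactors = FALSE)

# Column names held in a character vector (hypothetical variable name)
cols_to_drop <- c("v2", "v3")

# all_of() tells select() to treat the vector's contents as column names;
# the minus sign then drops those columns
result <- select(foo, -all_of(cols_to_drop))
names(result)  # "v1" "v4"
```

all_of() raises an error if any of the names is missing from the data frame, which makes typos visible early.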

score 0 · Answer 3 · answered Aug 29 '14 at 21:04

There are several different ways you could accomplish this, and you are not limited to just the subset function. For example,

Df <- data.frame(
  colA=1:4,
  colB=5:8,
  colC=9:12,
  colD=13:16)
##
rows_to_select <- c("colA", "colB")
##
> Df[,!(names(Df) %in% rows_to_select)]
  colC colD
1    9   13
2   10   14
3   11   15
4   12   16

Subsetting data.frames using [ is also more efficient than calling subset(). But to address your question of

why can't I simply invert the keep-sign by putting a minus in front

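that is a consequence of how R evaluates the expression. Unary minus is not defined for character vectors, so -rows_to_select fails before any subsetting happens. subset(df, select=-c(colA, colB)) works because subset() evaluates select in a context where each column name stands for its column position, so the minus is applied to numbers.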
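To make the reason concrete, here is a minimal sketch of what happens in both cases. The positions list is an illustration that roughly mimics how subset() treats select for data frames, and idx is a made-up variable name.

```r
df <- data.frame(colA = c(1, 5), colB = c(2, 6),
                 colC = c(3, 7), colD = c(4, 8))
rows_to_select <- c("colA", "colB")

# Unary minus is not defined for character vectors
msg <- tryCatch(-rows_to_select, error = function(e) conditionMessage(e))
msg  # "invalid argument to unary operator"

# Roughly what subset() does with select: each column name
# stands for its position, so bare names become integers
positions <- as.list(seq_along(df))
names(positions) <- names(df)
eval(quote(-c(colA, colB)), positions)  # -1 -2

# The same idea with a character vector: convert the names to
# positions first, then negate the positions
idx <- match(rows_to_select, names(df))
kept <- subset(df, select = -idx)
names(kept)  # "colC" "colD"
```

Note that match() returns NA for a name that is not a column of df, so this form assumes all names in rows_to_select exist.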
