18

I fear greatly that this has been asked and will be downvoted, but I have not found the answer in the docs (?"["), and discovered that it is hard to search for.

data(wines)
# This is allowed:
alcoholic <- wines[, 1]
alcoholic <- wines[, "alcohol"]
nonalcoholic <- wines[, -1]
# But this is not:
fail <- wines[, -"alcohol"]

I know of two solutions, but I am frustrated at needing them.

win <- wines[, !colnames(wines) %in% "alcohol"]  # snappy
win <- wines[, -which(colnames(wines) %in% "alcohol")]  # snappier!
a different ben
  • 2
    Are `snappy` and `snappier` positive or negative measures? I prefer `setdiff` in these cases. What do you expect `-"alcohol"` to return? It doesn't work as a command by itself, so why would it work when trying to subset? – A5C1D2H2I1M1N2O1R2T1 Sep 05 '13 at 10:50
  • 1
    Maybe not an answer to your "Why" in terms of "why has someone chosen to implement it this way", but anyway, from `?[`: "For `[`-indexing only: i, j, ... can be logical vectors" (your `!` alternative) [...] "can also be negative integers" (your `which` alternative). – Henrik Sep 05 '13 at 10:57
  • @AnandaMahto I was being sarcastic, so negative connotations. Expectations of anything in R? I have very few expectations after even my little experience with it :) (That was humour). Can you give an example of how `setdiff` would handle this case? – a different ben Sep 05 '13 at 10:58
  • 1
    if you're just looking for something shorter: `wines[names(wines)!="alcohol"]` – plannapus Sep 05 '13 at 11:00
  • @plannapus Thanks, that's the shortest! Only good for one name though isn't it? I would need to use %in% for a list of names I think. – a different ben Sep 05 '13 at 11:22
  • 4
    `subset(airquality, select = -Temp)` – Henrik Sep 05 '13 at 11:27
  • 2
    Where does the wines data set come from? I get 'not found' (R 2.15, so maybe it's new). – Spacedman Sep 05 '13 at 13:37
  • @adifferentben yes indeed you would. For a vector of names it would become `wines[!names(wines)%in%c(...)]`. – plannapus Sep 05 '13 at 13:37
  • 1
    Filling in a useful link from a now-deleted answer by Dieter Menne about a response from Brian Ripley on this topic on the R mailing list: http://r-project.markmail.org/thread/sdg7mopk4towqbm4 – Ben Bolker Sep 05 '13 at 16:40
  • 1
    Or you can just delete the column by reference if `wines` were a `data.table`: `wines[,alcohol:=NULL]`. That's instant no matter how big the data is. So if the data is large it's more efficient than copying every column other than the one you want to delete. If not it doesn't matter really. – Matt Dowle Sep 05 '13 at 17:02
  • 1
    @Spacedman, the wines data set is in the `kohonen` package, and maybe a few others? It's a classic for machine learning examples. You could also get it at the UCI Machine Learning Repository: http://archive.ics.uci.edu/ml/ – a different ben Sep 06 '13 at 05:13
  • [Related](http://stackoverflow.com/q/4835342/271616) – Joshua Ulrich Sep 07 '13 at 01:10

6 Answers

19

When you do

wines[, -1]

`-1` is evaluated before it is used by `[`. As you know, the unary `-` operator won't work with objects of class character, so doing the same with "alcohol" will lead to:

Error in -"alcohol" : invalid argument to unary operator
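A minimal sketch of that evaluation order, using a made-up matrix `m` in place of `wines` so that it runs without any package. The index expression is computed on its own, and only its value is handed to `[`:

```r
# Hypothetical stand-in for wines: a small matrix with named columns.
m <- matrix(1:6, nrow = 2,
            dimnames = list(NULL, c("alcohol", "sugar", "acid")))

# The index is an ordinary value, computed before `[` ever sees it.
idx <- -1
kept <- colnames(m[, idx])
print(kept)  # "sugar" "acid"

# Unary minus on a string fails on its own, with no subsetting involved.
msg <- tryCatch(-"alcohol", error = function(e) conditionMessage(e))
print(msg)
```

So the failure has nothing to do with `wines` or with `[`: the argument never gets as far as the subsetting function.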

You can add the following to your alternatives:

wines[, -match("alcohol", colnames(wines))]
wines[, setdiff(colnames(wines), "alcohol")]

but you should know about the risks of negative indexing: see, for example, what happens if you mistype the name as "alcool". So your first suggestion and the last one here (@Ananda's) should be preferred. You might also want to write a function that throws an error if you provide a name that is not part of your data.
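To make the mistyped-name risk concrete, here is a small sketch on a made-up data frame `df` (an assumption, standing in for `wines`). The helper `drop_cols_strict` is hypothetical, written for this illustration only, and is one possible shape for the "error out" function suggested above:

```r
# Hypothetical data frame standing in for wines.
df <- data.frame(alcohol = 1:3, sugar = 4:6, acid = 7:9)
typo <- "alcool"  # deliberately misspelled

# -which(): no match gives integer(0), so every column is dropped, silently.
n_which <- ncol(df[, -which(colnames(df) %in% typo)])
print(n_which)  # 0

# -match(): no match gives NA, which `[.data.frame` rejects with an error.
match_failed <- inherits(
  try(df[, -match(typo, colnames(df))], silent = TRUE), "try-error")
print(match_failed)  # TRUE

# Logical indexing and setdiff(): nothing is dropped, but nothing is reported.
n_logical <- ncol(df[, !colnames(df) %in% typo])
n_setdiff <- ncol(df[, setdiff(colnames(df), typo)])
print(c(n_logical, n_setdiff))  # 3 3

# Hypothetical helper (not from any package): refuse unknown names up front.
drop_cols_strict <- function(data, cols) {
  unknown <- setdiff(cols, colnames(data))
  if (length(unknown) > 0) {
    stop("unknown column(s): ", paste(unknown, collapse = ", "))
  }
  data[, setdiff(colnames(data), cols), drop = FALSE]
}

strict_failed <- inherits(try(drop_cols_strict(df, typo), silent = TRUE),
                          "try-error")
print(strict_failed)  # TRUE
print(names(drop_cols_strict(df, "alcohol")))  # "sugar" "acid"
```

The preferred forms are safer only in the sense that a typo leaves the data untouched; catching the typo itself still needs an explicit check like the one in the helper.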

flodel
  • `R> -1` gives `[1] -1`, so how does that work? I am not so familiar with the way R works. Is that what you mean? – a different ben Sep 05 '13 at 11:05
  • I'll have to write a compendium of idioms for deleting a column, thanks for the additions :) – a different ben Sep 05 '13 at 11:07
  • Yes, `-1` is something that evaluates fine, so you can pass it as an argument to the `[` function and it will know what to do with it. `-"alcohol"`, on the other hand, does not evaluate. It has less to do with how `[` is implemented and more with the fact that you cannot compute `-"alcohol"`, and hence cannot pass it to `[` or any other function. – flodel Sep 05 '13 at 11:10
  • I normally answer these types of questions with `-which()` is evil, and then point the way to `setdiff`. +1 – A5C1D2H2I1M1N2O1R2T1 Sep 05 '13 at 11:16
  • Forgive me I was a little slow in getting your meaning. Thanks @Ananda and flodel. – a different ben Sep 05 '13 at 11:17
  • 1
    For real fun, compare foo[-0] and foo[-c(0,1)] . IIRC flodel discussed zeroes in a SO question a few months back. – Carl Witthoft Sep 05 '13 at 11:29
9

Another possibility:

subset(wines,select=-alcohol)

You can even do

subset(wines,select=-c(alcohol,other_drop))

In fact, if you have a contiguous set of columns you want to drop, you can even

subset(wines,select=-(first_drop:last_drop))

which can be handy (although IMO it depends dangerously on the order of columns, which is something that might be fragile: I might prefer a grep-based solution if there were some way to identify columns, or a more explicit separate definition of column groups).

In this case subset is using non-standard evaluation, which, as has been discussed elsewhere, can be dangerous in some contexts. But I still like it for simple, top-level data manipulation because of its readability.
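A short sketch of what that non-standard evaluation means in practice, on a made-up data frame `df` (an assumption, not the `wines` data). Inside `select`, bare column names are looked up before ordinary variables, which is convenient at top level but can surprise you once variables are involved:

```r
# Hypothetical data frame with three columns.
df <- data.frame(a = 1:3, b = 4:6, c = 7:9)

# Inside `select`, column names stand for their positions, so these work.
one_dropped   <- names(subset(df, select = -a))
range_dropped <- names(subset(df, select = -(a:b)))
print(one_dropped)    # "b" "c"
print(range_dropped)  # "c"

# A variable that shares its name with a column is shadowed by the column.
b <- "c"
shadowed <- names(subset(df, select = b))
print(shadowed)  # "b", not "c"

# A variable holding a name is a string, not a position, so negating it fails.
drop_me <- "a"
negate_failed <- inherits(try(subset(df, select = -drop_me), silent = TRUE),
                          "try-error")
print(negate_failed)  # TRUE
```

This is the kind of behavior that makes `subset` comfortable interactively and awkward inside functions, where column names usually arrive as strings.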

Ben Bolker
  • 1
    The subset function converts the select expression to numbers via a named vector of numbers, which is why the ":" method works. – IRTFM Sep 05 '13 at 17:30
  • 1
    @DWin, `?subset` says `This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like [, and in particular the non-standard evaluation of argument subset can have unanticipated consequences.` Why? What are the non-standard evaluations it refers to? The ones Ben Bolker has listed? – a different ben Sep 06 '13 at 00:26
6

Another method that uses numeric indexing and generalizes to situations where you want to remove a bunch of similarly named columns:

dfrm[ , -grep("^val", names(dfrm) )] #remove columns starting with "val"

(I gave my vote to flodel, since his answer described "why" a "minus sign" didn't work: essentially because the R authors didn't overload the "-" operator for that purpose. They also didn't overload "+" to do concatenation in the manner that some languages did.)
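One caveat with the `-grep(...)` idiom, sketched on a made-up `dfrm` (the column names here are assumptions): when the pattern matches nothing, `grep` returns `integer(0)` and the negative index drops every column. A logical variant with `grepl` avoids that case:

```r
# Hypothetical data frame with two "val" columns.
dfrm <- data.frame(id = 1:3, grp = c("x", "y", "z"), val_a = 4:6, val_b = 7:9)

# The -grep() idiom works while the pattern matches something.
kept <- names(dfrm[, -grep("^val", names(dfrm))])
print(kept)  # "id" "grp"

# With a pattern that matches nothing, every column is dropped.
n_after_miss <- ncol(dfrm[, -grep("^zzz", names(dfrm))])
print(n_after_miss)  # 0

# Logical indexing with grepl() keeps all columns when nothing matches.
n_after_miss_logical <- ncol(dfrm[, !grepl("^zzz", names(dfrm)), drop = FALSE])
print(n_after_miss_logical)  # 4

# And it drops the same columns as -grep() when the pattern does match.
kept_logical <- names(dfrm[, !grepl("^val", names(dfrm)), drop = FALSE])
print(kept_logical)  # "id" "grp"
```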

IRTFM
3

How about writing a simple little function and sticking it in your .Rprofile? Something like...

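# Return df without the columns named in cols.
dropcols <- function( df , cols ){
  # drop = FALSE keeps a data frame even when only one column remains
  out <- df[ , !names(df) %in% cols , drop = FALSE ]
  return( out )
}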

#  To use it....
data( mtcars )
head( dropcols( mtcars , "mpg" ) )
#                  cyl disp  hp drat    wt  qsec vs am gear carb
#Mazda RX4           6  160 110 3.90 2.620 16.46  0  1    4    4
#Mazda RX4 Wag       6  160 110 3.90 2.875 17.02  0  1    4    4
#Datsun 710          4  108  93 3.85 2.320 18.61  1  1    4    1
#Hornet 4 Drive      6  258 110 3.08 3.215 19.44  1  0    3    1
#Hornet Sportabout   8  360 175 3.15 3.440 17.02  0  0    3    2
#Valiant             6  225 105 2.76 3.460 20.22  1  0    3    1
Simon O'Hanlon
  • Yep, that's a useful way to solve it. Not very portable however for others so I'm a little disinclined to do it. I've avoided that sort of thing in general partly for that reason, but also I always forget to sync my work to home machine to laptop, etc, and forget what's in my .Rprofile anyway! – a different ben Sep 05 '13 at 11:02
3

I can't find this in the documentation, but the following syntax works with data.table:

library(data.table)
dt = data.table(wines)

dt[, !"alcohol", with = FALSE]

And you can also have a list of columns if you like:

dt[, !c("Country", "alcohol"), with = FALSE]

It seems it was only documented in NEWS for v1.8.4:

When with=FALSE, "!" may also be a prefix on j, #1384ii. This selects all but the named columns.

DF[,-match("somecol",names(DF))]
# works when somecol exists. If not, NA causes an error.

DF[,-match("somecol",names(DF),nomatch=0)]
# works when somecol exists. Empty data.frame when it doesn't, silently.

DT[,-match("somecol",names(DT)),with=FALSE]
# same issues.

DT[,setdiff(names(DT),"somecol"),with=FALSE]
# works but you have to know order of arguments, and no warning if missing

vs

DT[,!"somecol",with=FALSE]
# works and easy to read. With (helpful) warning if somecol isn't there.

But all of the above copy every column other than the deleted one. More usually:

DT[,somecol:=NULL]

to delete the column by name by reference.

Matt Dowle
eddi
0

You can get your desired behavior as follows:

data(iris)
str(iris)
delete <- which(colnames(iris) == "Species")
iris2 <- iris[, -delete]
str(iris2)
Bryan Hanson
  • This is equivalent to matching a single string, as opposed to using `%in%` to match a list of strings. – a different ben Sep 05 '13 at 11:25
  • This could be simplified to `deleted <- colnames(iris) == "Species"; iris[!deleted]`. You don't need negative indexing when you have a logical vector. – Marek Mar 19 '14 at 06:26