Drop several factor levels in data frame in R

Question

I needed to drop several factor levels from a data frame in R. With the solution provided in this question, I can get rid of one of them, but is it possible to remove several factor levels in one step?

I came up with this piece of code, which calls subset once for each level I need to remove:

dino <- read.csv('/home/maxim/onset.csv', header=TRUE)
dino <- subset(dino, onset != "QT")
dino <- subset(dino, onset != "")
table(droplevels(dino)$onset)

It works fine in my case, but I was wondering if anyone knows a more direct way to do it. (BTW, I'm not very proficient in R...)

@peixe The way `subset` evaluates its arguments can lead to unexpected results in certain circumstances. If you read `?subset`, you'll note the warning at the end to only use this function interactively, that is, when you're working in the R shell. If what you're writing is going to be part of a script, it's best to stick to the standard `[` notation: `dino[! dino$onset %in% c('QT', ''), ]` — Matthew Plourde, Nov 30 '12 at 15:18
Whoa, I take my hat off to you. It's perfect. Could you post that as a solution to the question? :) Thanks! — peixe, Nov 30 '12 at 15:18
@mplourde Oh, you are right. I hadn't noticed that warning... — peixe, Nov 30 '12 at 15:22
@peixe I think it's kind of weird that they'd keep something in the language that is both redundant and problematic, but such is the case. — Matthew Plourde, Nov 30 '12 at 15:29
@mplourde: `subset` makes your code easier to read, and the problems only occur in fairly obscure circumstances. — Richie Cotton, Dec 01 '12 at 15:52
Wouldn't it be nice to have a solution posted? Easy reputation points for whoever posts it! ;) — peixe, Dec 01 '12 at 17:26
@RichieCotton I have to respectfully disagree. Maybe you could say what you mean exactly by 'easier to read'. To me, the concept could mean two different things: clarity of meaning or legibility. On the clarity of meaning front, if I were a newcomer to R and I saw `subset(d, var == 1)`, the natural impression would be that `var` was a variable in the containing environment, which in actuality may or may not be the case. On the other hand, there's no such ambiguity with the bracket and dollar-sign notation. So imo, `subset` falls short if 'easier to read' is understood ... — Matthew Plourde, Dec 01 '12 at 17:34
@RichieCotton [continued] ... as 'clearer in meaning'. Now regarding legibility, the case for `subset` is stronger. I dislike reading R code full of dollar signs just as much as the next person. But that's what the `with` and `within` constructs are for. Additionally, other popular languages also have analogous constructs expressed using the term `with`, so even though `with` may not be as clear in meaning as plain-old brackets and dollar signs, it is in some sense more standard. — Matthew Plourde, Dec 01 '12 at 17:35
@RichieCotton I'll add that the circumstances under which `subset` fails aren't so obscure. See @joran's answer here for an example and a link to discussion: http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset — Matthew Plourde, Dec 01 '12 at 17:37

score 2 · Answer 1 · answered Dec 01 '12 at 19:23

Solution provided by @Matthew Plourde:

dino[! dino$onset %in% c('QT', ''), ]
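
A minimal sketch of how this might be used end to end. The toy data frame below is a made-up stand-in for the CSV in the question, and the names `unwanted` and `kept` are only illustrative. Note that row filtering alone leaves the removed values behind as empty factor levels, so `droplevels` is still needed afterwards.

```r
# Toy stand-in for the CSV in the question (values are made up).
dino <- data.frame(
  id    = 1:6,
  onset = factor(c("A", "QT", "", "B", "A", "QT"))
)

# Values whose rows (and, later, levels) should go away.
unwanted <- c("QT", "")

# Keep only the rows whose onset is not one of the unwanted values.
kept <- dino[!dino$onset %in% unwanted, ]

# Filtering rows does not touch the factor's level set;
# droplevels() removes the levels that are no longer used.
kept <- droplevels(kept)

levels(kept$onset)
table(kept$onset)
```

One design point: `%in%` returns FALSE for NA, so rows with a missing onset are kept here, whereas a chain of `!=` comparisons inside `[` would turn them into all-NA rows.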

peixe


score 0 · Accepted Answer · answered Dec 01 '12 at 19:22

Solution provided by @Joris Meys:

subset(dino, ! onset %in% c("QT",""))
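
For comparison, a small sketch that runs the `subset` form and the bracket form side by side on made-up data (the data frame contents and the names `unwanted`, `via_subset` and `via_bracket` are assumptions for illustration). On this toy input both are expected to select the same rows, and in both cases the unused levels remain until `droplevels` is applied.

```r
# Toy stand-in for the CSV in the question (values are made up).
dino <- data.frame(
  id    = 1:6,
  onset = factor(c("A", "QT", "", "B", "A", "QT"))
)

unwanted <- c("QT", "")

# subset() looks up `onset` inside dino; `unwanted` is not a column,
# so it is found in the calling environment instead.
via_subset <- droplevels(subset(dino, !onset %in% unwanted))

# The same filter written with explicit bracket and dollar notation.
via_bracket <- droplevels(dino[!dino$onset %in% unwanted, ])

# Compare the two results.
all.equal(via_subset, via_bracket)
```

As discussed in the comments above, the bracket form is the safer choice inside functions and scripts, while `subset` is mainly a convenience for interactive work.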

peixe


I will choose this answer, as it was the first one, and the one I finally used. See discussion thread above. – peixe Dec 01 '12 at 19:24
