I need to drop several factor levels from a data frame in R. With the solution provided in this question, I can get rid of one of them, but is it possible to remove several factor levels in one go?

I came up with this piece of code, which subsets once for every factor level I need to remove:

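dino <- read.csv('/home/maxim/onset.csv', header=TRUE)
dino <- subset(dino, onset != "QT")   # drop rows whose onset is "QT"
dino <- subset(dino, onset != "")     # drop rows with an empty onset
table(droplevels(dino)$onset)         # count levels, ignoring unused ones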

It works fine in my case, but I was wondering if anyone knows a more direct way to do it. (BTW, I'm not very proficient in R...)
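Why `droplevels()` is needed in the last line of the code above: removing rows does not remove the corresponding levels from a factor, so `table()` keeps reporting them with a count of zero. A minimal sketch, using a small hypothetical data frame in place of `onset.csv` (the values are made up for illustration):

```r
# Hypothetical stand-in for onset.csv. factor() is used explicitly because
# read.csv() only creates factors when stringsAsFactors = TRUE (the default
# changed to FALSE in R 4.0.0).
dino <- data.frame(onset = factor(c("A", "QT", "", "B", "A")),
                   id = 1:5)

dino <- subset(dino, onset != "QT")
dino <- subset(dino, onset != "")

# The rows are gone, but "" and "QT" are still levels of the factor
levels(dino$onset)
table(dino$onset)              # "" and QT are listed with count 0

# droplevels() removes the levels that no longer occur
table(droplevels(dino)$onset)  # only A and B remain
```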

peixe
  • `subset(dino, ! onset %in% c("QT",""))` is the way to go – Joris Meys Nov 30 '12 at 15:10
  • @peixe The way `subset` evaluates its arguments can lead to unexpected results in certain circumstances. If you read `?subset`, you'll note the warning at the end to only use this function interactively, that is, when you're working in the R shell. If what you're writing is going to be part of a script, it's best to stick to the standard `[` notation: `dino[! dino$onset %in% c('QT', ''), ]` – Matthew Plourde Nov 30 '12 at 15:18
  • Whoa, I take my hat off to you. It's perfect. Could you post that as a solution to the question? :) Thanks! – peixe Nov 30 '12 at 15:18
  • @mplourde Oh, you are right. I hadn't noticed that warning... – peixe Nov 30 '12 at 15:22
  • @peixe I think it's kind of weird that they'd keep something in the language that is both redundant and problematic, but such is the case. – Matthew Plourde Nov 30 '12 at 15:29
  • Open-source issues... xD – peixe Nov 30 '12 at 15:30
  • @mplourde: `subset` makes your code easier to read, and the problems only occur in fairly obscure circumstances. – Richie Cotton Dec 01 '12 at 15:52
  • Wouldn't it be nice to have a solution posted? Easy reputation points for whoever posts it! ;) – peixe Dec 01 '12 at 17:26
  • @RichieCotton I have to respectfully disagree. Maybe you could say what you mean exactly by 'easier to read'. To me, the concept could mean two different things: clarity of meaning or legibility. On the clarity of meaning front, if I were a newcomer to R and I saw `subset(d, var == 1)`, the natural impression would be that `var` was a variable in the containing environment, which in actuality may or may not be the case. On the other hand, there's no such ambiguity with the bracket and dollar-sign notation. So imo, `subset` falls short if 'easier to read' is understood ... – Matthew Plourde Dec 01 '12 at 17:34
  • @RichieCotton [continued] ... as 'clearer in meaning'. Now regarding legibility, the case for `subset` is stronger. I dislike reading R code full of dollar signs just as much as the next person. But that's what the `with` and `within` constructs are for. Additionally, other popular languages also have analogous constructs expressed using the term `with`, so even though `with` may not be as clear in meaning as plain-old brackets and dollar signs, it is in some sense more standard. – Matthew Plourde Dec 01 '12 at 17:35
  • @RichieCotton I'll add that the circumstances under which `subset` fails aren't so obscure. See @joran's answer here for an example and a link to discussion: http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset – Matthew Plourde Dec 01 '12 at 17:37
  • @peixe you can post and accept – Matthew Plourde Dec 01 '12 at 18:00
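The ambiguity discussed in the comments can be seen in a small sketch: inside `subset()`, a name is looked up among the data frame's columns first, so a variable in the calling environment with the same name as a column is silently masked. The data frame `d`, its column `var` and the variable `var` are hypothetical, chosen to mirror the `subset(d, var == 1)` example:

```r
# Hypothetical data: a column and a global variable that share a name
d <- data.frame(var = c(1, 2, 3), label = c("a", "b", "c"))
var <- 2

# Inside subset(), both sides of == refer to the column d$var,
# so the condition is always TRUE and every row is returned.
subset(d, var == var)

# With brackets and $, the two meanings are explicit: only row 2 matches.
d[d$var == var, ]

# with() avoids the repeated `d$` while the row selection is still done by `[`.
d[with(d, var == 1), ]
```

Note that `with()` looks names up the same way `subset()` does, so it shares the masking caveat; it only improves legibility.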

2 Answers

Solution provided by @Matthew Plourde:

dino[! dino$onset %in% c('QT', ''), ]
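Subsetting with `[` keeps the unused levels, just as `subset()` does, so it is usually combined with `droplevels()`. A short sketch on a hypothetical toy data frame standing in for `onset.csv`; the vector name `to_drop` is also just for illustration. Keeping the unwanted levels in one vector means the list can grow without adding more subsetting steps:

```r
# Hypothetical data standing in for the CSV file
dino <- data.frame(onset = factor(c("A", "QT", "", "B", "A")),
                   id = 1:5)

# Levels to remove; extend this vector instead of adding subsetting steps
to_drop <- c("QT", "")

# %in% flags rows whose onset is any of the unwanted levels; ! negates it.
# droplevels() then removes the now-unused levels from the factor.
dino <- droplevels(dino[!dino$onset %in% to_drop, ])

table(dino$onset)
```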
peixe

Solution provided by @Joris Meys:

subset(dino, ! onset %in% c("QT",""))
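The `subset()` call and the bracket version select the same rows here, as a quick check on hypothetical toy data suggests. One difference worth knowing: `subset()` drops rows where the condition is `NA`, whereas `[` with an `NA` in a logical index returns a row of `NA`s. Because `%in%` never returns `NA`, the two agree even when `onset` itself contains `NA`:

```r
# Hypothetical data, including a missing onset value
dino <- data.frame(onset = factor(c("A", "QT", "", "B", NA)),
                   id = 1:5)

via_subset   <- subset(dino, !onset %in% c("QT", ""))
via_brackets <- dino[!dino$onset %in% c("QT", ""), ]

# Same rows, same row names, same (undropped) levels
identical(via_subset, via_brackets)

# NA %in% c("QT", "") is FALSE, so the NA row is kept by both
nrow(via_subset)   # 3
```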
peixe
  • I will choose this answer, as it was the first one and the one I finally used. See the discussion thread above. – peixe Dec 01 '12 at 19:24