8

There is an interesting option drop = TRUE in data.frame filtering, see excerpt from help('[.data.frame'):

Usage

S3 method for class 'data.frame'

x[i, j, drop = ]

But when I try it on a data.frame, it doesn't work!

> df = data.frame(a = c("europe", "asia", "oceania"), b = c(1, 2, 3))
>
> df[1:2,, drop = TRUE]$a
[1] europe asia  
Levels: asia europe oceania     <--- oceania shouldn't be here!!
>

I know there are other ways like

df2 <- droplevels(df[1:2,])

but the documentation promised a much more elegant way to do this, so why doesn't it work? Is it a bug? Because I don't understand how this could be a feature...

EDIT: I was confused by drop = TRUE dropping factor levels for vectors, as you can see here. It is not very intuitive that [i, drop = TRUE] drops factor levels and [i, j, drop = TRUE] does not!!

tonytonov
Tomas
  • 6
    I think you need to go back and actually read the documentation you link to. Also, it is sufficient to do `droplevels(df[1:2,])` in one line. – joran Jan 02 '13 at 14:33
  • 3
    Thanks to @joran and you all for the explanations. But, **is it a reason for a downvote if someone doesn't understand the documentation?** (I was confused by drop = TRUE working for vectors, see my EDIT). Now I might be tempted to delete a quite interesting question with answers.. – Tomas Jan 02 '13 at 14:42
  • Who says I downvoted? In any case, if the documentation were in any way confusing or ambiguous, I think you might have a point. Otherwise, I think "lack of research" would apply in this case. – joran Jan 02 '13 at 14:45
  • 2
    @Tomas: I agree with you (I didn't downvote), anyway the SO community tends not to appreciate it very much when people seem not to have read the documentation carefully... (it's a fierce world here ;) ) – digEmAll Jan 02 '13 at 14:47
  • @joran, I didn't mean you in particular. I was confused by drop = TRUE dropping factor levels for vectors, see my EDIT! – Tomas Jan 02 '13 at 14:48
  • 3
    you could add an answer with your final observation (which I agree is weird) to the list at http://stackoverflow.com/questions/1535021/whats-the-biggest-r-gotcha-youve-run-across – Ben Bolker Jan 02 '13 at 14:56

4 Answers

12

The documentation clearly states:

drop : logical. If TRUE the result is coerced to the lowest possible dimension. The default is to drop if only one column is left, but not to drop if only one row is left.

This means that if drop = TRUE and the filtered data.frame results in a single column or row, the result is coerced to a vector/list instead of returning a single-column/single-row data.frame.

Therefore, this argument has nothing to do with dropping factor levels, and so the right way to eliminate unused levels is the one you mentioned (i.e. using the droplevels function).
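
A minimal sketch of the distinction, reusing the question's data frame. The explicit `stringsAsFactors = TRUE` is an assumption added here so that column `a` is a factor on any R version (R 4.0.0 and later no longer convert strings to factors by default), and the names `one_col_df`, `one_col_vec` and `one_row` are only illustrative:

```r
# Assumption: stringsAsFactors = TRUE is set explicitly so that `a` is a factor
df <- data.frame(a = c("europe", "asia", "oceania"), b = c(1, 2, 3),
                 stringsAsFactors = TRUE)

# drop only controls the dimension of the result
one_col_df  <- df[1:2, "a", drop = FALSE]  # stays a one-column data.frame
one_col_vec <- df[1:2, "a", drop = TRUE]   # collapses to a factor vector
one_row     <- df[1, , drop = TRUE]        # a single row collapses to a list

is.data.frame(one_col_df)    # TRUE
is.factor(one_col_vec)       # TRUE
is.data.frame(one_row)       # FALSE

# The unused level survives the dimension drop ...
levels(one_col_vec)          # "asia" "europe" "oceania"

# ... and droplevels() is what actually removes it
df2 <- droplevels(df[1:2, ])
levels(df2$a)                # "asia" "europe"
```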

digEmAll
  • 1
    Thanks! This is a big confusion here, that `[i, drop = TRUE]` [does drop factor levels](http://quantitative-ecology.blogspot.cz/2008/02/drop-unused-factor-levels.html) and `[i, j, drop = TRUE]` does not! – Tomas Jan 02 '13 at 14:50
  • @Tomas: yes, the choice of the name "drop" is probably not a really good idea... They could have used "simplify" as in `lapply/tapply()` functions, that is way clearer IMO... – digEmAll Jan 02 '13 at 14:54
  • 2
    yeah, but then the `simplify` argument is `simplify` in some places and `SIMPLIFY` in others (`mapply`, I think?) and the default is `TRUE` in some places and `FALSE` elsewhere ... sigh. – Ben Bolker Jan 02 '13 at 14:59
6

This is a stumbling block for many people, because "drop does something different", as Peter Dalgaard explains in http://www.mail-archive.com/r-help@stat.math.ethz.ch/msg22459.html and as digEmAll explains in another answer.

To get the result you want, use:

d2[] <- lapply(d2, function(x) if (is.factor(x)) factor(x) else x) 
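
As a rough sketch of how this idiom behaves, assuming `d2` is a row subset of the question's data frame (the name `d` and the explicit `stringsAsFactors = TRUE` are assumptions added here so the example runs the same way on any R version):

```r
# Assumption: `d` stands in for the question's data frame, with `a` as a factor
d <- data.frame(a = c("europe", "asia", "oceania"), b = c(1, 2, 3),
                stringsAsFactors = TRUE)
d2 <- d[1:2, ]
levels(d2$a)        # "asia" "europe" "oceania"

# factor(x) rebuilds each factor from the values actually present;
# assigning into d2[] replaces the columns but keeps d2 a data.frame
d2[] <- lapply(d2, function(x) if (is.factor(x)) factor(x) else x)

levels(d2$a)        # "asia" "europe"
is.data.frame(d2)   # TRUE
```

Without the empty brackets, `d2 <- lapply(...)` would leave `d2` as a plain list rather than a data frame.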
Dieter Menne
  • 2
    +1 for the link to an answer from an R-core member ... why not just `d2 <- droplevels(d2)` ... ? Does your solution do something different/better? (I see that solution was suggested by Peter Dalgaard, but that was before `droplevels` was added to base R (in 2.13, I think?)) – Ben Bolker Jan 02 '13 at 14:57
  • 1
    Correct, that was before `droplevels`. I still find it useful because I see what happens. I learned about the `d[] <-` syntax from it. And old habits die hard. – Dieter Menne Jan 02 '13 at 15:01
6

What the documentation says is

If TRUE the result is coerced to the lowest possible dimension.

So it is related to dimension, not to factor levels:

df[, 1]
# [1] europe  asia    oceania
# Levels: asia europe oceania
df[, 1, drop = FALSE]
#         a
# 1  europe
# 2    asia
# 3 oceania

Dropping factor levels is a different problem. Here is a case (see ?'[.factor') where the argument drop is used for this purpose:

ff <- factor(c('AA', 'BA', 'CA'))
ff[1:2, drop = TRUE]
# [1] AA BA
# Levels: AA BA
Julius Vainora
  • 2
    Thanks! This is a big confusion here, that `[i, drop = TRUE]` [does drop factor levels](http://quantitative-ecology.blogspot.cz/2008/02/drop-unused-factor-levels.html) and `[i, j, drop = TRUE]` does not! – Tomas Jan 02 '13 at 14:50
  • 1
    It drops it if class is `factor`, but not a `data.frame`. Seems very straightforward to me. – Roman Luštrik Jan 02 '13 at 15:30
1

df[1:2,]$a[,drop=TRUE]

[1] europe asia
Levels: asia europe

What happens with your approach is that drop gets applied to the original data frame, so you don't get the result you want.

What you need to do instead is apply drop to the column of the subset that is returned.

If you have any questions, feel free to ping me.