12

Possible Duplicate:
dropping factor levels in a subsetted data frame in R

I have subsetted away observations with a certain factor level. When checking whether this has been done with summary() the levels were still listed, but with zero observations. Shouldn't they disappear during the subsetting?

divibisan
ego_

2 Answers

16

Subsetting doesn't drop empty levels, and this is a feature rather than a bug. Think of the factor levels as defining the possible categories a value can take. If you take only a subset of the observations, the set of possible categories doesn't change; your subset just doesn't contain any observations in some of them.

If you want to drop these empty levels, see ?droplevels.
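As a minimal sketch of the behaviour described above (the data frame `df` and its columns `group` and `value` are made up for illustration), subsetting keeps the unused level until `droplevels()` is applied:

```r
# Hypothetical example data: a factor with three levels
df <- data.frame(group = factor(c("a", "b", "c", "a")), value = 1:4)

# Subset away every observation with level "c"
sub <- df[df$group != "c", ]

# The level "c" is still listed, now with zero observations
levels_before <- levels(sub$group)
counts_before <- table(sub$group)

# droplevels() removes levels that no longer occur in the data
sub <- droplevels(sub)
levels_after <- levels(sub$group)
```

Here `levels_before` still contains "c", while `levels_after` holds only the levels that actually occur in the subset.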

Gavin Simpson
  • 1
    The only danger of `droplevels` applied to a data frame is that by default it will drop empty levels for **all** factors (rather than just for levels of the focal factor), which might be undesired. – Ben Bolker Sep 20 '12 at 20:39
  • 2
    Right, for the single factor I'd do `obj <- transform(obj, fac = droplevels(fac))` if I wanted to leave other factors untouched. – Gavin Simpson Sep 20 '12 at 20:41
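To illustrate the difference the two comments point at, here is a small sketch with a made-up data frame holding two factors (`fac` and `other` are illustrative names). It compares `droplevels()` on the whole data frame with the `transform()` approach for a single factor:

```r
# Hypothetical data: two factors, each with three levels
df <- data.frame(
  fac   = factor(c("a", "b", "c")),
  other = factor(c("x", "y", "z"))
)

# Removing the row with fac == "c" leaves empty levels in BOTH factors
sub <- df[df$fac != "c", ]

# droplevels() on the data frame drops empty levels from every factor
all_dropped <- droplevels(sub)

# transform() with droplevels() on one column leaves the other factor untouched
one_dropped <- transform(sub, fac = droplevels(fac))

levels(all_dropped$other)  # empty level "z" is gone
levels(one_dropped$other)  # empty level "z" is kept
levels(one_dropped$fac)    # empty level "c" is gone
```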
9

To make the extra levels disappear, use drop=TRUE when subsetting:

newfactor <- oldfactor[indices, drop=TRUE]
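A self-contained version of the line above might look like this; the contents of `oldfactor` and `indices` are assumed here purely for illustration:

```r
# Hypothetical inputs: a three-level factor and a logical index excluding "c"
oldfactor <- factor(c("a", "b", "c", "a"))
indices   <- oldfactor != "c"

# Default subsetting keeps all original levels
kept <- oldfactor[indices]

# drop = TRUE discards levels that do not occur in the result
newfactor <- oldfactor[indices, drop = TRUE]

levels(kept)       # includes the unused level "c"
levels(newfactor)  # only levels present in the subset
```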

Incidentally, one reason this is not the default is that factors with different level sets cannot be compared with each other: comparing them with == raises an error. So if you want to compare your subset with the original vector, or with a different subset of that vector, you need to keep the extra levels.
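A short sketch of that comparison issue, using a made-up factor `full`; the error is caught with `tryCatch()` so the example runs to completion:

```r
# Hypothetical three-level factor
full <- factor(c("a", "b", "c"))

# Two subsets that keep the full level set can be compared element-wise
same_levels <- full[1:2]
other       <- full[c(1, 3)]
ok <- same_levels == other

# A subset with dropped levels no longer shares the level set,
# so comparing it with the other subset signals an error
dropped <- full[1:2, drop = TRUE]
result <- tryCatch(
  dropped == other,
  error = function(e) conditionMessage(e)
)
```

With the levels kept, `ok` is an ordinary logical vector; with the levels dropped, `result` holds the error message instead of a comparison.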

David Robinson