
I'm trying to subset a data frame based on a factor. However, even after subsetting, R still shows the other factor levels.

For instance, with the iris dataset included in R, I want to create a subset that contains only the setosa species. However, even after subsetting, R reports that the factor has 3 levels, while browsing through the data shows only setosa. Why is this?

Thanks in advance

#Load Data
library(datasets)
data(iris)

#Subset species into a new data frame containing only setosa observations
sub = iris[iris$Species == "setosa",]

#View sub data frame. Why are there still three levels?
str(sub)

'data.frame':   50 obs. of  5 variables:
 $ Sepal.Length: num  5.1 4.9 4.7 4.6 5 5.4 4.6 5 4.4 4.9 ...
 $ Sepal.Width : num  3.5 3 3.2 3.1 3.6 3.9 3.4 3.4 2.9 3.1 ...
 $ Petal.Length: num  1.4 1.4 1.3 1.5 1.4 1.7 1.4 1.5 1.4 1.5 ...
 $ Petal.Width : num  0.2 0.2 0.2 0.2 0.2 0.4 0.3 0.2 0.2 0.1 ...
 $ Species     : Factor w/ 3 levels "setosa","versicolor",..: 1 1 1 1 1 1 1 1 1 1 ...
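A quick way to see what is going on, sketched with base R only: the row subset keeps the factor's full set of levels from `iris`, and the levels that no longer occur are still there with zero observations.

```r
# Reproduce the subset from the question
sub <- iris[iris$Species == "setosa", ]

# The factor still carries all three levels of the original iris data:
# "setosa", "versicolor" and "virginica"
levels(sub$Species)

# The unused levels remain, they just have zero counts
# (setosa = 50, versicolor = 0, virginica = 0)
table(sub$Species)
```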
canyon289
  • Would it not be easier to use `sub <- subset(iris, iris$Species == "setosa")`? – ccapizzano May 29 '14 at 19:10
  • I suppose it could be. I'm new to R so I'm not familiar with all the various functions. Thanks for the tip – canyon289 May 29 '14 at 19:11
  • You haven't actually created new data, so it still maintains the properties of `iris`, because it's simply a small chunk from a larger data structure. In `sub` you can do `sub$Species <- as.character(sub$Species)` to change it to character. – Rich Scriven May 29 '14 at 19:12
  • See also [this](http://stackoverflow.com/q/3445316/324364) question; it was a tossup for me which this should be a duplicate of. – joran May 29 '14 at 19:13
  • Re my comment: After realizing `droplevels()` is available, that would be the better function in this case. @joran, nice finds. Especially the one in the comment. Very useful information. – Rich Scriven May 29 '14 at 19:18
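For reference, a small sketch comparing the two suggestions in the comments above: `subset()` returns the same 50 rows but, like `[`, it does not drop unused factor levels, while `as.character()` removes the levels by turning the column into a plain character vector (so it is no longer a factor at all).

```r
# subset() selects the same 50 setosa rows as the bracket version
sub_s <- subset(iris, Species == "setosa")
nrow(sub_s)              # 50
nlevels(sub_s$Species)   # still 3: subset() keeps the unused levels

# Converting to character discards the levels along with the factor class
sub_s$Species <- as.character(sub_s$Species)
class(sub_s$Species)     # "character"
unique(sub_s$Species)    # only "setosa"
```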

1 Answer


Because subsetting keeps all of the factor's levels, even the ones that no longer occur in the data. You can get rid of the unused levels with the `droplevels()` function.
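A minimal sketch of that fix, using only base R: `droplevels()` can be applied to the whole data frame (it drops unused levels from every factor column) or to a single factor column. Re-creating the column with `factor()` has the same effect on that column.

```r
# Drop unused levels from every factor column in the subset
sub <- droplevels(iris[iris$Species == "setosa", ])
levels(sub$Species)   # only "setosa" remains
str(sub)              # Species: Factor w/ 1 level "setosa"

# Alternatively, drop them from one factor column only
sub2 <- iris[iris$Species == "setosa", ]
sub2$Species <- droplevels(sub2$Species)

# Re-creating the factor keeps only the levels actually present
sub3 <- iris[iris$Species == "setosa", ]
sub3$Species <- factor(sub3$Species)
```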

Jaap