Read rows with specific column values

Question

I want to extract a set of rows of an existing dataset:

 dataset.x <- dataset[(as.character(dataset$type))=="x",]

However, when I run

   summary(dataset.x$type)

It displays all types which were present in the original dataset. Basically I get a result that says

   x 12354235    #the correct itemcount
   y 0
   z 0
   a 0
   ...

Not only are the zero-count entries ugly, they also mess up any plot of dataset.x, because it ends up with hundreds of entries with the value 0.

Care to provide a reproducible example to avoid guessing on our part? – Roman Luštrik, Jun 16 '11 at 18:52

score 3 · Answer 1 · answered Jun 16 '11 at 18:51


I'm assuming this is a factor? If so, droplevels() can be used: http://stat.ethz.ch/R-manual/R-patched/library/base/html/droplevels.html
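A minimal sketch of that approach, assuming `type` really is a factor. The small `dataset` below is a hypothetical stand-in for the asker's data, not taken from the question:

```r
# Hypothetical stand-in for the asker's data; `type` is stored as a factor.
dataset <- data.frame(
  type  = factor(c("x", "y", "x", "z", "a", "x")),
  value = 1:6
)

# Subsetting rows keeps every original level attached to the factor.
dataset.x <- dataset[dataset$type == "x", ]
levels(dataset.x$type)

# droplevels() on a data frame drops unused levels from each factor column.
dataset.x <- droplevels(dataset.x)
levels(dataset.x$type)
summary(dataset.x$type)
```

After the `droplevels()` call, `summary()` and any plot of `dataset.x$type` should only see the levels that actually occur in the subset.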

If you add a small reproducible example, it will help others get on the same page and give better advice if this isn't right.

Chase


You don't need `gdata` anymore, I think. `droplevels` was added recently, not sure which version. – joran Jun 16 '11 at 18:53

score 3 · Accepted Answer · answered Jun 16 '11 at 19:05


Building on Chase's answer, subsetting and dropping unused levels in factors comes up a lot, so it pays to just create your own function by combining droplevels and subset:

subsetDrop <- function(...){droplevels(subset(...))}
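A short usage sketch for this helper. The `dataset` here is made up for illustration, and the function definition is repeated only so the snippet runs on its own:

```r
# The helper from the answer, repeated so this snippet is self-contained.
subsetDrop <- function(...) { droplevels(subset(...)) }

# Hypothetical data for illustration.
dataset <- data.frame(
  type  = factor(c("x", "y", "x", "z")),
  value = c(10, 20, 30, 40)
)

# Same arguments as subset(); unused factor levels are dropped from the result.
dataset.x <- subsetDrop(dataset, type == "x")
table(dataset.x$type)
```

Because the arguments are passed straight through via `...`, the wrapper takes anything `subset()` takes, including `select =`.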

joran


If you're using that function regularly, it's probably a sign you want a character vector, not a factor. – hadley Jun 18 '11 at 04:02

@hadley - Indeed, I live mostly with stringsAsFactors=FALSE. However, I happen to often want things ordered non-alphabetically when I plot them without dragging all the levels along for the ride. – joran Jun 18 '11 at 04:29

I wish there was a datatype that preserved order but didn't preserve levels. – hadley Jun 18 '11 at 12:54

score 3 · Answer 3 · answered Jun 16 '11 at 21:23

Others have explained what is happening and how to fix it; I just want to show why keeping the levels is a desirable default.

Consider the following sample code:

mydata <- data.frame( 
    x = factor( rep( c(0:5,0:5), c(0,5,10,20,10,5,5,10,20,10,5,0))),
    sex = rep( c('F','M'), each=50 ) )

mydata.males <- mydata[ mydata$sex=='M', ]
mydata.males.dropped <- droplevels(mydata.males)

mydata.females <- mydata[ mydata$sex=='F', ]
mydata.females.dropped <- droplevels(mydata.females)

par(mfcol=c(2,2))
barplot(table(mydata.males$x), main='Male', sub='Default')
barplot(table(mydata.females$x), main='Female', sub='Default')

barplot(table(mydata.males.dropped$x), main='Male', sub='Drop')
barplot(table(mydata.females.dropped$x), main='Female', sub='Drop')

Which produces this plot:

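[Figure: 2x2 grid of bar plots of x. Left column, sub 'Default': Male (top), Female (bottom). Right column, sub 'Drop': Male (top), Female (bottom).]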

Now, which is the more meaningful comparison: the two plots on the left, or the two plots on the right?

Instead of dropping unused levels, it may be better to rethink what you are doing. If the main goal is to get the count of the x's, then you can use sum rather than subsetting and calling summary. And how meaningful can a plot be on a variable that you have already forced to be a single value?
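As a rough sketch of the counting suggestion, using a hypothetical `dataset` (not the asker's data):

```r
# Hypothetical data for illustration.
dataset <- data.frame(type = factor(c("x", "y", "x", "z", "x")))

# Comparing the factor to a string gives a logical vector;
# sum() counts the TRUE values, so no subsetting is needed.
n_x <- sum(dataset$type == "x")
n_x
```

If `type` can contain missing values, `sum(dataset$type == "x", na.rm = TRUE)` avoids getting `NA` as the result.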

score 1 · Answer 4 · answered Jun 16 '11 at 18:52


Try

dataset$type <- as.character(dataset$type)

followed by your original code. It's probably just that R is still treating that column as a factor and is keeping all of the information about that factor in the column.
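A minimal sketch of this conversion on hypothetical data. One thing to keep in mind: `summary()` on a character vector reports length, class and mode rather than per-value counts, so `table()` is used here to count:

```r
# Hypothetical data for illustration; `type` starts out as a factor.
dataset <- data.frame(
  type  = factor(c("x", "y", "x", "z")),
  value = 1:4
)

# Convert the column to character, then subset as in the question.
dataset$type <- as.character(dataset$type)
dataset.x <- dataset[dataset$type == "x", ]

# A character vector carries no levels, so only values present are tabulated.
table(dataset.x$type)
```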

Rguy

