I am working with the studentdata dataset from the LearnBayes package. For those who want to see the actual data:

install.packages('LearnBayes')

I am trying to filter rows based on the value in a column. For example, if the column value is "water", I want to keep that row; if the column value is "milk", I don't want it. Ultimately, I want to keep only the individuals whose Drink column is "water".

om-nom-nom
user722224

3 Answers


The subset command is not necessary. Just use data frame indexing:

studentdata[studentdata$Drink == 'water',]

Read the warning from ?subset:

This is a convenience function intended for use interactively. For programming it is better to use the standard subsetting functions like ‘[’, and in particular the non-standard evaluation of argument ‘subset’ can have unanticipated consequences.
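
To make the "for programming" point concrete, here is a minimal sketch on a made-up stand-in for studentdata. The data frame `students`, its values, and the helper `keep_rows_where` are all hypothetical names introduced for illustration; they are not part of LearnBayes. Because the column is looked up by an ordinary string, no non-standard evaluation is involved.

```r
# Toy stand-in for studentdata; the values are invented for illustration.
students <- data.frame(
  Student = 1:5,
  Drink   = c("water", "milk", "water", "pop", "milk"),
  stringsAsFactors = FALSE
)

# Hypothetical helper: keep the rows where column `col` equals `value`.
# `drop = FALSE` keeps the result a data frame even with a single column.
keep_rows_where <- function(df, col, value) {
  df[df[[col]] == value, , drop = FALSE]
}

water_only <- keep_rows_where(students, "Drink", "water")
print(water_only)
```

The same helper works for any column name passed as a string, which is awkward to do safely with `subset`.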

adamleerich
  • Thanks @adamleerich. Out of curiosity, what's the reasoning behind the comma? – ThinkBonobo Oct 24 '15 at 18:15
  • The `[ ]` syntax indexes into the 2-dimensional data frame in the normal way that matrices are indexed in math: row and then column, separated by a comma. In this case, we're passing a logical vector that selects rows (`studentdata$Drink == 'water'` picks out the rows we're interested in), but since we don't want to restrict which columns we get for those rows (we want all of them), we leave the column part of the index pair blank (so there's nothing after the comma). This is syntactic sugar to avoid having to give a vector of all column indices. – Will Jan 08 '16 at 17:26
  • What happens when I need to use the < or > symbols? – can.u Jan 22 '16 at 19:54
  • This does not filter out the data, it merely replaces rows that do not meet the criteria with NA. Another operation is required to remove these rows – Johnny V Dec 13 '17 at 10:28
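
On the two points raised in the comments, a small sketch may help. Comparison operators such as `<` and `>` go in the row position exactly like `==`. As for the NA rows: with `[`, rows where the condition is FALSE are dropped, but rows where the condition itself evaluates to NA (for example, a missing Drink value) come back as all-NA rows. Wrapping the condition in `which()` is one common way to avoid that. The data frame `students` and its values below are invented for illustration.

```r
# Toy data with one missing Drink value; values are invented.
students <- data.frame(
  Drink  = c("water", "milk", NA, "water"),
  Height = c(67, 64, 70, 72),
  stringsAsFactors = FALSE
)

# Numeric comparison: keep rows with Height greater than 65.
tall <- students[students$Height > 65, ]

# The condition is NA for the third row, so `[` returns an all-NA row for it.
with_na_row <- students[students$Drink == "water", ]

# which() returns only the positions that are TRUE, so the NA case is dropped.
water_only <- students[which(students$Drink == "water"), ]

print(tall)
print(with_na_row)
print(water_only)
```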

Try this:

subset(studentdata, Drink=='water')

That should do it.
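
For interactive use, two details of `subset` are worth seeing in action: rows where the condition is NA are treated as not matching, and the `select` argument restricts the columns returned. This is a sketch on invented data; `students` and its values are assumptions for illustration, not the real studentdata.

```r
# Toy data with one missing Drink value; values are invented.
students <- data.frame(
  Drink  = c("water", "milk", NA, "water"),
  Height = c(67, 64, 70, 72),
  stringsAsFactors = FALSE
)

# Rows where Drink is NA are excluded rather than returned as NA rows.
water <- subset(students, Drink == "water")

# `select` keeps only the named columns for the matching rows.
water_heights <- subset(students, Drink == "water", select = Height)

print(water)
print(water_heights)
```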

Dave Kincaid
  • Thank you! I tried some variation of that but must have been off on the punctuation or something silly like that. I appreciate the help. – user722224 Sep 12 '11 at 01:32

Thought I'd update this with a dplyr solution:

library(dplyr)
filter(studentdata, Drink == "water")
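
As a sketch of how this extends, `filter` accepts several conditions separated by commas and keeps only rows where all of them are TRUE; rows where a condition evaluates to NA are dropped. The example assumes dplyr is installed and uses an invented data frame `students` rather than the real studentdata.

```r
library(dplyr)

# Toy data with one missing Drink value; values are invented.
students <- data.frame(
  Drink  = c("water", "milk", NA, "water"),
  Height = c(67, 64, 70, 72),
  stringsAsFactors = FALSE
)

# Comma-separated conditions are combined with AND.
tall_water <- filter(students, Drink == "water", Height > 70)

print(tall_water)
```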
rrs