89

I am using R and need to select rows where aged (age at death) is less than or equal to laclen (lactation length). I am trying to create a new data frame that includes only the rows/ids where the value of column 'aged' is less than or equal to its corresponding 'laclen' value.

df:
id1   id2    laclen  aged
9830  64526  26      6
7609  64547  28      0
9925  64551  3       0
9922  64551  3       5
9916  64551  3       8
9917  64551  3       8
9914  64551  3       2

The new data frame should look like this:

dfnew:
id1   id2    laclen  aged
9830  64526  26      6
7609  64547  28      0
9925  64551  3       0
9914  64551  3       2

Any help would be appreciated!

Bazon

  • Please give more details about your need. – Karthik May 18 '10 at 04:52
  • Hi Karthik, I am trying to create a new data frame that includes only the rows/ids where the value of column 'aged' is less than or equal to the value of column 'laclen' – Bazon May 18 '10 at 05:34

3 Answers

134
df[df$aged <= df$laclen, ] 

This should do the trick. The square brackets let you index rows with a logical expression: rows where the condition is TRUE are kept, and the empty slot after the comma keeps all columns.
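As a minimal sketch of how this works on the data from the question (the `data.frame()` call below is only an assumed way of rebuilding the posted table, and `keep` is an illustrative name):

```r
# Rebuild the example data from the question (values copied from the post).
df <- data.frame(
  id1    = c(9830, 7609, 9925, 9922, 9916, 9917, 9914),
  id2    = c(64526, 64547, 64551, 64551, 64551, 64551, 64551),
  laclen = c(26, 28, 3, 3, 3, 3, 3),
  aged   = c(6, 0, 0, 5, 8, 8, 2)
)

# Logical vector: TRUE where age at death does not exceed lactation length.
keep <- df$aged <= df$laclen

# Rows where keep is TRUE; the empty slot after the comma keeps every column.
dfnew <- df[keep, ]
print(dfnew)
```

The result keeps the original row names (1, 2, 3 and 7), which makes it easy to trace rows back to `df`.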

wkmor1
  • Thanks, aL3xa! I will keep this one as well. I can see that it's very similar to the one wkmor1 sent earlier. – Bazon May 18 '10 at 06:02
  • @aL3xa `attach` without `detach` could be dangerous... And I think that comma is misplaced. – Marek May 18 '10 at 07:39
  • @Marek, thanks for the suggestions! I've added `detach` and placed the comma after the right bracket, so it goes like this: `attach(df); newdf <- df[which(aged <= laclen), ]; detach(df)` – aL3xa May 18 '10 at 10:49
  • @aL3xa You could also use `with` - `newdf <- df[with(df,which(aged <= laclen)), ]` instead of `attach/detach`. – Marek May 18 '10 at 11:37
  • I get the error: `Error in Ops.factor(value, productcode) : level sets of factors are different`; I had to set the levels on those fields: https://stackoverflow.com/questions/24594981/getting-the-error-level-sets-of-factors-are-different-when-running-a-for-loop – fraxture Mar 02 '16 at 12:02
  • This solution returns NA lines when the condition evaluates to NA (e.g. missing values); the @Jonathan Chang answer worked best for me – Mayeul sgc Mar 15 '19 at 15:30
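The NA behaviour raised in the comments can be sketched as follows. The `toy` data frame is hypothetical (it is not from the question) and only illustrates what happens when a comparison evaluates to NA:

```r
# Hypothetical toy data with a missing age (not from the question).
toy <- data.frame(
  id     = 1:4,
  laclen = c(26, 28, 3, 3),
  aged   = c(6, NA, 5, 2)
)

# Plain logical indexing: the NA comparison produces a row filled with NA.
with_na <- toy[toy$aged <= toy$laclen, ]

# which() returns only the positions that are TRUE, so NA comparisons are dropped.
no_na <- toy[which(toy$aged <= toy$laclen), ]

# subset() also treats NA in the condition as FALSE.
via_subset <- subset(toy, aged <= laclen)

print(with_na)
print(no_na)
```

Wrapping the condition in `which()`, as in the `with(df, which(...))` comment above, is one way to get the same rows as `subset()` while still using square brackets.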
62

You can also do:

subset(df, aged <= laclen)
Jonathan Chang
  • Nice one, it makes the code neater in my opinion. It's a pity `R CMD check` does not recognize the fields used in the test as legitimate variables; it emits a `NOTE` "no visible binding for global variable". – mariotomo May 18 '10 at 14:10
  • `subset()` has some serious problems, see e.g. http://stackoverflow.com/questions/9860090/in-r-why-is-better-than-subset – MERose Dec 29 '15 at 11:50
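For code that lives inside a function or a package, one possible way around the concerns in these comments is to look the columns up by name instead of relying on `subset()`'s non-standard evaluation. This is only a sketch: the helper name `keep_rows_within_lactation` and its arguments are hypothetical and do not come from the thread.

```r
# Hypothetical helper (name and arguments are illustrative, not from the thread).
keep_rows_within_lactation <- function(data, age_col = "aged", len_col = "laclen") {
  # [[ ]] looks columns up by name, so no non-standard evaluation is involved.
  keep <- data[[age_col]] <= data[[len_col]]
  # which() drops NA comparisons; drop = FALSE keeps the result a data frame.
  data[which(keep), , drop = FALSE]
}

example <- data.frame(laclen = c(26, 3, 3), aged = c(6, 8, 2))
result <- keep_rows_within_lactation(example)
print(result)
```

Because the column names are passed as strings, no bare column names appear in the function body for `R CMD check` to flag as undefined global variables.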
14

If you use the dplyr package, you can do:

library(dplyr)
filter(df, aged <= laclen)
Enrique Pérez Herrero