
Suppose my data looks like this:

2372  Kansas KS2000111 HUMBOLDT, CITY OF    ATRAZINE    1.3 05/07/2006
9104  Kansas KS2000111 HUMBOLDT, CITY OF    ATRAZINE   0.34 07/23/2006
9212  Kansas KS2000111 HUMBOLDT, CITY OF    ATRAZINE   0.33 02/11/2007
2094  Kansas KS2000111 HUMBOLDT, CITY OF    ATRAZINE    1.4 05/06/2007
16763 Kansas KS2000111 HUMBOLDT, CITY OF    ATRAZINE   0.61 05/11/2009
1076  Kansas KS2000111 HUMBOLDT, CITY OF METOLACHLOR   0.48 05/12/2002
1077  Kansas KS2000111 HUMBOLDT, CITY OF METOLACHLOR    0.3 05/07/2006

I want to subset by the Analyte and by a partial match on the date (namely, I just want the year). I have been trying this, but I know it isn't quite right:

 data[data$Analyte=="ATRAZINE" & grep("2006",as.character(data$Date)),]

Any suggestions?

pslice
    Related questions: http://stackoverflow.com/questions/1536590/how-to-select-rows-from-data-frame-with-2-conditions and http://stackoverflow.com/questions/2844669/r-question-create-new-data-set-that-meets-all-of-4-conditions/2844687#2844687 – Shane Jun 16 '10 at 11:09

3 Answers


For this problem I would go with the approach in Apprentice Queue's answer of extracting the year from the date rather than doing generic string matching. I would suggest:

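# POSIXlt stores the year as years since 1900, so 2006 is 106.
# The trailing comma selects rows; without it, R would index columns.
data[data$Analyte == "ATRAZINE"
     & as.POSIXlt(data$Date, format = "%m/%d/%Y")$year == 106, ]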
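The comparison against 106 relies on how `POSIXlt` stores its fields: `$year` holds the number of years since 1900. A minimal sketch of that, using one date string taken from the question's sample data (the variable name `d` is only for illustration):

```r
# Parse a month/day/year string the same way the answer does
d <- as.POSIXlt("05/07/2006", format = "%m/%d/%Y")

years_since_1900 <- d$year         # offset from 1900, not the calendar year
calendar_year    <- d$year + 1900  # add 1900 to recover the calendar year
```

Adding 1900 before comparing (for example `$year + 1900 == 2006`) may read more clearly than the bare offset.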

But if you really had to do regexp matching, you could use grepl, which returns a logical vector, rather than grep, which returns a vector of indices.

data[data$Analyte=="ATRAZINE" & grepl("2006",as.character(data$Date)),]
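To illustrate the difference, here is a small sketch on a hypothetical three-row data frame modeled on the question's sample. Only the column names `Analyte` and `Date` come from the question; the rest is assumed for the example.

```r
# Hypothetical sample: dates kept as character strings, as in the question
data <- data.frame(
  Analyte = c("ATRAZINE", "ATRAZINE", "METOLACHLOR"),
  Date    = c("05/07/2006", "02/11/2007", "05/07/2006"),
  stringsAsFactors = FALSE
)

idx  <- grep("2006", data$Date)    # integer positions of the matching rows
mask <- grepl("2006", data$Date)   # one TRUE/FALSE value per row

# The logical mask lines up row by row with the Analyte test
result <- data[data$Analyte == "ATRAZINE" & mask, ]
```

Combining the Analyte test with `idx` instead would treat the non-zero indices as TRUE and recycle them, so the 2007 ATRAZINE row would slip through.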
Jyotirmoy Bhattacharya

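One way, using date literals (this assumes data$Date is already of class Date):

# The trailing comma selects rows rather than columns
data[data$Analyte == "ATRAZINE"
     & (data$Date >= '2006-01-01' & data$Date < '2007-01-01'), ]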

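Another way, using format (again assuming data$Date is of class Date):

# format() extracts the four-digit year as a string
data[data$Analyte == "ATRAZINE"
     & format(data$Date, "%Y") == '2006', ]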
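Both approaches compare against a Date column, whereas the question's sample shows dates as text such as 05/07/2006. A minimal sketch, assuming the column is stored as character and using a hypothetical three-row data frame, converts it first and then applies each filter (with `as.Date` written out explicitly on the literals):

```r
# Hypothetical sample modeled on the question's data
data <- data.frame(
  Analyte = c("ATRAZINE", "ATRAZINE", "METOLACHLOR"),
  Date    = c("05/07/2006", "02/11/2007", "05/07/2006"),
  stringsAsFactors = FALSE
)

# Convert the month/day/year strings to Date objects
data$Date <- as.Date(data$Date, format = "%m/%d/%Y")

# Range comparison against explicit Date values
by_range <- data[data$Analyte == "ATRAZINE" &
                 data$Date >= as.Date("2006-01-01") &
                 data$Date <  as.Date("2007-01-01"), ]

# Year extracted as a string with format()
by_year <- data[data$Analyte == "ATRAZINE" &
                format(data$Date, "%Y") == "2006", ]
```

On this sample both filters keep the same single row.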
Apprentice Queue
    With `subset` you could skip `$` references, e.g.: `subset(data, Analyte=="ATRAZINE" & format(Date, "%Y")=="2006")`. And in your first solution `as.Date` is needed. – Marek Jun 16 '10 at 08:11
    as.Date isn't needed because R automatically converts it to Date. – Apprentice Queue Jun 17 '10 at 23:42
    My mistake. I missed which version of R changed that. I once got an error in R-2.2.0, and from that moment I always used `as.Date`. Time to rewrite all of my code :) – Marek Jun 22 '10 at 15:59

I realize this question was asked some years ago, but hopefully this helps someone in the future.

I used dplyr to subset on multiple conditions, checking the year after converting the column to Date type:

library(dplyr)

data %>% filter(Analyte == "ATRAZINE" & format(as.Date(Date, format = "%m/%d/%Y"), "%Y") == "2006")
Indi