I have two date columns in a data.frame called faults, which has plenty of other columns as well. I want to extract the records (rows) where the date in the second column falls within 10 days of the date in the first column, with the window starting on the 3rd day. In other words, the difference between the two dates should be between 3 and 10 days.

This is what I did...

for (i in 1:length(faults$PERIOD_START)){

  if (faults$FAULT_RECEIVED_DATE_FIRST[i] > faults$PERIOD_START[i])
  {
    if(faults$FAULT_RECEIVED_DATE_FIRST[i] == faults$PERIOD_START[i]+i){
      brat_none_set_b4_7d_view_flt_rec[i] = faults[i]
    }
  }
}

Obviously this doesn't extract the data between 3-10 days...

Example data:

faults$FAULT_RECEIVED_DATE_FIRST = 
   "2013-12-01", "2013-12-01", "2013-12-01", "2013-12-02", "2013-12-03"

faults$PERIOD_START = 
   "2013-11-01", "2013-11-25", "2013-11-24", "2013-11-23", "2013-11-20"

The records that should be extracted are:

those at the indices of 2013-11-25, 2013-11-24 and 2013-11-23 (because the fault received date is within 10 days of the period start, and the window starts on the 3rd day, e.g. 2013-11-27)

Any idea how to achieve this?

Regards,

Rich Scriven
Shery
  • Take a look at this first: http://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Jonas Tundo Apr 02 '14 at 12:30

2 Answers

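You can try the following, assuming both columns are already of class Date (convert them with as.Date() if they are character):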

x <- which((faults$FAULT_RECEIVED_DATE_FIRST - faults$PERIOD_START) >= 3 &
           (faults$FAULT_RECEIVED_DATE_FIRST - faults$PERIOD_START) <= 10)
faults[x, ]  # the comma selects rows; faults[x] would select columns instead
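
As a minimal sketch of this approach, assuming the example dates from the question (with the fourth PERIOD_START taken as 2013-11-23, to match the expected result) and converting the columns to Date first, the condition picks out rows 2, 3 and 4:

```r
# Example data from the question, converted to Date so that subtraction works
faults <- data.frame(
  FAULT_RECEIVED_DATE_FIRST = as.Date(c("2013-12-01", "2013-12-01", "2013-12-01",
                                        "2013-12-02", "2013-12-03")),
  PERIOD_START = as.Date(c("2013-11-01", "2013-11-25", "2013-11-24",
                           "2013-11-23", "2013-11-20"))
)

# Difference between the two dates in days, as a plain number
days <- as.numeric(faults$FAULT_RECEIVED_DATE_FIRST - faults$PERIOD_START)

# Row indices where the difference lies in the 3 to 10 day window
x <- which(days >= 3 & days <= 10)
x  # 2 3 4

# Note the comma: rows are selected, all columns are kept
faults[x, ]
```

The differences are 30, 6, 7, 9 and 13 days, so only the second, third and fourth records fall inside the window.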
Alnair

First, recreate the example data from the question:

faults <- as.data.frame(matrix(nrow = 5, ncol = 2))
colnames(faults) <- c("PERIOD_START", "FAULT_RECEIVED_DATE_FIRST")

faults$FAULT_RECEIVED_DATE_FIRST  <-  c("2013-12-01" , "2013-12-01", 
                                        "2013-12-01", "2013-12-02", "2013-12-03")
faults$PERIOD_START  <-  c ("2013-11-01", "2013-11-25", "2013-11-24", 
                            "2013-11-23", "2013-11-20")

To convert your character vectors to Date:

faults$FAULT_RECEIVED_DATE_FIRST <- as.Date (faults$FAULT_RECEIVED_DATE_FIRST, 
                                             format = "%Y-%m-%d")
faults$PERIOD_START <- as.Date (faults$PERIOD_START, format = "%Y-%m-%d")

Then you just subtract the dates to get the time difference:

faults ["diff"] <- faults ["FAULT_RECEIVED_DATE_FIRST"] - faults ["PERIOD_START"]

And transform it to numeric:

faults ["diff_days"] <- as.numeric(faults [["diff"]])

So you can subset data with the entries you need:

faults[faults$diff_days >= 3 & faults$diff_days <= 10, ]
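
The same steps could also be folded into one reusable check. This is only a sketch: `in_day_window` and its argument names are hypothetical (they are not part of this answer or of base R), and it assumes both inputs are Date vectors or strings that `as.Date` can parse.

```r
# Hypothetical helper: TRUE where `received` falls min_days..max_days after `start`
in_day_window <- function(received, start, min_days = 3, max_days = 10) {
  # difftime() with explicit units avoids relying on automatic unit selection
  d <- as.numeric(difftime(as.Date(received), as.Date(start), units = "days"))
  # Missing dates give NA here; treat them as "not in the window"
  !is.na(d) & d >= min_days & d <= max_days
}

# Differences of 6 and 13 days respectively
keep <- in_day_window(c("2013-12-01", "2013-12-03"),
                      c("2013-11-25", "2013-11-20"))
keep  # TRUE FALSE
```

With such a helper, the subset would be `faults[in_day_window(faults$FAULT_RECEIVED_DATE_FIRST, faults$PERIOD_START), ]`, without adding the intermediate diff and diff_days columns to the data frame.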
Andriy T.