0

I am fairly new to R and am having some trouble with doing calculations and comparisons with dates in R. Basically, I have to data frames:

df1 <- (2921 rows)

    DateTime
1   2013-06-01 00:00:00
2   2013-06-01 03:00:00
3   2013-06-01 06:00:00
4   2013-06-01 09:00:00
5   2013-06-01 12:00:00
6   2013-06-01 15:00:00
7   2013-06-01 18:00:00
8   2013-06-01 21:00:00
9   2013-06-02 00:00:00
10  2013-06-02 03:00:00 

and df2 <- (70,816 rows)

Create.Date.Time        Service         Closing.Date.Time
1   2013-06-01 12:59:00 AV              2013-06-01 13:59:00
2   2013-06-02 07:56:00 SERVICE684793   2013-06-02 08:59:00
3   2013-06-02 09:39:00 SERVICE684793   2013-06-03 12:01:00
4   2013-06-02 14:14:00 SERVICE684796   2013-06-02 14:55:00
5   2013-06-02 17:20:00 SERVICE684797   2013-06-03 12:06:00
6   2013-06-03 07:20:00 SERVICE684793   2013-06-03 07:39:00
7   2013-06-03 08:02:00 SERVICE684839   2013-06-03 12:09:00
8   2013-06-03 08:04:00 SERVICE684841   2013-06-04 08:05:00
9   2013-06-03 08:04:00 SERVICE684841   2013-06-05 08:06:00
10  2013-06-03 08:08:00 SERVICE684841   2013-06-03 08:08:00

My task is to get the cumulative count of df2$Create.Date.time for each i in df1$DateTime. In other words, I am looking to count how many instances of df2$Create.Date.Time being less than or equal to each df1$DateTime exist.

For example, for df1$DateTime = 2013-06-02 18:00:00, the cummulative count for df2$Create.Date.Time would be 5 (There are 5 instances where the Create.Date.Time is earlier than 2013-06-02 18:00:00 in df$2).

I also need to do the same thing per service.

I have tried converting the dates (all of which are of class "POSIXct" "POSIXt") to seconds then doing the comparison but i keep getting weird errors. I would appreciate any help.

NarT
  • 33
  • 1
  • 1
  • 4
  • Can you show the expected output. It is better to use `dput` to show the data. For example `dput(head(data,10))` – akrun Aug 25 '14 at 11:33
  • BTW there is no date `2013-06-02 18:00:00` in df1$DateTime – akrun Aug 25 '14 at 11:38
  • Does it have to be grouped by `Service`? – akrun Aug 25 '14 at 11:48
  • @akrun, df1$DateTime continues until 2014-06-01 00.00.00, so that is why i took the example of 2013-06-02 18:00:00. I am not sure as to how to present the expected output, as for the piece of data frame i posted the result will be trivial, with the cumulative count being 0 for rows 1 to 5 of df2 and 1 for rows 6 to 10. – NarT Aug 25 '14 at 11:50
  • Ideally, I do want it to be grouped by service, but since I am having trouble with the whole process, I thought it would be simpler for me to understand it as a whole, and then try to group it by service using ddply, I am not sure if that makes sense. – NarT Aug 25 '14 at 11:52
  • Looks like you are also not sure about the whole thing. Once you figured it out, please post. – akrun Aug 25 '14 at 12:22

1 Answers1

0

Try:

   library(lubridate)

    df1New <- within(df1, {
        Createtime <- period_to_seconds(hms(strftime(DateTime, "%H:%M:%S")))
       Date <- as.Date(DateTime)     
      })

    df2New <- within(df2, {
        Createtime1 <- period_to_seconds(hms(strftime(Create.Date.Time, "%H:%M:%S")))
        Date <- as.Date(Create.Date.Time)
     }) 


   df1New$Num.Closed <- unsplit(lapply(split(df1New, df1New$Date), function(x) {
    x2 <- df2New[df2New$Date %in% x$Date, ]
    unlist(lapply(1:nrow(x), function(i) {
    x1 <- x[i, ]
    sum(x2$Createtime1 <= x1$Createtime)
    }))
   }), df1New$Date)

   df1New[,-(2:3)]
   #             DateTime Num.Closed
  #1  2013-06-01 00:00:00          0
  #2  2013-06-01 03:00:00          0
  #3  2013-06-01 06:00:00          0
  #4  2013-06-01 09:00:00          0
  #5  2013-06-01 12:00:00          0
  #6  2013-06-01 15:00:00          1
  #7  2013-06-01 18:00:00          1
  #8  2013-06-01 21:00:00          1
  #9  2013-06-02 00:00:00          0
  #10 2013-06-02 03:00:00          0

data

  df1 <- structure(list(DateTime = c("2013-06-01 00:00:00", "2013-06-01 03:00:00", 
  "2013-06-01 06:00:00", "2013-06-01 09:00:00", "2013-06-01 12:00:00", 
  "2013-06-01 15:00:00", "2013-06-01 18:00:00", "2013-06-01 21:00:00", 
  "2013-06-02 00:00:00", "2013-06-02 03:00:00")), .Names = "DateTime", class = "data.frame", row.names = c("1", 
  "2", "3", "4", "5", "6", "7", "8", "9", "10"))

  df2 <- structure(list(Create.Date.Time = c("2013-06-01 12:59:00", "2013-06-02   07:56:00", 
  "2013-06-02 09:39:00", "2013-06-02 14:14:00", "2013-06-02 17:20:00", 
  "2013-06-03 07:20:00", "2013-06-03 08:02:00", "2013-06-03 08:04:00", 
  "2013-06-03 08:04:00", "2013-06-03 08:08:00"), Service = c("AV", 
  "SERVICE684793", "SERVICE684793", "SERVICE684796", "SERVICE684797", 
  "SERVICE684793", "SERVICE684839", "SERVICE684841", "SERVICE684841", 
  "SERVICE684841"), Closing.Date.Time = c("2013-06-01 13:59:00", 
  "2013-06-02 08:59:00", "2013-06-03 12:01:00", "2013-06-02 14:55:00", 
  "2013-06-03 12:06:00", "2013-06-03 07:39:00", "2013-06-03 12:09:00", 
  "2013-06-04 08:05:00", "2013-06-05 08:06:00", "2013-06-03 08:08:00"
  )), .Names = c("Create.Date.Time", "Service", "Closing.Date.Time"
  ), class = "data.frame", row.names = c("1", "2", "3", "4", "5", 
  "6", "7", "8", "9", "10"))
akrun
  • 874,273
  • 37
  • 540
  • 662
  • Thank you very much for your help @akrun. I do have one question, your code rests for every day so that the cumulative sum starts at zero every day. Is there any way to get the cumulative sum to continue over the next day(s)? and yes, you were right before, I do want to do it by service. I will try to tweak your code a bit to do that although it looks hard... – NarT Aug 25 '14 at 22:59
  • @user3770767 As I mentioned earlier, please show your correct expected output. It would be always easier to have a reference to look up to. – akrun Aug 26 '14 at 03:47
  • I am so sorry to have not replied earlier, but i think i managed to get the output i wanted using adply and subset. As I am a bit pressed for time, I will provide a sample data frame and output just in case someone struggles with a similar issue. – NarT Aug 27 '14 at 22:45
  • I have modified my question completely, added the data using dput. The question can be found here: http://stackoverflow.com/questions/25538719/subsetting-a-data-frame-by-ddply-then-applying-a-function-with-adply-on-the-sub/25539258#25539258 . I do realise that I am not very good at formulating my questions, I am not at ease at all with writing codes. I would be really really grateful if you had any spare time to have a look at it. Thanks – NarT Aug 29 '14 at 11:21
  • @NarT I will take a look at your dataset when I get the time. – akrun Aug 29 '14 at 13:08