1

I am trying to subtract one hour to date/times within a POSIXct column that are earlier than or equal to a time stated in a different comparison dataframe for that particular ID.

For example:

#create sample data
Time<-as.POSIXct(c("2015-10-02 08:00:00","2015-11-02 11:00:00","2015-10-11 10:00:00","2015-11-11 09:00:00","2015-10-24 08:00:00","2015-10-27 08:00:00"), format = "%Y-%m-%d %H:%M:%S")
ID<-c(01,01,02,02,03,03)
data<-data.frame(Time,ID)

Which produces this:

                 Time ID
1 2015-10-02 08:00:00  1
2 2015-11-02 11:00:00  1
3 2015-10-11 10:00:00  2
4 2015-11-11 09:00:00  2
5 2015-10-24 08:00:00  3
6 2015-10-27 08:00:00  3

I then have another dataframe with a key date and time for each ID to compare against. The Time in data should be compared against Comparison in ComparisonData for the particular ID it is associated with. If the Time value in data is earlier than or equal to the comparison value one hour should be subtracted from the value in data:

#create sample comparison data
Comparison<-as.POSIXct(c("2015-10-29 08:00:00","2015-11-02 08:00:00","2015-10-26 08:30:00"), format = "%Y-%m-%d %H:%M:%S")
ID<-c(01,02,03)
ComparisonData<-data.frame(Comparison,ID)

This should look like this:

           Comparison  ID
1 2015-10-29 08:00:00   1
2 2015-11-02 08:00:00   2
3 2015-10-26 08:30:00   3

In summary, the code should check all times of a certain ID to see if any are earlier than or equal to the value specified in ComparisonData and if they are, subtract one hour. This should give this data frame as an output:

                 Time  ID
1 2015-10-02 07:00:00   1
2 2015-11-02 11:00:00   1
3 2015-10-11 09:00:00   2
4 2015-11-11 09:00:00   2
5 2015-10-24 07:00:00   3
6 2015-10-27 08:00:00   3

I have looked at similar solutions such as this but I cannot understand how to also check the times using the right timing with that particular ID.

I think ddply seems quite a promising option but I'm not sure how to use it for this particular problem.

Community
  • 1
  • 1
Christine
  • 31
  • 7

2 Answers2

2

Here's a quick and efficient solution using data.table. First we join the two data sets by ID and then just modify the Times which are lower or equal to Comparison

library(data.table) # v1.9.6+
setDT(data)[ComparisonData, end := i.Comparison, on = "ID"]
data[Time <= end, Time := Time - 3600L][, end := NULL]
data
#                   Time ID
# 1: 2015-10-02 07:00:00  1
# 2: 2015-11-02 11:00:00  1
# 3: 2015-10-11 09:00:00  2
# 4: 2015-11-11 09:00:00  2
# 5: 2015-10-24 07:00:00  3
# 6: 2015-10-27 08:00:00  3

Alternatively, we could do this in one step while joining using ifelse (not sure how efficient this though)

setDT(data)[ComparisonData, 
            Time := ifelse(Time <= i.Comparison, 
                           Time - 3600L, Time), 
            on = "ID"]
data
#                   Time ID
# 1: 2015-10-02 07:00:00  1
# 2: 2015-11-02 11:00:00  1
# 3: 2015-10-11 09:00:00  2
# 4: 2015-11-11 09:00:00  2
# 5: 2015-10-24 07:00:00  3
# 6: 2015-10-27 08:00:00  3
David Arenburg
  • 91,361
  • 17
  • 137
  • 196
  • Would you be able to explain your first solution a bit more? Especially where end and i.Comparison comes from (what does the i mean in this case?). I am also unsure of what is happening in the 3rd line with having two sets of square brackets? If this is just too basic can you direct me to somewhere that might explain it? – Christine Nov 12 '15 at 17:02
  • `i` means to get the `Comparison` column out of the `ComparisonData` data which is located in the `ith` argument of `data`. – David Arenburg Nov 12 '15 at 17:03
  • See [here](https://github.com/Rdatatable/data.table/wiki/Getting-started) if you want to get familiar with `data.table` a bit more. – David Arenburg Nov 12 '15 at 18:21
0

I am sure there is going to be a better solution than this, however, I think this works.

for(i in 1:nrow(data)) {
  if(data$Time[i] < ComparisonData[data$ID[i], 1]){
   data$Time[i] <- data$Time[i] - 3600
  }
}



#          Time ID
#1 2015-10-02 07:00:00  1
#2 2015-11-02 11:00:00  1
#3 2015-10-11 09:00:00  2
#4 2015-11-11 09:00:00  2
#5 2015-10-24 07:00:00  3
#6 2015-10-27 08:00:00  3

This is going to iterate through every row in data.

ComparisonData[data$ID[i], 1] gets the time column in ComparisonData for the corresponding ID. If this is greater than the Time column in data then reduce the time by 1 hour.

Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Could you explain a bit more to how the if statement works as I am a bit confused. So in the first loop is will be: if 2015-10-02 08:00:00 < ??? how does it use ID? – Christine Nov 12 '15 at 16:32
  • @Christine have added an explanation. Let me know if something is not clear. – Ronak Shah Nov 12 '15 at 16:43