2

I have a dataframe which contains a variable called DateTime with data about date and time. Below I show an example:

df<- data.frame(DateTime=c("2016-08-23 00:22:23","2016-08-23 00:26:38","2016-08-23 01:04:12","2016-08-23 02:27:58","2016-08-23 03:04:31","2016-08-23 04:51:46"))
df$DateTime<- as.POSIXct(df$DateTime, format="%Y-%m-%d %H:%M:%S", tz="UTC")

df
             DateTime
1 2016-08-23 00:22:23
2 2016-08-23 00:26:38
3 2016-08-23 01:04:12
4 2016-08-23 02:27:58
5 2016-08-23 03:04:31
6 2016-08-23 04:51:46

I want to create a variable called DateTime45 that rounds up those to dates and times at 45-minutes intervals. Below I show what I tried so far:

df$DateTime45<- round_date(df$DateTime, "45 mins")

df

             DateTime          DateTime45
1 2016-08-23 00:22:23 2016-08-23 00:00:00
2 2016-08-23 00:26:38 2016-08-23 00:45:00
3 2016-08-23 01:04:12 2016-08-23 01:00:00
4 2016-08-23 02:27:58 2016-08-23 02:45:00
5 2016-08-23 03:04:31 2016-08-23 03:00:00
6 2016-08-23 04:51:46 2016-08-23 04:45:00

However, as you can see, it creates something strange since time intervals are not evenly distributed. I would like to get this instead:

df
             DateTime          DateTime45
1 2016-08-23 00:22:23 2016-08-23 00:00:00
2 2016-08-23 00:26:38 2016-08-23 00:45:00
3 2016-08-23 01:04:12 2016-08-23 00:45:00
4 2016-08-23 02:27:58 2016-08-23 02:15:00
5 2016-08-23 03:04:31 2016-08-23 03:00:00
6 2016-08-23 04:51:46 2016-08-23 04:30:00

The limits of a 45-time-intervals would be the following if we consider the time in 24-hour format:

TimeIntervalLimits<- seq.POSIXt(as.POSIXct("2016-08-23 00:00:00"), as.POSIXct("2016-08-24 00:45:00"), by = "45 min", format="%Y-%m-%d %H-%M-%S", tz="UTC")
TimeIntervalLimits<- as.data.frame(TimeIntervalLimits)

TimeIntervalLimits

    TimeIntervalLimits
1  2016-08-23 00:00:00
2  2016-08-23 00:45:00
3  2016-08-23 01:30:00
4  2016-08-23 02:15:00
5  2016-08-23 03:00:00
6  2016-08-23 03:45:00
7  2016-08-23 04:30:00
8  2016-08-23 05:15:00
9  2016-08-23 06:00:00
10 2016-08-23 06:45:00
11 2016-08-23 07:30:00
12 2016-08-23 08:15:00
13 2016-08-23 09:00:00
14 2016-08-23 09:45:00
15 2016-08-23 10:30:00
16 2016-08-23 11:15:00
17 2016-08-23 12:00:00
18 2016-08-23 12:45:00
19 2016-08-23 13:30:00
20 2016-08-23 14:15:00
21 2016-08-23 15:00:00
22 2016-08-23 15:45:00
23 2016-08-23 16:30:00
24 2016-08-23 17:15:00
25 2016-08-23 18:00:00
26 2016-08-23 18:45:00
27 2016-08-23 19:30:00
28 2016-08-23 20:15:00
29 2016-08-23 21:00:00
30 2016-08-23 21:45:00
31 2016-08-23 22:30:00
32 2016-08-23 23:15:00
33 2016-08-24 00:00:00
 .      .         .
 .      .         .

Does anyone know how to get the variable DateTime45 in the way I want it?

Thanks in advance

smci
  • 32,567
  • 20
  • 113
  • 146
Dekike
  • 1,264
  • 6
  • 17

1 Answers1

4

EDIT

I misunderstood the question earlier. The desired output for updated data can be achieved by using some mathematical manipulation since datetime can be converted to numeric.

df$DateTime45 <- as.POSIXct(round(as.numeric(df$DateTime)/(45*60))*
                           (45*60),origin='1970-01-01', tz = 'UTC')

df
#             DateTime          DateTime45
#1 2016-08-23 00:22:23 2016-08-23 00:00:00
#2 2016-08-23 00:26:38 2016-08-23 00:45:00
#3 2016-08-23 01:04:12 2016-08-23 00:45:00
#4 2016-08-23 02:27:58 2016-08-23 02:15:00
#5 2016-08-23 03:04:31 2016-08-23 03:00:00
#6 2016-08-23 04:51:46 2016-08-23 04:30:00

Original Answer

In base R, one way would be to create a 45-minute interval and use cut/findInterval.

TimeIntervalLimits <- seq(as.POSIXct("2016-08-23 00:00:00", tz = 'UTC'), 
                    as.POSIXct("2016-08-24 00:45:00", tz = 'UTC'), by = "45 min")
df$DateTime45 <- cut(df$DateTime, TimeIntervalLimits)
#Or with `findInterval`
#df$DateTime45 <- TimeIntervalLimits[findInterval(df$DateTime, TimeIntervalLimits)]

df
#             DateTime          DateTime45
#1 2016-08-23 00:22:23 2016-08-23 00:00:00
#2 2016-08-23 01:04:12 2016-08-23 00:45:00
#3 2016-08-23 02:27:58 2016-08-23 02:15:00
#4 2016-08-23 03:04:31 2016-08-23 03:00:00
#5 2016-08-23 04:51:46 2016-08-23 04:30:00

As explained in the comments, cut takes breaks starting from minimum value in the vector. So one hack is to insert a fake timestamp in the vector from where we want to start the breaks and then use cut with breaks argument. This avoids creating TimeIntervalLimits vector.

df$DateTime45 <- cut(c(as.POSIXct('2016-08-23 00:00:00', tz = 'UTC'), 
                     df$DateTime), '45 mins')[-1]
Ronak Shah
  • 377,200
  • 20
  • 156
  • 213
  • Thanks @Ronak Shah. So then, as far as you know, there is no a direct way to round `DateTime` to get the variable `DateTime45`, isn't there? The easiest way is to create a vector with the desired `TimeIntervalLimits` and then merge both data.frames. Is that correct? – Dekike Dec 30 '19 at 08:04
  • 1
    @Dekike as far as I know, yes because all of these methods take breaks starting from minimum value in `DateTime` which is `2016-08-23 00:22:23` and not `2016-08-23 00:00:00` as you want. `cut(df$DateTime, "45 mins")` works the same way. – Ronak Shah Dec 30 '19 at 08:07
  • 1
    @Dekike although I am not sure what is happening with `round_date`. I would expect it to work as per your expectation. Don't quite understand why it rounds to `2016-08-23 01:00:00` for `2016-08-23 01:04:12` – Ronak Shah Dec 30 '19 at 08:16
  • Thanks @Ronak Shah. Can I ask you something? In the post question I included the word `Round` meaning that I wanted that for instance, `2016-08-23 00:22:23` was classified as `2016-08-23 00:00:00` but `2016-08-23 00:28:17` (not included in my example) was classified as `2016-08-23 00:45:00`. I mean, I wanted that each `DateTime` was converted to `DateTime45` but taking into account which `TimeIntervalLimit` is the closest. Now I realized that with my example I can't check that your code is also doing that, but when I have used it with my real data I realized it. Can you tell me how to do it? – Dekike Dec 30 '19 at 08:47
  • * I included an extra line in my `df` (`2016-08-23 00:26:38`) to check that the code works properly rounding `DateTime` also to the closest `TimeIntervalLimit`. – Dekike Dec 30 '19 at 08:55
  • @Dekike: you could also create the vector then do binary-search to nearest: [Find closest value in a vector with binary search](https://stackoverflow.com/questions/20133344/find-closest-value-in-a-vector-with-binary-search), but that's more work – smci Dec 30 '19 at 09:27
  • @Dekike yes, I misunderstood the question. I have updated the answer, it should help now. – Ronak Shah Dec 30 '19 at 10:17