1

I am trying to calculate how many reports are running at a certain time.

The data is like:

ReportID    StartTime   Duration  
   1 2018-11-02 13:00:00  240 seconds  
   2 2018-11-02 14:00:00  300 seconds  
   3 2018-11-02 14:01:15  300 seconds  
   4 2018-11-02 14:00:00 5000 seconds

The ideal output will be:

Time #ReportsRunning
2018-11-01 13:00:00 0
2018-11-02 13:00:00 1  
2018-11-02 14:00:00 2
2018-11-02 15:00:00 1

Is there anyway to do something like this? I am thinking about adding column to every timestamp I want to check. But that will make the table extremely wide.


Data in reproducible form:

df1 <- data.frame(
  ReportID = 1:4,
  StartTime = as.POSIXct(c("2018-11-02 13:00:00", "2018-11-02 14:00:00",
                           "2018-11-02 14:01:15", "2018-11-02 14:00:00")),
  Duration = as.difftime(c(240, 300, 300, 5000), units = "secs")
)

df2 <- data.frame(
  Time = as.POSIXct(c("2018-11-01 13:00:00", "2018-11-02 13:00:00",
                      "2018-11-02 14:00:00", "2018-11-02 15:00:00"))
)
Scarabee
  • 5,437
  • 5
  • 29
  • 55
  • 2
    Welcome to StackOverflow! A little tip for next time: include your data in a reproducible form (as I did above), and you'll be more likely to get answers. Have a look here: https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example – Scarabee Nov 09 '18 at 13:34
  • Also, the number of reports running at 2pm is 2 and not 3, right? I edited. – Scarabee Nov 09 '18 at 13:35

1 Answers1

0

Here is a base R solution:

df2$`#ReportsRunning` <- sapply(
  df2$Time,
  function(x) sum(x >= df1$StartTime & x <= df1$StartTime + df1$Duration)
)

df2
#                  Time #ReportsRunning
# 1 2018-11-01 13:00:00               0
# 2 2018-11-02 13:00:00               1
# 3 2018-11-02 14:00:00               2
# 4 2018-11-02 15:00:00               1

But if your data is large, it should be much more efficient to use the IRanges package from BioConductor:

library(IRanges)

ranges <- IRanges(as.integer(df1$StartTime), width = as.integer(df1$Duration))
values <- as.integer(df2$Time)

df2$`#ReportsRunning` <- countOverlaps(values, ranges)

df2
#                  Time #ReportsRunning
# 1 2018-11-01 13:00:00               0
# 2 2018-11-02 13:00:00               1
# 3 2018-11-02 14:00:00               2
# 4 2018-11-02 15:00:00               1
Scarabee
  • 5,437
  • 5
  • 29
  • 55