9

I have a list of people and their working start and end times during a day. I want to plot a curve showing the total number of people working at any given minute of the day. I could add 1440 conditional boolean variables, one for each minute of the day, and sum them up, but that seems very inelegant. I'm wondering if there is a better way to do it (integrals?).

Here's the code to generate a df with my sample data:

sample_wt <- function() {

    require(lubridate)

    set.seed(10)

    worktime <- data.frame(
            ID = c(1:100),
            start = now()+abs(rnorm(100,4800,2400))
            )

    worktime$end <- worktime$start + abs(rnorm(100,20000,10000))

    worktime$length <- difftime(worktime$end, worktime$start, units="mins")

    worktime
}

To create the sample data, you can do something like:

DF <- sample_wt() 
Timm S.

3 Answers

6

Here is one option using the IRanges package from Bioconductor.

library(IRanges)
## generate sample
DF <- sample_wt()
## create the range from the sample data
rangesA <- IRanges(as.numeric(DF$start), as.numeric(DF$end))
## create one-minute ranges spanning the whole sample
xx <- seq(min(DF$start), max(DF$end), 60)
rangesB <- IRanges(as.numeric(xx), as.numeric(xx + 60))
## for each minute, count the work intervals that fully contain it
ov <- countOverlaps(rangesB, rangesA, type = "within")
## plot the result
plot(xx,ov,type='l')


agstudy
  • I think there's an error in the third code line, should be: `rangesA <- IRanges(as.numeric(DF$start), as.numeric(DF$end))` (DF instead of rangesA) – Timm S. Sep 12 '14 at 09:21
  • I can't find `sample_wt` function. From which package is it? – Marcin Sep 08 '16 at 14:13
1

Surely it can be improved, but this seems to do it:

time_range <- seq(min(DF$start), max(DF$end), 60)
result <- integer(length(time_range))
for (t in seq_along(time_range)) {
  result[t] <- sum(DF$start <= time_range[t] & DF$end >= time_range[t])
}
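The same count can also be written without an explicit loop and plotted against clock time. This is only a minimal sketch: `toy` and `base` are hypothetical names for a small hand-made data set (not the question's sample data), used so that the snippet runs without lubridate.

```r
# Hypothetical toy data: three people with start/end times (POSIXct, UTC)
base <- as.POSIXct("2014-09-12 08:00:00", tz = "UTC")
toy <- data.frame(
  ID    = 1:3,
  start = base + c(0, 600, 1200),     # 08:00, 08:10, 08:20
  end   = base + c(1800, 1500, 3600)  # 08:30, 08:25, 09:00
)

# One sample point per minute between the first start and the last end
time_range <- seq(min(toy$start), max(toy$end), by = 60)

# For each sample point, count the intervals that contain it
result <- vapply(
  seq_along(time_range),
  function(i) sum(toy$start <= time_range[i] & toy$end >= time_range[i]),
  integer(1)
)

# Line plot of head count against clock time
plot(time_range, result, type = "l", xlab = "Time", ylab = "People working")
```

Passing `time_range` as the x argument puts real times on the axis, whereas `plot(result)` alone would only show the sample index.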
Zé Loff
  • Can you please show how to draw a plot using this code? – user10345633 Jun 12 '21 at 13:49
  • 1
    `plot(result)`? – Zé Loff Jun 14 '21 at 10:41
  • I ran this code on my own data, but it did not draw a line plot that sums the overlapping times and shows the actual number of people in each overlap. Could you do that with this code? – user10345633 Jun 15 '21 at 11:04
  • 1
    `plot(result, type = "l")` ? The code above (and please note that nicola's answer is better than mine) samples the elapsed time every 60 seconds, counting the number of time intervals (i.e. lines on the `DF` `data.frame`) which have started before each time point but have not ended yet. The result is a vector of counts, containing the number of active users at each time point. Please check `?plot` for plotting options. – Zé Loff Jun 15 '21 at 14:37
1

I don't have lubridate installed, so I produced the data.frame with Sys.time instead of now (I guess they behave similarly). This could do the trick:

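    # one time point per minute of the current day
    minutes <- seq(as.POSIXct(paste0(Sys.Date(), " 00:00:00")), by = "min", length.out = 24 * 60)
    # worktime is the data.frame built as in sample_wt() (DF in the question)
    rowSums(outer(minutes, worktime$start, ">") & outer(minutes, worktime$end, "<"))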
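The "integrals" idea from the question can be sketched in a similar spirit: treat each start as a +1 event and each end as a -1 event, and take the running sum. This avoids building the minutes-by-people matrix that `outer` creates. The snippet below is a rough illustration only; it assumes times already rounded to whole minutes after midnight and falling within a single day, and `start`, `end`, `delta` and `working` are hypothetical names.

```r
# Hypothetical toy data: start/end times in minutes after midnight
start <- c(480, 490, 500)  # 08:00, 08:10, 08:20
end   <- c(510, 505, 540)  # 08:30, 08:25, 09:00

# +1 when someone starts, -1 when someone ends, tallied per minute of the day
delta <- tabulate(start + 1L, nbins = 1440) - tabulate(end + 1L, nbins = 1440)

# Running total = number of people at work during each minute
# (index m + 1 corresponds to minute m; the end minute itself is not counted)
working <- cumsum(delta)

plot(0:1439, working, type = "s", xlab = "Minute of day", ylab = "People working")
```

With POSIXct columns, the minute index would first have to be derived from the timestamps, for example from the difference to midnight in minutes.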
nicola
  • Nice one! I like it more than I like mine, and `microbenchmark` says they're equally fast. Just trim the extra `0`s at each end of the vector and it's perfect. – Zé Loff Sep 12 '14 at 16:11
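Trimming the leading and trailing zeros mentioned in the comment above could look roughly like this. `counts` is a hypothetical stand-in for the per-minute vector, and the sketch assumes at least one non-zero entry.

```r
# Hypothetical per-minute counts with idle minutes at both ends
counts <- c(0, 0, 1, 3, 0, 2, 1, 0, 0, 0)

# Positions with at least one person; keep everything between first and last
active  <- which(counts > 0)
trimmed <- counts[min(active):max(active)]
```

Zeros in the middle of the day are kept, since they represent real gaps rather than padding.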