2

I have a data frame with attendance data from a Zoom event containing email addresses, join time, and leave time. Many attendees log in, log out, and then log back in, and are therefore represented in multiple rows. I want to calculate how many total minutes attendees were logged in. In inspecting the data, I noticed one person where they have overlapping time intervals (see email3 in the example below), and I want to be able to identify any others in the dataset where this is the case.

Here is an example data frame already with the desired new column "overlap":

structure(list(Email= c("email1@gmail.com", "email2@gmail.com", "email2@gmail.com", "email3@gmail.com",
"email3@gmail.com", "email3@gmail.com"), Join.Time = structure(c(as.POSIXct("2020-12-09 13:04:00"), 
as.POSIXct("2020-12-09 13:20:00"), as.POSIXct("2020-12-09 13:30:00"),as.POSIXct("2020-12-09 13:07:00"), 
as.POSIXct("2020-12-09 13:46:00"),as.POSIXct("2020-12-09 13:29:00")), class = c("POSIXct", "POSIXt"), 
tzone = ""), Leave.Time = structure(c(as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-09 13:22:00"),
as.POSIXct("2020-12-09 14:01:00"), as.POSIXct("2020-12-09 13:29:00"), as.POSIXct("2020-12-09 14:00:00"),
as.POSIXct("2020-12-09 14:33:00")), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"), 
    Overlap = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)), .Names = c("Email", "Join.Time", "Leave.Time", "Overlap"
), row.names = c(NA, -6L), class = "data.frame")

             Email           Join.Time          Leave.Time Overlap
1 email1@gmail.com 2020-12-09 13:04:00 2020-12-09 13:25:00   FALSE
2 email2@gmail.com 2020-12-09 13:20:00 2020-12-09 13:22:00   FALSE
3 email2@gmail.com 2020-12-09 13:30:00 2020-12-09 14:01:00   FALSE
4 email3@gmail.com 2020-12-09 13:07:00 2020-12-09 13:29:00    TRUE
5 email3@gmail.com 2020-12-09 13:46:00 2020-12-09 14:00:00    TRUE
6 email3@gmail.com 2020-12-09 13:29:00 2020-12-09 14:33:00    TRUE

I tried to solution suggested here: R Find overlap among time periods but when I do I get the error "Error in if (int_overlaps(intervals[i], intervals[j])) { : missing value where TRUE/FALSE needed"

Would appreciate any help!!

user90401
  • 25
  • 2

2 Answers2

1

Another option from the thread you mentioned separately counts the overlapping values and adds them in as a separate column in a separate dataframe. Try this. It worked for me. I get the same output you provided.

library(data.frame)
dt <- data.table(df, key=c("Join.Time", "Leave.Time"))[, `:=`(Overlap=NULL, row=1:nrow(df))]
overlapping <- unique(foverlaps(dt, dt)[Email==i.Email & row!=i.row, Email])
dt[, `:=`(Overlap=FALSE, row=NULL)][Email %in% overlapping, Overlap:=TRUE][order(Email, Join.Time)]
AcidCatfish
  • 192
  • 8
  • It's essentially saying, count the overlapping data in dt. Then, in dt create a column called Overlap that is false unless the Email in overlapping is counted then Overlap will be TRUE. – AcidCatfish Jan 09 '21 at 01:36
  • 1
    I think that worked, thanks! For some reason I need to run each line separately or else I get error messages. – user90401 Jan 09 '21 at 02:03
  • I ran it again, I got a warning message indicating that there is the possibilities of mixed timezones and that not every time point will be overlapping, which we already know and want. – AcidCatfish Jan 09 '21 at 23:14
0

Although an older question, here's a newer option using the IV package dedicated to working with intervals:

library(tidyverse)
library(ivs)

# Provided input data
df <- structure(list(
  Email = c(
    "email1@gmail.com", "email2@gmail.com", "email2@gmail.com", "email3@gmail.com",
    "email3@gmail.com", "email3@gmail.com"
  ), Join.Time = structure(c(
    as.POSIXct("2020-12-09 13:04:00"),
    as.POSIXct("2020-12-09 13:20:00"), as.POSIXct("2020-12-09 13:30:00"), as.POSIXct("2020-12-09 13:07:00"),
    as.POSIXct("2020-12-09 13:46:00"), as.POSIXct("2020-12-09 13:29:00")
  ),
  class = c("POSIXct", "POSIXt"),
  tzone = ""
  ), Leave.Time = structure(c(
    as.POSIXct("2020-12-09 13:25:00"), as.POSIXct("2020-12-09 13:22:00"),
    as.POSIXct("2020-12-09 14:01:00"), as.POSIXct("2020-12-09 13:29:00"), as.POSIXct("2020-12-09 14:00:00"),
    as.POSIXct("2020-12-09 14:33:00")
  ), class = c("POSIXct", "POSIXt"), tzone = "America/New_York"),
  Overlap = c(FALSE, FALSE, FALSE, TRUE, TRUE, TRUE)
), .Names = c("Email", "Join.Time", "Leave.Time", "Overlap"), row.names = c(NA, -6L), class = "data.frame")

# Check for overlaps
df |> 
  mutate(iv = iv(Join.Time, Leave.Time)) |> 
  group_by(Email) |> 
  mutate(iv = iv_groups(iv)) |> 
  ungroup() |> 
  add_count(iv) |> 
  mutate(overlap = if_else(n > 1, TRUE, FALSE))
#> # A tibble: 6 × 7
#>   Email            Join.Time           Leave.Time          Overlap
#>   <chr>            <dttm>              <dttm>              <lgl>  
#> 1 email1@gmail.com 2020-12-09 13:04:00 2020-12-09 08:25:00 FALSE  
#> 2 email2@gmail.com 2020-12-09 13:20:00 2020-12-09 08:22:00 FALSE  
#> 3 email2@gmail.com 2020-12-09 13:30:00 2020-12-09 09:01:00 FALSE  
#> 4 email3@gmail.com 2020-12-09 13:07:00 2020-12-09 08:29:00 TRUE   
#> 5 email3@gmail.com 2020-12-09 13:46:00 2020-12-09 09:00:00 TRUE   
#> 6 email3@gmail.com 2020-12-09 13:29:00 2020-12-09 09:33:00 TRUE   
#> # … with 3 more variables: iv <iv<dttm>>, n <int>, overlap <lgl>

Created on 2022-05-27 by the reprex package (v2.0.1)

Carl
  • 4,232
  • 2
  • 12
  • 24