
I have a dataset. An example of it is shown here: [image]

I have 12,000 rows and 50 students in my dataset. I want to determine a session number for every row. If there is more than half an hour between two consecutive times in a student's rows, I treat the later row as the start of a different session. The code block below does this, but I do not want the session number to reset for each student. The session number should continue where it left off. How can I do this?

My code:

library(dplyr)      # arrange(), group_by(), mutate(), %>%
library(lubridate)  # minutes()

data1 <- data %>%
  arrange(Name, Time) %>%
  group_by(Name) %>%
  mutate(session_diff = cumsum(c(0, diff(Time) > minutes(30))))

For example: [image]


Sanem
    Hey @Sanem! Please don't post your data as an image [for these reasons](https://meta.stackoverflow.com/a/285557/12109788). If you need help converting your dataset to reproducible code, you can find it [here](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example). Good luck! – jpsmith Jun 15 '23 at 23:32

1 Answer


With sample data (created by me; let me know if it's not suitable):

library(data.table)

# Create the sample data frame directly (Time as POSIXct, Name as character)
data <- data.frame(
  Time = as.POSIXct(c("2023-06-16 09:30:00", "2023-06-16 10:15:00",
                      "2023-06-16 11:00:00", "2023-06-16 12:30:00",
                      "2023-06-16 13:15:00", "2023-06-16 14:00:00")),
  Name = rep(c("student1", "student2", "student3"), times = 2)
)

# Print the data
print(data)

# Create the sessions data.table: one row per student, with a session count of 0
# and a lastSession of 1970-01-01 00:00:00 (length-1 values are recycled)
sessions <- data.table(
  Name = unique(data$Name),
  SessionCount = 0,
  lastSession = as.POSIXct("1970-01-01 00:00:00")
)

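# Loop through the rows in time order. A row starts a new session when it is
# more than 30 minutes after that student's previous row.
data <- data[order(data$Time), ]
for (i in seq_len(nrow(data))) {
  student <- data$Name[i]
  now <- data$Time[i]
  last_time <- sessions[Name == student, lastSession]
  if (now > last_time + 30 * 60) {
    # Update by reference with := (assigning through sessions[...]$col does not work)
    sessions[Name == student, SessionCount := SessionCount + 1]
  }
  # Always record the latest time, so the gap is measured between consecutive rows
  sessions[Name == student, lastSession := now]
}

# Print the number of sessions per student
print(sessions)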
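
The loop above counts sessions per student. If what you need is a session number on every row that keeps increasing from one student to the next, as the question describes, here is a minimal sketch in dplyr. It assumes the columns are called Name and Time and that Time is POSIXct; the sample data and the new_session and session column names are made up for illustration. The idea is to flag the first row of each session within each student, then ungroup and take the cumulative sum over the whole table:

```r
library(dplyr)

# Hypothetical sample data: two students, Time stored as POSIXct
data <- data.frame(
  Name = c("student1", "student1", "student1", "student2", "student2"),
  Time = as.POSIXct(c("2023-06-16 09:00:00", "2023-06-16 09:10:00",
                      "2023-06-16 10:00:00", "2023-06-16 09:05:00",
                      "2023-06-16 09:20:00"), tz = "UTC")
)

data1 <- data %>%
  arrange(Name, Time) %>%
  group_by(Name) %>%
  # TRUE on a student's first row and on any row that comes more than
  # 30 minutes after that student's previous row
  mutate(new_session = row_number() == 1 |
           as.numeric(difftime(Time, dplyr::lag(Time), units = "mins")) > 30) %>%
  ungroup() %>%
  # Cumulative sum over the whole table, so numbering does not reset per student
  mutate(session = cumsum(new_session))

print(data1)
# session column: 1 1 2 3 3
```

Because the cumulative sum runs after ungroup(), student2's first row gets the next number after student1's last session instead of starting again at 1.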
Mark