3

I would like to add 1 to session column when the difference between the first timestamp of the same session and followings timestamp records is more than 10 units.

in other words:
if the gap in timestamp column is more than 10 in the same session, add 1 to the rest of the sessions for a specific ID. So we shouldn't have the same session with the gap more than 10 in its records.
lets say:

df<-read.table(text="
ID      timestamp    session
1       10             1
1       12             1
1       15             1
1       21             1
1       25             1
1       27             2
1       29             2
2       11             1
2       22             2
2       27             2
2       32             2
2       42             2
2       43             3",header=T,stringsAsFactors = F)

In the example above, for ID==1 the session gap from the first record (timestamp==10) is more than 10 in row 4 (timestamp==21), so we add 1 to the rest of sessions. Whenever the session number change the difference of the first record of timestamp should be less than 10 in the same sassion, otherwise it should add to session.

result:  

ID      timestamp    session
1      *10             1
1       12             1
1       15             1
1      *21             2     <-- because 21-10 >= 10 it add 1 to the rest of sessions in this ID 
1       25             2
1       27             3
1       29             3
2       11             1
2      *22             2
2       27             2
2      *32             3     <-- because 32-22>= 10 it add 1 to the rest of session
2      *42             4     <-- because 42-32>=10
2       43             5

How can I do it in R?

Cina
  • 9,759
  • 4
  • 20
  • 36
  • Do you have to add 1 to all values whose difference with first stamp is greater than 10? – Duck Sep 25 '20 at 18:17
  • Also in second group first timestamp is 11, why do you use 22? – Duck Sep 25 '20 at 18:25
  • @Duck Yes, the difference with the first value which they have the same session number. regarding your second question, it is still 11. 22 is the second timestamp. – Cina Sep 25 '20 at 18:41

1 Answers1

1

Perhaps a custom function might help that calculates a cumulative sum and resets once threshold reached. In this case, if you provide the function with the session data, it will provide a result that will include a cumulative "offset" for session, but only in rows when the session number did not increase. This addresses the case of ID 2 timestamp 22 where the difference > 10, but session number increased from 1 to 2.

library(tidyverse)

threshold <- 10

cumsum_with_reset <- function(x, session, threshold) {
  cumsum <- 0
  group <- 0
  result <- numeric()
  for (i in seq_along(x)) {
    cumsum <- cumsum + x[i]
    if (cumsum >= threshold) {
      if (session[i] == session[i-1]) {
        group <- group + 1
      }
      cumsum <- 0
    }
    result = c(result, group)
  }
  return (result)
}

df %>%
  group_by(ID) %>%
  mutate(diff = c(0, diff(timestamp)),
         cumdiff = cumsum_with_reset(diff, session, threshold),
         new_session = cumdiff + session)

Function adapted from this solution.

Output

      ID timestamp session  diff cumdiff new_session
   <int>     <int>   <int> <dbl>   <dbl>       <dbl>
 1     1        10       1     0       0           1
 2     1        12       1     2       0           1
 3     1        15       1     3       0           1
 4     1        21       1     6       1           2
 5     1        25       1     4       1           2
 6     1        27       2     2       1           3
 7     1        29       2     2       1           3
 8     2        11       1     0       0           1
 9     2        22       2    11       0           2
10     2        27       2     5       0           2
11     2        32       2     5       1           3
12     2        42       2    10       2           4
13     2        43       3     1       2           5
Ben
  • 28,684
  • 5
  • 23
  • 45