Given this data frame:
library(dplyr)
dat <- data.frame(
bar = c(letters[1:10]),
foo = c(1,2,3,5,8,9,11,13,14,15)
)
bar foo
1 a 1
2 b 2
3 c 3
4 d 5
5 e 8
6 f 9
7 g 11
8 h 13
9 i 14
10 j 15
I first want to identify groups, if the foo
number is consecutive:
dat <- dat %>% mutate(in_cluster =
ifelse( lead(foo) == foo +1 | lag(foo) == foo -1,
TRUE,
FALSE))
Which leads to the following data frame:
bar foo in_cluster
1 a 1 TRUE
2 b 2 TRUE
3 c 3 TRUE
4 d 5 FALSE
5 e 8 TRUE
6 f 9 TRUE
7 g 11 FALSE
8 h 13 TRUE
9 i 14 TRUE
10 j 15 TRUE
As can be seen, the values 1,2,3
form a group, then value 5
is on it's own and does not belong to a cluster, then values 8,9
form another cluster and so on.
I would like to add cluster numbers to these "groups".
Expected output:
bar foo in_cluster cluster_number
1 a 1 TRUE 1
2 b 2 TRUE 1
3 c 3 TRUE 1
4 d 5 FALSE NA
5 e 8 TRUE 2
6 f 9 TRUE 2
7 g 11 FALSE NA
8 h 13 TRUE 3
9 i 14 TRUE 3
10 j 15 TRUE 3