I have a large dataset in which one columns contains lots of sequential runs of different lengths (e.g. 1,2,1,2,3,1,2,1,2,3,4,5). A lot of these runs are only 1,2 - then start from 1 again.
I would like to somehow group/split my data into all the rows contained within one of those runs. Let's say for simplicity's sake that my data looks something like this (I have updated column a to contain the first 15 values from my actual data to illustrate the problem):
df <- data.frame(a = c(1:2, 1:5, 1:2, 1:4, 1:2), b = c(LETTERS[1:15]), c = c(1:15))
a b c
1 1 A 1
2 2 B 2
3 1 C 3
4 2 D 4
5 3 E 5
6 4 F 6
7 5 G 7
8 1 H 8
9 2 I 9
10 1 J 10
11 2 K 11
12 3 L 12
13 4 M 13
14 1 N 14
15 2 O 15
I want to get either a group (or a new columns with a conditionally assigned value that I can use to group/split) containing the first run in df[,1] (consisting of rows 1, 2 and 3), one containing the second run (rows 4, 5, 6), and so on.
I have tried to do what was suggested in a answer to the unedited version of this question test <- group_by(df, a) %>% mutate(run = seq(length(a)))
But the result is somewhat garbled, which I suspect originates from the fact that a lot of the runs are just going up to 2.
Source: local data frame [15 x 4]
Groups: a [5]
a b c run
<int> <fctr> <int> <int>
1 1 A 1 1
2 2 B 2 1
3 1 C 3 2
4 2 D 4 2
5 3 E 5 1
6 4 F 6 1
7 5 G 7 1
8 1 H 8 3
9 2 I 9 3
10 1 J 10 4
11 2 K 11 4
12 3 L 12 2
13 4 M 13 2
14 1 N 14 5
15 2 O 15 5
Edit: The solution
split(df, cumsum(c(TRUE, diff(df$a)!=1)))
works great.