I'm trying to use dplyr to take the first and last rows of repeated values by group. I'm doing this for efficiency reasons, particularly so that graphing is faster.
This is not a duplicate of "Select first and last row from grouped data", because I'm not asking for the strict first and last row of each group. I'm asking for the first and last row of each run of consecutive identical values (in my case 1's and 0's) within a group, and the same value may appear in several separate runs.
Here's an example. Say I want to remove all the redundant 1's and 0's from column C while keeping A and B intact.
df = data.frame(
  A = rep(c("a", "b"), each = 10),
  B = rep(c(1:10), 2),
  C = c(1,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,0,0,1))
A B C
a 1 1
a 2 0
a 3 0
a 4 0
a 5 0
a 6 0
a 7 1
a 8 1
a 9 1
a 10 1
b 1 0
b 2 0
b 3 0
b 4 1
b 5 0
b 6 0
b 7 0
b 8 0
b 9 0
b 10 1
The end result should look like this:
A B C
a 1 1
a 2 0
a 6 0
a 7 1
a 10 1
b 1 0
b 3 0
b 4 1
b 5 0
b 9 0
b 10 1
Using unique will either not remove anything or just take one of the 1's or 0's without retaining the start-and-end quality that I'm trying to achieve. Is there a way to do this without a loop, perhaps using dplyr or forcats?
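To make the goal concrete, here is a minimal sketch of one loop-free approach. It assumes dplyr is loaded and that a "chunk" means a run of consecutive equal values of C within each level of A. The helper column run_id is a hypothetical name introduced only for this illustration, and the sketch is not tuned for performance.

```r
library(dplyr)

df <- data.frame(
  A = rep(c("a", "b"), each = 10),
  B = rep(1:10, 2),
  C = c(1,0,0,0,0,0,1,1,1,1,0,0,0,1,0,0,0,0,0,1)
)

result <- df %>%
  group_by(A) %>%
  # run_id (hypothetical helper) goes up by one each time C changes within A
  mutate(run_id = cumsum(C != lag(C, default = first(C)))) %>%
  group_by(A, run_id) %>%
  # keep the first and last row of each run; a run of length 1 is kept once
  filter(row_number() == 1 | row_number() == n()) %>%
  ungroup() %>%
  select(-run_id)

print(result)
```

On the example data this should leave the 11 rows listed in the expected result above. Numbering the runs first and then filtering per run is what separates this from taking the first and last row of each whole group.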