Creating new column based on repeated consecutive row entries

Question

Imagine a snippet of the follow data frame:

       ID        ActivityName     Time         Type    Shape 
1       1             Request    0.000       Type_1      767           
2       1             Request  600.000       Type_1      767           
3       1               Start  600.000       Type_1     1376           
4       1               Start  600.000       Type_1     1376  
5       1               Delay 2962.295       Type_1     1022         
6       1 Schedule Activities  600.000       Type_1       15           
7       1 Schedule Activities 2062.295       Type_1       15

What I'm trying to do is to create two new columns based on the repeating entries in ActivityName.

Specifically, I want to combine two subsequent rows for the same activity into one row with a start and complete timestamp (from Time, in seconds).

Given that not all entries in ActivityName have a matching second entry (however, max two consecutive entries are identical), I would also like to delete such rows, as Delay in this case.

It should then look something like:

       ID        ActivityName  StartTime   EndTime      Type    Shape 
1       1             Request  0.000       600.000      Type_1  767           
2       1               Start  600.000     600.000      Type_1  1375
3       1 Schedule Activities  600.000     2062.295     Type_1  15

All categories in ActivityName occur many times in that column. I hope not to compare their associated Time not across the whole column, only those who have two consecutive identical occurrence.

Any ideas for how to go about this will be highly appreciated.

Having reproducible data will help you get answers. You can use the dput function to help. — Daniel O, Feb 20 '20 at 14:09
@Alex - I just saw your previous question which looked identical to this one. It might help to revise your question with a more complete example, where the other solution failed, and provide the desired result. For instance, can you add examples of the *single standing* rows? And other situations where the previous solution doesn't work for you? — Ben, Feb 20 '20 at 15:01

score 0 · Answer 1 · answered Feb 20 '20 at 14:35

Something like this?

df<-data.frame(activity_name = c("A", "A", "B", "B", "C", "C"),
               time = c(0,2,2,4,4,6))
df

  activity_name time
1             A    0
2             A    2
3             B    2
4             B    4
5             C    4
6             C    6

library(tidyverse)
df %>% 
  group_by(activity_name) %>% 
  summarise(StartTime = first(time),
            EndTime = last(time))

  activity_name StartTime EndTime
  <fct>             <dbl>   <dbl>
1 A                     0       2
2 B                     2       4
3 C                     4       6

Hi, thanks! This is almost it - the problem is that when having more instances of "A"s, "B"s, "C"s etc in activity_name., this always checks the whole column to obtain their minimum and maximum associated time values. Instead, I'm trying to compare them locally, i.e. two consecutively occurring A's have a StartTime and EndTime independent of another pair of consecutive A's. — , Feb 20 '20 at 15:53

score 0 · Answer 2 · answered Feb 21 '20 at 14:26

Comparing activity_name groups locally. Making a new field to group by, that identifies when the local group changes.

df<-data.frame(activity_name = c("A", "A", "B", "B", "C", "C", "A"),
               time = c(0,2,2,4,4,6,99))

  activity_name time
1             A    0
2             A    2
3             B    2
4             B    4
5             C    4
6             C    6
7             A   99

library(tidyverse)

df %>% 
  mutate(groupChanged = (activity_name != lag(activity_name, default = activity_name[1])),
         toCutBy = cumsum(groupChanged)) %>% 
  group_by(toCutBy) %>%
  summarise(activity_name = first(activity_name),
            StartTime = first(time),
            EndTime = last(time))

  toCutBy activity_name StartTime EndTime
1       0 A                     0       2
2       1 B                     2       4
3       2 C                     4       6
4       3 A                    99      99

Example: https://stackoverflow.com/a/43127176/10276092

Creating new column based on repeated consecutive row entries

2 Answers2