group_by to select first two rows, then spread()

Question

I'm trying to reformat this so I can generate a dataframe of all instances of On Hold Begins and the next event immediately after it. On Hold Begins is the start an event, and I'd like to capture its Timestamp and Deviation as well as the Timestamp and Deviation for the next event immediately after it (i.e. Below Thresold, Stage Enabled).

If possible, I only want to include slices that have On Hold Begins as the first event (so the ideal solution would not include rows 1 &2 above), do not want the additional X columns, and would want it to be formatted as I described. This is similar to: How can I spread repeated measures of multiple variables into wide format?, but I ran into errors asking for a dictionary when I tried it.

Thank you all so much for the help.

Does it have to be a dplyr solution? Seems like you can do this easily with normal R indexing. — thc, Feb 13 '18 at 20:16
No, but I do have a chain already set up. Willing to budge if there's a way :) — longlivebrew, Feb 13 '18 at 20:17

score 2 · Accepted Answer · answered Feb 13 '18 at 20:21

Simple solution using base R:

first_idx <- which(df$Flag == "On Hold Begins")
second_idx <- first_idx + 1
df_1 <- df[first_idx,]; colnames(df_1) <- paste("Flag 1 ", colnames(df_1))
df_2 <- df[second_idx,]; colnames(df_2) <- paste("Flag 2 ", colnames(df_2))
cbind(df_1, df_2)

   Flag 1  Stage   Flag 1  Flag Flag 1  Timestamp Flag 1  x Flag 1  Deviation Flag 2  Stage    Flag 2  Flag Flag 2  Timestamp Flag 2  x Flag 2  Deviation
3              a On Hold Begins     4/29/17 15:34         1             1.200             a Below Threshold     4/29/17 15:35         1            0.0000
6              a On Hold Begins     4/29/17 21:49         5             1.200             a Below Threshold     4/29/17 21:50         5            0.0000
10             a On Hold Begins     4/29/17 23:29         6             1.200             a Below Threshold     4/29/17 23:30         6            0.0000
12             a On Hold Begins     5/16/17 17:22         8             1.774             a   Stage Enabled     5/16/17 17:39         8            1.8973
15             a On Hold Begins     5/16/17 19:14         9             1.095             a Below Threshold     5/16/17 19:15         9           -0.2252
21             b On Hold Begins     4/28/17 22:05       125             1.200             b    On Hold Ends     4/28/17 22:07       125            1.2000
24             b On Hold Begins     4/28/17 23:29       128             1.200             b Below Threshold     4/28/17 23:30       128            0.0000
26             b On Hold Begins      4/29/17 1:53       133             1.200             b Below Threshold      4/29/17 1:55       133            0.0000
29             b On Hold Begins      4/29/17 2:40       135             1.200          <NA>            <NA>              <NA>        NA                NA

golly gee, this is beautiful. Thank you so much @thc exactly what I was looking for :) — longlivebrew, Feb 13 '18 at 20:28

Nettle · Answer 2 · 2018-02-14T05:42:54.540

My solution 1) assigns common serial to related records; 2) groups and slices the first in the set, and tags with "Flag 1" or "Flag 2."

df_tidy <- df %>% 
  slice(-1) %>%             
  mutate(my_serial = case_when(
    str_detect(Flag, "On Hold Begins")~row_number() )) %>% 
    fill(my_serial) %>%     #< Assign serials to related records
  group_by(my_serial) %>% 
  slice(1:2) %>%            #< Take first records in each set
  mutate(flag_number = if_else(
    str_detect(Flag, "On Hold Begins"), "Flag 1", "Flag 2")) #< Tag Records

df_1 <- df_tidy %>% 
  filter(flag_number %in% "Flag 1") %>% 
  select(1:3) %>% 
  setNames(paste0("Flag 1_", names(.)) ) 

df_2 <- df_tidy %>% 
  filter(flag_number %in% "Flag 2") %>% 
  select(1:3) %>% 
  setNames(paste0("Flag 2_", names(.)) ) 

bind_cols(df_1, df_2)

group_by to select first two rows, then spread()

2 Answers2

Linked