Comparing data in R

Question

I'm still new to R and feel there has to be a better way to do what I've done. I am trying to compare a process and determine whether it fits a specific sequence. Later, I plan to expand this to say: if sequence A, then "cool"; else if sequence B, then "kinda cool"; else "not cool at all".

For the sample data, let's determine if bakers are following the correct steps for baking a recipe.

merged_data <- merge(sampledata, proper_sequence, by = "sequence description")

Baker   Actual_Sequence_#   Sequence             proper sequence
John    1                   Bought ingredients   1
John    2                   Read recipe          1
Jack    1                   Read recipe          1
Jack    2                   Bought ingredients   1
Jack    3                   Mixed ingredients    3
Jack    4                   Preheated oven       2
Jane    1                   Preheated oven       2
Jane    2                   Bought ingredients   1
Jill    1                   Mixed ingredients    2



#spread the data by actual sequence and fill with proper sequence; I feel this step could be cut out, but not sure how.

spread_data<- spread(sampledata,key = "Actual_Sequence_#",value = "proper sequence")

Baker   1   2   3   4
John    1   1
Jack    1   1   3   2
Jane    2   1
Jill    2

concatenate and eliminate duplicates

I actually need help with this bit of code. The desired outcome is a two-column data frame:

condensed_data<- spread_data(group_by(Baker),????)

Baker   Sequence_concatenated
John    1
Jack    1,3,2
Jane    2,1
Jill    2

add a new column that evaluates concatenated actual sequence with proper sequence

evaluation <- mutate(condensed_data, eval_of_sequence =
  ifelse(grepl("1,2,3,4", condensed_data$`concatenated`), "following proper sequence",
  ifelse(grepl("1,2,3", condensed_data$`concatenated`), "following proper sequence",
  ifelse(grepl("1,2", condensed_data$`concatenated`), "following proper sequence",
  ifelse(grepl("1", condensed_data$`concatenated`), "following proper sequence",
  "breaking proper sequence")))))

Baker   Sequence_concatenated   evaluation
John    1                       following proper sequence
Jack    1,3,2                   breaking proper sequence
Jane    2,1                     breaking proper sequence
Jill    2                       following proper sequence

I don't understand what all these slashes are. See [how to create a reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) for a proper way to include sample data and the desired output. — MrFlick, Oct 05 '17 at 16:06
It is better if you give R code where you fill the initial data and ask only one question. — keiv.fly, Oct 05 '17 at 16:06
Sorry, when I typed it out, it wasn't separating the text, so I used slashes. I rewrote it. — Lyndon L., Oct 05 '17 at 21:27

score 0 · Accepted Answer · answered Oct 05 '17 at 16:26

library(dplyr)

#Create data.frame that looks roughly like yours
merged_data <- data.frame(Baker = c("John", "John", "Jack", "Jack", "Jack", "Jack", "Jane", "Jane", "Jill"), 
                 Actual_Sequence = c(1,2,1,2,3,4,1,2,1), 
                 proper_sequence = c(1,1,1,1,3,2,2,1,2)) 

#Use dplyr to group by baker, concatenate their process, then evaluate
#by comparing to the proper sequence field. If equal assume correct.
merged_data %>% 
  group_by(Baker) %>% 
  summarise(Actual_Sequence = paste(Actual_Sequence, collapse = ","),
            proper_sequence = paste(proper_sequence, collapse = ",")) %>%
  mutate(evaluation = ifelse(Actual_Sequence == proper_sequence, "following proper sequence", "breaking proper sequence"))

If I understand your post properly, and I'm not sure I do, this will give you the result you want. You can fiddle with the dplyr statement to work out how it works.
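
If what you are after is closer to the two-column table in the question (duplicates removed, then an evaluation), here is a rough sketch of one way it might be done. It rests on some assumptions that are mine, not the question's: that "following proper sequence" means the proper-sequence codes never go backwards within a baker, and that the "cool" / "kinda cool" / "not cool at all" tiers can be decided from that check plus the first step. The column names `concatenated`, `in_order`, `first_step` and `tier` are made up for illustration, and `unique()` drops every repeated code, not only consecutive repeats.

```r
library(dplyr)

# Same sample data as in the answer above
merged_data <- data.frame(
  Baker = c("John", "John", "Jack", "Jack", "Jack", "Jack", "Jane", "Jane", "Jill"),
  Actual_Sequence = c(1, 2, 1, 2, 3, 4, 1, 2, 1),
  proper_sequence = c(1, 1, 1, 1, 3, 2, 2, 1, 2)
)

condensed_data <- merged_data %>%
  # Make sure each baker's steps are in the order they were performed
  arrange(Baker, Actual_Sequence) %>%
  group_by(Baker) %>%
  summarise(
    # Drop repeated codes, then collapse to one string per baker
    concatenated = paste(unique(proper_sequence), collapse = ","),
    # Assumption: "proper" means the codes never decrease
    in_order = !is.unsorted(proper_sequence),
    first_step = first(proper_sequence)
  ) %>%
  mutate(
    evaluation = ifelse(in_order,
                        "following proper sequence",
                        "breaking proper sequence"),
    # Hypothetical tiers: placeholder rules, adjust to your own definitions
    tier = case_when(
      in_order & first_step == 1 ~ "cool",
      in_order ~ "kinda cool",
      TRUE ~ "not cool at all"
    )
  )

condensed_data
```

On the sample data this should give "1" for John, "1,3,2" for Jack, "2,1" for Jane and "2" for Jill, with John and Jill marked as following the proper sequence, which lines up with the desired output in the question. `case_when()` checks its conditions top to bottom, so it tends to stay more readable than nested `ifelse()` calls as more tiers are added.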

Thanks for this! I had to make an adjustment because I forgot about the sequence code 99, for when someone decides to quit. But the example you gave me allowed me to generalize enough of this process over to that. Thanks again. — Lyndon L., Oct 06 '17 at 17:52
