1

I'm trying to answer the question of how many observations have a certain sequence. for example my data looks like this:

CaseNo    Code
1111      RF
1111      AV
1111      RF
2222      AV
2222      TF
3333      AV

and I need an output of:

CaseNo    1    2    3
1111      RF   AV   RF
2222      AV   RF   NULL
3333      AV   NULL NULL

so I can match the sequences I am after.

I've sorted the data so the codes are in the correct order but I need to combine these codes into one line.

I've read that the spread function is a good method but this seems to only work on values but my code column is characters. I've also tried the group_by and nest functions using:

spread_df <- df %>% group_by(CaseNo) %>% nest()

and this brings it back as CaseNo, Code but the codes in Code column are in a list that I can't seem to split.

Is there a way to simply bring back the output I need? Or any other method I am overlooking in discovering sequences?

Thanks

Arun kumar mahesh
  • 2,289
  • 2
  • 14
  • 22
  • 2
    I think you need `df %>% group_by(CaseNo) %>% mutate(rn = row_number()) %>% spread(rn, Code)` – akrun Feb 15 '19 at 12:26
  • Thanks akrun, this worked brilliantly. Next question - nothing to do with code this time - How do I give the mark for answered as this is in the comment section? – Martin Pugsley Feb 15 '19 at 12:46

1 Answers1

1

We can create a sequence after grouping by 'CaseNo' and then use the spread to convert it to 'wide' format

library(tidyverse)
df %>% 
  group_by(CaseNo) %>%
  mutate(rn = row_number()) %>% 
  spread(rn, Code)
akrun
  • 874,273
  • 37
  • 540
  • 662