transposing data and sequence mining most common patterns in rows

Question

I have a data frame that looks like this:

              SFOpID Number MAGroupID
1 0032A00002cgs3XQAQ      1        99
2 0032A00002cgs3XQAQ      1        79
3 003F000001vyUGKIA2      2         8
4 0032A00002btWE6QAM      3        97
5 0032A00002btWE6QAM      3        86
6 0032A00002btWE6QAM      3        35

I need to transpose it so that it looks like this:

              SFOpID Number MAGroupID
1 0032A00002cgs3XQAQ      1        99  79
3 003F000001vyUGKIA2      2         8
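A minimal base-R sketch of this reshaping step, assuming the six sample rows above (rebuilt here as a data frame named df, an illustrative name) and assuming each person's rows are already in the order the sequence should follow. It uses aggregate() rather than dplyr, and it stores the collapsed IDs as one text column instead of several numeric columns.

```r
# Sample rows from the question (df is an illustrative name)
df <- data.frame(
  SFOpID    = c("0032A00002cgs3XQAQ", "0032A00002cgs3XQAQ", "003F000001vyUGKIA2",
                "0032A00002btWE6QAM", "0032A00002btWE6QAM", "0032A00002btWE6QAM"),
  Number    = c(1L, 1L, 2L, 3L, 3L, 3L),
  MAGroupID = c(99L, 79L, 8L, 97L, 86L, 35L),
  stringsAsFactors = FALSE
)

# One row per person: paste that person's MAGroupIDs together,
# keeping the order in which the rows appear
wide <- aggregate(MAGroupID ~ SFOpID + Number, data = df,
                  FUN = function(x) paste(x, collapse = " "))
wide
```

Keeping the sequence as a single string (for example "97 86 35") makes it easy to compare and count whole sequences later.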

Then I need counts of the five most common sequences. For example, 12 people (SFOpID) have the sequence 97 86 35, but only 4 people have the sequence 99 79. I think this may be possible with the arulesSequences package (an extension of arules), doing something like the following:

    library(arulesSequences)

    x <- read_baskets(con  = system.file("misc", "zaki.txt", package = "arulesSequences"),
                      info = c("sequenceID", "eventID", "SIZE"))
    as(x, "data.frame")

The goal is to have output that looks like this:

       items sequenceID eventID SIZE
 1      {C,D}          1      10    2
 2    {A,B,C}          1      15    3
 3    {A,B,F}          1      20    3
 4  {A,C,D,F}          1      25    4
 5    {A,B,F}          2      15    3

The only difference is that the items would be sequences such as {99, 79} or {97, 86, 35}.
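If the aim is only a table in that shape, a base-R sketch can build it directly from the sample rows without arulesSequences. It assumes Number can serve as the sequenceID and, because the data has no event or time column, puts all of a person's items into a single event (eventID = 1). Both are assumptions, not something the data states.

```r
# Sample rows from the question (df is an illustrative name)
df <- data.frame(
  SFOpID    = c("0032A00002cgs3XQAQ", "0032A00002cgs3XQAQ", "003F000001vyUGKIA2",
                "0032A00002btWE6QAM", "0032A00002btWE6QAM", "0032A00002btWE6QAM"),
  Number    = c(1L, 1L, 2L, 3L, 3L, 3L),
  MAGroupID = c(99L, 79L, 8L, 97L, 86L, 35L),
  stringsAsFactors = FALSE
)

# split() keeps the original row order inside each person's group;
# the groups themselves are ordered by Number (1, 2, 3)
by_person <- split(df$MAGroupID, df$Number)

out <- data.frame(
  items      = vapply(by_person,
                      function(x) paste0("{", paste(x, collapse = ","), "}"),
                      character(1), USE.NAMES = FALSE),
  sequenceID = as.integer(names(by_person)),  # assumption: Number is the sequence ID
  eventID    = 1L,                            # assumption: one event per person
  SIZE       = lengths(by_person, use.names = FALSE),
  stringsAsFactors = FALSE
)
out
```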

Please edit your question and include a [reproducible example](https://stackoverflow.com/questions/5963269/how-to-make-a-great-r-reproducible-example) and expected output (also not as a picture). — markus, Aug 15 '18 at 20:14

score 0 · Answer 1 · answered Aug 15 '18 at 22:49


You can use group_by() and nest() to collect each person's values into one list column; the list can then be converted to text. Here is an example:

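    library(dplyr)
    library(tidyr)  # nest() comes from tidyr, not dplyr

    code <- read.csv("code.csv", stringsAsFactors = FALSE)

    # Group by person so that each person's MAGroupIDs end up in one list cell
    output <- code %>%
      select(SFOpID, Number, MAGroupID) %>%
      group_by(SFOpID, Number) %>%
      nest()

    # Turn each nested tibble into a space-separated string such as "99 79"
    output$data <- sapply(output$data, function(d) paste(d$MAGroupID, collapse = " "))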


Nar


Thank you for this; I will try it. – ATF Aug 16 '18 at 22:16
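Once each person's IDs are collapsed into one string, the counts the question asks for can be obtained by tabulating those strings and keeping the five most frequent. A minimal base-R sketch, assuming the sample rows from the question and assuming two people share a sequence only when their MAGroupIDs match in the same order. With only three people in the sample every sequence occurs once, so meaningful counts will only appear on the full data.

```r
# Sample rows from the question (df is an illustrative name)
df <- data.frame(
  SFOpID    = c("0032A00002cgs3XQAQ", "0032A00002cgs3XQAQ", "003F000001vyUGKIA2",
                "0032A00002btWE6QAM", "0032A00002btWE6QAM", "0032A00002btWE6QAM"),
  Number    = c(1L, 1L, 2L, 3L, 3L, 3L),
  MAGroupID = c(99L, 79L, 8L, 97L, 86L, 35L),
  stringsAsFactors = FALSE
)

# One space-separated sequence string per person (row order is preserved)
seqs <- tapply(df$MAGroupID, df$SFOpID, paste, collapse = " ")

# Count how many people share each sequence, most common first
counts <- sort(table(as.vector(seqs)), decreasing = TRUE)

# Keep the five most common sequences
top5 <- head(data.frame(sequence = names(counts),
                        people   = as.vector(counts),
                        stringsAsFactors = FALSE), 5)
top5
```

If several sequences tie for fifth place, head() simply keeps whichever comes first after sorting, so ties at the cutoff are not reported.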
