I have a data frame that looks like this:
              SFOpID Number MAGroupID
1 0032A00002cgs3XQAQ      1        99
2 0032A00002cgs3XQAQ      1        79
3 003F000001vyUGKIA2      2         8
4 0032A00002btWE6QAM      3        97
5 0032A00002btWE6QAM      3        86
6 0032A00002btWE6QAM      3        35
I need to transpose it so that it looks like this:
              SFOpID Number MAGroupID
1 0032A00002cgs3XQAQ      1 99 79
3 003F000001vyUGKIA2      2 8
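For reference, the reshaping step might be sketched in base R roughly as follows. This is only a sketch under assumptions: it uses `aggregate()` rather than arulesSequences, the data frame name `df` is a placeholder, and the values are copied from the printout above.

```r
# Example data copied from the printout above; `df` is a placeholder name.
df <- data.frame(
  SFOpID = c("0032A00002cgs3XQAQ", "0032A00002cgs3XQAQ", "003F000001vyUGKIA2",
             "0032A00002btWE6QAM", "0032A00002btWE6QAM", "0032A00002btWE6QAM"),
  Number = c(1, 1, 2, 3, 3, 3),
  MAGroupID = c(99, 79, 8, 97, 86, 35),
  stringsAsFactors = FALSE
)

# Collapse MAGroupID into one space-separated string per SFOpID/Number pair,
# keeping the values in their original row order within each group.
collapsed <- aggregate(MAGroupID ~ SFOpID + Number, data = df,
                       FUN = paste, collapse = " ")
collapsed <- collapsed[order(collapsed$Number), ]
print(collapsed)
```

The collapsed `MAGroupID` column is a character string per person, which is enough for counting identical sequences but would need splitting again if the individual IDs are required.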
Then I need to generate counts for the five most common sequences. For example, 12 people (SFOpID) have the 97 86 35 sequence, but only 4 people have the 99 79 sequence. I think this may be possible with the arulesSequences package (an extension of arules), doing something like the following:
library(arulesSequences)
x <- read_baskets(con = system.file("misc", "zaki.txt", package = "arulesSequences"),
                  info = c("sequenceID", "eventID", "SIZE"))
as(x, "data.frame")
The goal is to have output that looks like this:
      items sequenceID eventID SIZE
1     {C,D}          1      10    2
2   {A,B,C}          1      15    3
3   {A,B,F}          1      20    3
4 {A,C,D,F}          1      25    4
5   {A,B,F}          2      15    3
The only difference is that, for items, each entry would be a sequence such as {99, 79} or {97, 86, 35}.
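To make the counting step concrete, here is a rough base-R sketch that builds one brace-delimited sequence per person and tabulates the most common ones. It is an assumption-laden illustration, not the arulesSequences approach: the names `df`, `seqs`, `top5`, and `n_people` are placeholders, and the sample data is the six-row example from the top of this post, so every sequence appears only once.

```r
# Example data copied from the first printout; `df` is a placeholder name.
df <- data.frame(
  SFOpID = c("0032A00002cgs3XQAQ", "0032A00002cgs3XQAQ", "003F000001vyUGKIA2",
             "0032A00002btWE6QAM", "0032A00002btWE6QAM", "0032A00002btWE6QAM"),
  MAGroupID = c(99, 79, 8, 97, 86, 35),
  stringsAsFactors = FALSE
)

# One sequence string per person, e.g. "{99, 79}", in original row order.
seqs <- vapply(split(df$MAGroupID, df$SFOpID),
               function(ids) paste0("{", paste(ids, collapse = ", "), "}"),
               character(1))

# Count how many people share each sequence and keep the five most common.
counts <- sort(table(seqs), decreasing = TRUE)
top5 <- head(as.data.frame(counts, stringsAsFactors = FALSE), 5)
names(top5) <- c("items", "n_people")
print(top5)
```

With the real data, `n_people` would hold the number of distinct SFOpID values sharing each sequence, which is the count I am after.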