
Let's say I have this data. My objective is to extract combinations of event sequences.
I have one constraint: the time between two consecutive events may not be more than 5. Let's call this maxGap.

User <- rep(1, 3)         # A single user
Event <- c("C","B","C")   # Example events; could be anything from LETTERS[1:4]
Time <- c(1, 12, 13)      # Time stamp of each event
df <- data.frame(User=User,
                 Event=Event,
                 Time=Time)

I want to use these sequences as binary explanatory variables for analysis.
Given this data frame, the result should look like this.

res.df <- data.frame(User=1,
                     C=1,
                     B=1,
                     CB=0,
                     BC=1,
                     CBC=0)  

(CB) and (CBC) will be 0 because the gap between the first C (time 1) and B (time 12) is 11, which exceeds maxGap = 5.
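The arithmetic behind that statement can be checked with a few lines of base R. This is only a minimal sketch for the three events above: it looks at adjacent pairs, and the names `gaps`, `pairs` and `pair_ok` are made up for the illustration.

```r
Event  <- c("C", "B", "C")
Time   <- c(1, 12, 13)
maxGap <- 5

# Time elapsed between each event and the next one
gaps <- diff(Time)

# Label each adjacent pair by its two events: "CB", then "BC"
pairs <- paste0(head(Event, -1), tail(Event, -1))

# 1 if the pair respects maxGap, 0 otherwise
pair_ok <- setNames(as.integer(gaps <= maxGap), pairs)
pair_ok
```

Here `gaps` is 11 and 1, so only the second pair (BC) satisfies the constraint, and CBC fails because it would need the C-to-B step.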
I was trying to write a function for this using many for-loops, but it becomes very complex as the sequences get longer and the number of distinct events grows, and even more so if the number of different users grows to 100 000.

Is it possible to do this in TraMineR with the help of seqeconstraint?

Developer

1 Answer

Here is how you would do that with TraMineR:

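library(TraMineR)

# Build one event sequence per user from the time-stamped events
df.seqe <- seqecreate(id=df$User, timestamp=df$Time, event=df$Event)

# Allow at most 5 time units between consecutive events of a subsequence
constr <- seqeconstraint(maxGap=5)
subseq <- seqefsub(df.seqe, minSupport=0, constraint=constr)
# One row per sequence, one 0/1 column per subsequence
(presence <- seqeapplysub(subseq, method="presence"))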

which gives

                   (B) (B)-(C) (C)
1-(C)-11-(B)-1-(C)   1       1   1

presence is a table with one column for each subsequence that occurs at least once in the data set. So, if you have several individuals (event sequences), the table will have one row per individual, and the columns will be the binary variables you are looking for. (See also "TraMineR: Can I get the complete sequence if I give an event sub sequence?")
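As a small follow-up sketch in base R, the presence table can be reshaped towards the layout asked for in the question. The matrix below is typed in by hand to mirror the output shown above, and pairing row i with the i-th user id is an assumption that should be verified on real data.

```r
# Hand-typed copy of the presence table printed above (illustration only)
presence <- matrix(c(1, 1, 1), nrow = 1,
                   dimnames = list("1-(C)-11-(B)-1-(C)",
                                   c("(B)", "(B)-(C)", "(C)")))

# Strip parentheses and dashes so "(B)-(C)" becomes "BC"
colnames(presence) <- gsub("[()-]", "", colnames(presence))

# Assumption: row i of presence belongs to the i-th user id
users <- 1
res.df <- data.frame(User = users, presence,
                     row.names = NULL, check.names = FALSE)
res.df
```

Subsequences that never occur under the constraint (CB and CBC here) get no column at all, so they would have to be added as all-zero columns if the analysis needs them. Collapsing the names this way is only safe when event names are single characters.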

However, be aware that TraMineR works well only for subsequences of length up to about 4 or 5. We suggest setting maxK=3 or 4 in seqefsub. The number of individuals should not be a problem, nor should the number of different possible events (the alphabet), as long as you restrict the maximal subsequence length you are looking for.
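To make the roles of the gap constraint and the maximal length concrete, here is a brute-force sketch in plain base R. It is not TraMineR's algorithm and will not scale the way the package does. `gap_subseqs`, `max_gap` and `max_k` are hypothetical names chosen for this illustration, and the sketch treats every row as one event ordered by time, ignoring ties.

```r
# Hypothetical helper: all subsequences of one user's events with at most
# max_k events, where consecutive chosen events are <= max_gap apart in time.
gap_subseqs <- function(event, time, max_gap = 5, max_k = 3) {
  ord   <- order(time)
  event <- as.character(event)[ord]
  time  <- time[ord]
  n     <- length(event)
  found <- character(0)

  # Depth-first extension of a subsequence whose last event is at position i
  extend <- function(i, label, k) {
    found <<- c(found, label)
    if (k >= max_k || i >= n) return(invisible(NULL))
    for (j in (i + 1):n) {
      if (time[j] - time[i] <= max_gap) {
        extend(j, paste0(label, event[j]), k + 1)
      }
    }
  }
  for (i in seq_len(n)) extend(i, event[i], 1)
  unique(found)
}

df <- data.frame(User = c(1, 1, 1),
                 Event = c("C", "B", "C"),
                 Time = c(1, 12, 13))

# Subsequences found for each user, then a 0/1 table with one row per user
per_user    <- lapply(split(df, df$User),
                      function(d) gap_subseqs(d$Event, d$Time))
all_subseqs <- sort(unique(unlist(per_user)))
presence    <- do.call(rbind, lapply(per_user, function(s) {
  as.integer(all_subseqs %in% s)
}))
colnames(presence) <- all_subseqs
presence
```

For the example data this finds B, C and BC, and never CB or CBC. The number of candidate subsequences grows quickly with `max_k`, which is one way to see why keeping the maximal length small matters.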

Hope this helps

Gilbert
  • Thank you Gilbert, and thank you for a handy package :) – Developer Feb 20 '17 at 08:45
  • Unfortunately, it failed to run on a larger dataset. I tried setting maxK to 4, 3, 2, and 1, but it still did not work. Do you have any ideas for other packages that might work? I have looked at arulesSequences but do not think it works either... – Developer Mar 03 '17 at 13:00