TraMineR in R for sequence analysis: how to account for state order rather than spell length?

Question

I'm doing sequence analysis with TraMineR in R, and I would like to take into account only the order of the different spells over time. For instance, I would like the sequence A-B-A to be considered the same as A-B-B-B-A when plotting the most frequent sequences or when using an index plot. Is there an option for this type of analysis that doesn't require changing the data format?

Matthias Studer · Accepted Answer · 2023-02-21T08:49:16.940

There are two strategies for producing plots that focus on the ordering of states:

Remove any timing information.
Use plots that focus on state sequencing: parallel coordinate plots.

You can also produce a typology focusing on state ordering using specific distance measures.

Example

Let's take an example. First, build the sequence object:

library(TraMineR)
#> 
#> TraMineR development version 2.3-4 (Built: 2022-11-29)
#> Website: http://traminer.unige.ch
#> Please type 'citation("TraMineR")' for citation information.
data(biofam)
## Create the sequence object
bfstates <- c("Parent", "Left", "Married", "Left/Married",  "Child", "Left/Child", "Left/Married/Child", "Divorced")
bf.shortlab <- c("P","L","M","LM","C","LC", "LMC", "D")
bf.seq <- seqdef(biofam[,10:25], states=bf.shortlab, labels=bfstates)
#>  [>] state coding:
#>        [alphabet]  [label]  [long label]
#>      1  0           P        Parent
#>      2  1           L        Left
#>      3  2           M        Married
#>      4  3           LM       Left/Married
#>      5  4           C        Child
#>      6  5           LC       Left/Child
#>      7  6           LMC      Left/Married/Child
#>      8  7           D        Divorced
#>  [>] 2000 sequences in the data set
#>  [>] min/max sequence length: 16/16

Created on 2023-02-21 with reprex v2.0.2

Remove any timing information

You can remove timing information by extracting the distinct successive states (DSS) with the seqdss function:

bf.dss <- seqdss(bf.seq)

Then plot the resulting DSS sequences (any sequence plot will work):

seqfplot(bf.dss)

seqIplot(bf.dss, sortv="from.start")
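The DSS transformation can be pictured on a single sequence stored as a plain character vector. The sketch below uses base R's rle() purely as an illustration (it is not how seqdss is implemented): the run values are the distinct successive states, and the run lengths are the spell durations that the DSS drops.

```r
# Illustration only: base R's rle() collapses runs of identical
# consecutive values, which mirrors what the DSS does to one sequence.
seq_long <- c("A", "B", "B", "B", "A")
runs <- rle(seq_long)

dss <- runs$values         # distinct successive states
durations <- runs$lengths  # spell durations, discarded by the DSS

print(dss)        # "A" "B" "A"
print(durations)  # 1 3 1
```

Within TraMineR, the durations of the spells can still be retrieved with seqdur(bf.seq) if you need them later.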

Parallel coordinate plots

Parallel coordinate plots focus on the order of states only:

seqpcplot(bf.dss)

The result may look messy, depending on your data. You can highlight the most common state orderings by coloring only the patterns that together account for 50% of the cases:

seqpcplot(bf.seq, filter = list(type = "function",
                                value = "cumfreq",
                                level = 0.5))
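To compare state orderings between groups, seqpcplot() also accepts a group argument. A sketch, assuming that a split by the sex variable of biofam is of interest (any grouping factor of length equal to the number of sequences would do); the filter is the same as in the call above.

```r
library(TraMineR)
data(biofam)

# Sequence object, built as above
bfstates <- c("Parent", "Left", "Married", "Left/Married", "Child",
              "Left/Child", "Left/Married/Child", "Divorced")
bf.shortlab <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
bf.seq <- seqdef(biofam[, 10:25], states = bf.shortlab, labels = bfstates)

# One parallel coordinate plot per sex, coloring within each panel the
# orderings that together cover 50% of the cases
seqpcplot(bf.seq, group = biofam$sex,
          filter = list(type = "function", value = "cumfreq", level = 0.5))
```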

See the following reference for more details:

Bürgin, R. and G. Ritschard (2014), A decorated parallel coordinate plot for categorical longitudinal data, The American Statistician 68(2), 98-103. https://doi.org/10.1080/00031305.2014.887591

Typology

If you would like to build a typology focusing on state sequencing, you need to choose the distance measure accordingly. See the guideline section of the following article for more details.

Studer, M. and G. Ritschard (2016), What matters in differences between life trajectories: a comparative review of sequence dissimilarity measures, Journal of the Royal Statistical Society, Series A 179, 481-511. https://doi.org/10.1111/rssa.12125
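As a rough sketch of one possible approach (not necessarily the measure the guidelines above would recommend for your data), you can compute LCS distances on the DSS sequences, so that spell durations play no role, and cluster them with Ward's method. The choice of LCS and the number of types (4) are assumptions made only for illustration.

```r
library(TraMineR)
data(biofam)

bfstates <- c("Parent", "Left", "Married", "Left/Married", "Child",
              "Left/Child", "Left/Married/Child", "Divorced")
bf.shortlab <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
bf.seq <- seqdef(biofam[, 10:25], states = bf.shortlab, labels = bfstates)
bf.dss <- seqdss(bf.seq)

# Distances between DSS sequences: only the order of states matters
bf.dist <- seqdist(bf.dss, method = "LCS")

# Hierarchical clustering with Ward's criterion, cut into 4 types
bf.clust <- hclust(as.dist(bf.dist), method = "ward.D2")
bf.type <- factor(cutree(bf.clust, k = 4), labels = paste("Type", 1:4))

# Full sequences (with timing) shown within each ordering-based type
seqdplot(bf.seq, group = bf.type)
```

seqdist() also offers methods such as "OMstran" and "SVRspell"; the article above discusses which measures are most sensitive to sequencing.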

score 1 · Answer 2 · answered Feb 21 '23 at 08:19

Building on Matthias's solution, you can also plot the full sequences bf.seq sorted according to their DSS sequences bf.dss. Here we use the sortv function of TraMineRextras.

library(TraMineR)
data(biofam)
## Create a cohort factor for later use
biofam$cohort <- cut(biofam$birthyr, c(1900,1930,1940,1950,1960), 
                     labels=c("1900-1929", "1930-1939", "1940-1949", "1950-1959"), right=FALSE)
## Create the sequence object
bfstates <- c("Parent", "Left", "Married", "Left/Married",  "Child", "Left/Child", "Left/Married/Child", "Divorced")
bf.shortlab <- c("P","L","M","LM","C","LC", "LMC", "D")
bf.seq <- seqdef(biofam[,10:25], states=bf.shortlab, labels=bfstates)
bf.dss <- seqdss(bf.seq)

library(TraMineRextras)
seqIplot(bf.seq, sortv=sortv(bf.dss), legend.prop=.2)
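The cohort factor defined above is not used in that plot. If the aim is to compare cohorts, one possible extension (a sketch, not part of the original answer) is to pass it as the group argument of seqIplot(). Since cut() returns NA for birth years outside the break points, those cases are dropped first.

```r
library(TraMineR)
library(TraMineRextras)
data(biofam)

biofam$cohort <- cut(biofam$birthyr, c(1900, 1930, 1940, 1950, 1960),
                     labels = c("1900-1929", "1930-1939", "1940-1949", "1950-1959"),
                     right = FALSE)
bfstates <- c("Parent", "Left", "Married", "Left/Married", "Child",
              "Left/Child", "Left/Married/Child", "Divorced")
bf.shortlab <- c("P", "L", "M", "LM", "C", "LC", "LMC", "D")
bf.seq <- seqdef(biofam[, 10:25], states = bf.shortlab, labels = bfstates)
bf.dss <- seqdss(bf.seq)

# Keep only the individuals assigned to one of the four cohorts
keep <- !is.na(biofam$cohort)

# Full sequences, one index plot per cohort, sorted by their DSS
seqIplot(bf.seq[keep, ], group = biofam$cohort[keep],
         sortv = sortv(bf.dss[keep, ]), legend.prop = .2)
```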

score 0 · Answer 3 · answered Feb 21 '23 at 07:40

I don't see how you can achieve your goal without touching the sequence format. If you want to focus on sequencing and ignore spell durations, you need the distinct successive states (DSS) format. Luckily, TraMineR provides the seqdss() function, which makes it easy to obtain the DSS sequences. Here is an example with the two sequences mentioned in the question above:

library(TraMineR)
#> 
#> TraMineR stable version 2.2-6 (Built: 2023-01-02)
#> Website: http://traminer.unige.ch
#> Please type 'citation("TraMineR")' for citation information.

## Generate example data with 2 sequences
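seq1 <- c("A", "B", "A")
seq2 <- c("A", "B", "B", "B", "A")
## Pad seq1 with NA so both vectors have the same length;
## by default seqdef() treats trailing NAs as void (end of sequence)
length(seq1) <- length(seq2)
## The native pipe |> requires R >= 4.1
seqdata <- rbind(seq1,seq2) |> seqdef()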

# Tabulate the sequences considering durations (default)
seqtab(seqdata)
#>             Freq Percent
#> A/1-B/1-A/1    1      50
#> A/1-B/3-A/1    1      50
# Tabulate DSS sequences (getting rid of duration information)
seqtab(seqdss(seqdata))
#>             Freq Percent
#> A/1-B/1-A/1    2     100
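The same idea carries over to dissimilarities: once durations are dropped with seqdss(), the two example sequences are identical, so their distance on the DSS should be zero. A minimal check, using the LCS distance only as an illustrative choice:

```r
library(TraMineR)

seq1 <- c("A", "B", "A")
seq2 <- c("A", "B", "B", "B", "A")
length(seq1) <- length(seq2)   # pad the shorter sequence with NA (void)
seqdata <- seqdef(rbind(seq1, seq2))

# LCS distance on the original sequences: the longer B spell counts
d.seq <- seqdist(seqdata, method = "LCS")
# LCS distance on the DSS sequences: only the ordering A-B-A counts
d.dss <- seqdist(seqdss(seqdata), method = "LCS")

d.seq
d.dss
```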