I'm doing sequence analysis with TraMineR in R and I would like to take into account only the order of the different spells over time. For instance, I would like the sequence A-B-A to be treated the same as A-B-B-B-A when plotting the most frequent sequences or when using the index plot. Is there an option for this type of analysis that does not require changing the data format?
3 Answers
There are two strategies to produce plots focusing on the ordering of states:
- Remove any timing information.
- Use plots that focus on state sequencing, such as parallel coordinate plots.
You can also produce a typology focusing on state ordering using specific distance measures.
Example
Let's take an example. First build the sequence object:
library(TraMineR)
#>
#> TraMineR development version 2.3-4 (Built: 2022-11-29)
#> Website: http://traminer.unige.ch
#> Please type 'citation("TraMineR")' for citation information.
data(biofam)
## Create the sequence object
bfstates <- c("Parent", "Left", "Married", "Left/Married", "Child", "Left/Child", "Left/Married/Child", "Divorced")
bf.shortlab <- c("P","L","M","LM","C","LC", "LMC", "D")
bf.seq <- seqdef(biofam[,10:25], states=bf.shortlab, labels=bfstates)
#> [>] state coding:
#> [alphabet] [label] [long label]
#> 1 0 P Parent
#> 2 1 L Left
#> 3 2 M Married
#> 4 3 LM Left/Married
#> 5 4 C Child
#> 6 5 LC Left/Child
#> 7 6 LMC Left/Married/Child
#> 8 7 D Divorced
#> [>] 2000 sequences in the data set
#> [>] min/max sequence length: 16/16
Created on 2023-02-21 with reprex v2.0.2
Remove any timing information
You can remove timing information using the seqdss() function:
bf.dss <- seqdss(bf.seq)
And then plot it (any plots for sequences will work):
seqfplot(bf.dss)
seqIplot(bf.dss, sortv="from.start")
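To see what removing timing information means for a single sequence, here is a minimal base-R sketch of the same idea using run-length encoding. The helper name collapse_spells is hypothetical and not part of TraMineR; it only illustrates the principle and is not how seqdss() is implemented.

```r
# Hypothetical helper (not part of TraMineR): keep one state per spell
collapse_spells <- function(x) rle(x)$values

s1 <- c("A", "B", "A")
s2 <- c("A", "B", "B", "B", "A")

# Consecutive repeats of "B" are collapsed into a single "B"
collapse_spells(s2)

# Both sequences reduce to the same ordering of distinct states
identical(collapse_spells(s1), collapse_spells(s2))
```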
Parallel Coordinate plots
A parallel coordinate plot focuses on the order of states only:
seqpcplot(bf.dss)
The result may look messy, depending on your data. You can highlight the most common state orderings by showing in color only the patterns that together account for 50% of cases:
seqpcplot(bf.seq, filter = list(type = "function",
                                value = "cumfreq",
                                level = 0.5))
See the following reference for more.
Bürgin, R. and G. Ritschard (2014), A decorated parallel coordinate plot for categorical longitudinal data, The American Statistician 68(2), 98-103. https://doi.org/10.1080/00031305.2014.887591
Typology
If you would like to build a typology focusing on state sequencing, you need to choose the distance measure accordingly. See the guideline section of the following article for more details.
Studer, M. and Ritschard, G. (2016), What matters in differences between life trajectories: a comparative review of sequence dissimilarity measures. J. R. Stat. Soc. A, 179: 481-511. https://doi.org/10.1111/rssa.12125
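As a rough sketch of one possible approach (my own assumption, not a recommendation taken from the article above), you can compute distances on the DSS sequences, so that spell durations cannot influence them, and then cluster. The LCS distance, Ward clustering, the 200-sequence subsample and the four clusters are all arbitrary illustrative choices.

```r
library(TraMineR)
data(biofam)

## Subsample to keep the example fast (arbitrary choice)
bf.sub <- seqdef(biofam[1:200, 10:25])
bf.sub.dss <- seqdss(bf.sub)

## Distances computed on the DSS sequences ignore spell durations
d <- seqdist(bf.sub.dss, method = "LCS")

## Ward clustering, cut into 4 groups (k = 4 is an illustrative assumption)
hc <- hclust(as.dist(d), method = "ward.D2")
cl <- cutree(hc, k = 4)

## Cluster sizes
table(cl)
```

The resulting cl vector can then be passed as the group argument of the usual plotting functions, for example seqIplot(bf.sub, group = cl), to inspect the full sequences by cluster.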

Building on Matthias's solution, you can also plot the full sequences bf.seq sorted according to the DSS sequences bf.dss. Here we use the sortv function of TraMineRextras.
library(TraMineR)
data(biofam)
## Create a cohort factor for later use
biofam$cohort <- cut(biofam$birthyr, c(1900,1930,1940,1950,1960),
labels=c("1900-1929", "1930-1939", "1940-1949", "1950-1959"), right=FALSE)
## Create the sequence object
bfstates <- c("Parent", "Left", "Married", "Left/Married", "Child", "Left/Child", "Left/Married/Child", "Divorced")
bf.shortlab <- c("P","L","M","LM","C","LC", "LMC", "D")
bf.seq <- seqdef(biofam[,10:25], states=bf.shortlab, labels=bfstates)
bf.dss <- seqdss(bf.seq)
library(TraMineRextras)
seqIplot(bf.seq, sortv=sortv(bf.dss), legend.prop=.2)

I don't see how you can achieve your goal without touching the sequence format. If you want to focus on sequencing and ignore the spell durations, you need the distinct-state-sequence (DSS) format. Luckily, TraMineR provides the seqdss() function to obtain the DSS sequences very easily. Here is an example with the two sequences mentioned in the question above:
library(TraMineR)
#>
#> TraMineR stable version 2.2-6 (Built: 2023-01-02)
#> Website: http://traminer.unige.ch
#> Please type 'citation("TraMineR")' for citation information.
## Generate example data with 2 sequences
seq1 <- c("A", "B", "A")
seq2 <- c("A", "B", "B", "B", "A")
length(seq1) <- length(seq2)
seqdata <- rbind(seq1,seq2) |> seqdef()
# Tabulate the sequences considering durations (default)
seqtab(seqdata)
#> Freq Percent
#> A/1-B/1-A/1 1 50
#> A/1-B/3-A/1 1 50
# Tabulate DSS sequences (getting rid of duration information)
seqtab(seqdss(seqdata))
#> Freq Percent
#> A/1-B/1-A/1 2 100
Created on 2023-02-21 with reprex v2.0.2
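If you later need the durations that seqdss() discards, TraMineR's seqdur() returns them as a separate matrix, so ordering and timing can be kept side by side. A small sketch reusing the two toy sequences (here the shorter one is padded with NA by hand):

```r
library(TraMineR)

## Same two sequences; seq1 is padded with NA to the common length
seq1 <- c("A", "B", "A", NA, NA)
seq2 <- c("A", "B", "B", "B", "A")
seqdata <- seqdef(rbind(seq1, seq2))

## Ordering of distinct states (durations dropped)
seqdss(seqdata)

## Spell durations, one column per successive spell
dur <- seqdur(seqdata)
dur
```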
