I have a logical problem with the transition cost matrix. I am working on sequence dissimilarity using the R package TraMineR.

I will try to give a simple example (very simple, but I hope it is useful to explain my problem):

There are three sequences and I want to calculate the dissimilarity matrix. The alphabet is: H (in health), I (ill at home), IH (ill at hospital), D (died).

I observe the 3 subjects for 5 observations. These are the sequences:

H – H – I – D – D 
H – I – I – I – D 
I – I – H – IH – IH 

The substitution cost matrix is a 4x4 table (state x state). Must it be symmetric? This is my logical problem: while it is possible to "transit" from states H, I or IH to state D (died), the reverse is illogical.

Can I use a non-symmetric substitution cost matrix in TraMineR?

If, in my data, the substitution cost (calculated with sm = "TRATE", for instance) from state "I" to "D" is lower (0.5) than the substitution cost from state "I" to "IH" (0.6), the OM algorithm substitutes "I" with "D" instead of "IH".

Gilbert
Giampiero

2 Answers

The transition rates (estimated transition probabilities) should not be confused with the substitution costs. Substitution costs are supposed to reflect the dissimilarities between states.

The matrix of transition rates (returned by seqtrate) is NOT symmetric.

The substitution costs used to compute distances such as the optimal matching distance must be symmetric. Otherwise, the result would not be a distance matrix, and passing such a non-symmetric matrix to, for example, a clustering procedure would lead to unexpected results.

Deriving substitution costs from transition rates is just one of several ways to define substitution costs. Letting $p(i|j)$ be the probability of a transition from $j$ to $i$, it consists in defining the substitution cost as

$c(i,j) = 2 - p(i|j) - p(j|i)$
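To make the formula concrete, here is a minimal base-R sketch that estimates the transition rates from the three sequences in the question and derives the costs from them. It does not call TraMineR; the object names (`seqs`, `rates`, `cost`) are only illustrative, and setting the diagonal to 0 is my assumption about how the cost of substituting a state for itself is handled.

```r
# The three sequences from the question, one row per subject
seqs <- rbind(
  c("H", "H", "I", "D",  "D"),
  c("H", "I", "I", "I",  "D"),
  c("I", "I", "H", "IH", "IH")
)
states <- c("H", "I", "IH", "D")

# Pair each state at time t with the state at time t + 1
from <- factor(seqs[, -ncol(seqs)], levels = states)
to   <- factor(seqs[, -1], levels = states)

# Transition counts: rows = origin state, columns = destination state
counts <- matrix(as.vector(table(from, to)), nrow = length(states),
                 dimnames = list(states, states))

# Row-wise proportions: rates[j, i] estimates p(i|j), the rate from j to i
rates <- counts / rowSums(counts)

# c(i, j) = 2 - p(i|j) - p(j|i); the sum of both directions makes it symmetric
cost <- 2 - rates - t(rates)
diag(cost) <- 0  # assumption: no cost for substituting a state for itself

print(round(rates, 3))
print(round(cost, 3))
print(isSymmetric(cost))
```

In this toy example the rate from I to D is positive while the rate from D to I is zero, so `rates` is not symmetric, yet `cost` is, because both directions enter the formula together.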

Gilbert
  • Thank you Gilbert! You are very helpful. Since my work is illustrative, I will use one measure that uses these parameters (i.e. method = "TRATE") and another, length-based one (LCS), and then compare the results. – Giampiero Feb 19 '15 at 16:49

It seems to me that you're looking for a custom cost matrix. It is not mandatory to use either the TRATE or CONSTANT method.

To create a custom matrix you'll just have to do something like this:

myscm <- matrix(c(0,1,2, 
                  1,0,2, 
                  2,2,0), nrow=3, ncol=3) 
dist.om <- seqdist(my.seq, method="OM", sm=myscm)

where myscm is your custom matrix
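For the four-state alphabet in the question, the matrix would be 4x4. The sketch below is only an illustration: the cost values are hypothetical placeholders, not a recommendation, and `scm4` is a name I made up. Naming the rows and columns makes it easier to see which cell belongs to which pair of states.

```r
states <- c("H", "I", "IH", "D")

# Hypothetical costs, written row by row; replace them with your own rationale
scm4 <- matrix(c(0, 1, 2, 3,
                 1, 0, 1, 3,
                 2, 1, 0, 3,
                 3, 3, 3, 0),
               nrow = 4, byrow = TRUE,
               dimnames = list(states, states))

# Basic sanity checks before using the matrix as substitution costs
stopifnot(isSymmetric(scm4))     # same cost in both directions
stopifnot(all(diag(scm4) == 0))  # no cost for keeping the same state

print(scm4)
```

The resulting matrix can then be passed as the sm argument in the same way as myscm above.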

This was taken from http://lists.r-forge.r-project.org/pipermail/traminer-users/2011-July/000075.html

I believe you have two options:

1) Create a rationale for all the transitions and a full custom matrix

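2) Take the substitution cost matrix you've already generated from the transition rates (using seqsubm(your.seq, method = "TRATE")) and change just the inconsistent values, applying each change to both cells of a pair so the matrix stays symmetric. That's what I did in my last analysis.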

But keep in mind the point made by Gilbert in An "asymmetric" pairwise distance matrix
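Whichever option you choose, it may be worth checking the edited matrix before computing distances. Below is a small base-R sketch of such a check; `check_costs` is a hypothetical helper I wrote for illustration, not a TraMineR function. Besides symmetry it also tests the triangle inequality between costs, which is an extra sanity check on my part.

```r
# Hypothetical helper: checks symmetry and the triangle inequality
# sm[i, j] <= sm[i, k] + sm[k, j] for every triple of states
check_costs <- function(sm, tol = 1e-12) {
  symmetric <- isSymmetric(unname(sm))
  triangle <- TRUE
  for (k in seq_len(nrow(sm))) {
    # via_k[i, j] is the cost of going from i to j through state k
    via_k <- outer(sm[, k], sm[k, ], "+")
    if (any(sm > via_k + tol)) triangle <- FALSE
  }
  c(symmetric = symmetric, triangle = triangle)
}

states <- c("H", "I", "IH", "D")

# Illustrative matrix in which one pair (H, D) was raised by hand to 4
edited <- matrix(c(0, 1, 1, 4,
                   1, 0, 1, 1,
                   1, 1, 0, 1,
                   4, 1, 1, 0),
                 nrow = 4, byrow = TRUE,
                 dimnames = list(states, states))

print(check_costs(edited))

# Changing only one direction of a pair breaks symmetry
one_way <- edited
one_way["D", "IH"] <- 2
print(check_costs(one_way))
```

In the first matrix the H-D cost of 4 exceeds the H-I plus I-D cost of 2, so the check reports a triangle inequality violation even though the matrix is symmetric.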

Pedro Braz
  • Thanks Pedro Braz! A "rationale" approach is very difficult because I can't order or rank the states (i.e. by relevance or level of importance), as in Hollister's 2009 study. I tried to fix my transition matrix by setting to the highest value the cells for transitions that cannot really occur (like the transition from "Died" to "ill in a hospital"). But I'm studying Studer & Ritschard (2014), "A comparative review of sequence dissimilarity measures". My doubt is that this (arbitrary) procedure will not fulfill the triangle inequality. – Giampiero Feb 20 '15 at 16:04
  • You can define a custom non-symmetric matrix, indeed. However, what do you do with such a matrix? The `seqdist` function of `TraMineR` expects a symmetric substitution cost matrix for OM. If it is not, you will get unpredictable, hence unreliable, results. – Gilbert Feb 20 '15 at 16:47