Suppose there are three sequences to be compared: a, b, and c. Traditionally, the resulting 3-by-3 pairwise distance matrix is symmetric, indicating that the distance from a to b is equal to the distance from b to a.

I am wondering if TraMineR provides some way to produce an asymmetric pairwise distance matrix.

apaderno
POTENZA
  • I've never used TraMineR, but a word of caution on a side issue - if your measure is asymmetric, then it no longer fits the definition of a distance. That may be an entirely academic point. But I suspect you're going to use this matrix in some algorithm later, and if that algorithm assumes you've supplied it a distance metric when in fact you have not, badness may result in a way that's hard to diagnose. –  Feb 07 '13 at 09:09
  • Questions that relate to how to do something in a particular software usually belong on StackOverflow, so I marked this question for migration. However, TraMineR also has its own list, which might be an even better site to ask this on. – Peter Flom Feb 07 '13 at 11:25

1 Answer

No, TraMineR does not produce asymmetric dissimilarities, precisely for the reasons stressed in Pat's comment.

The main interest of computing pairwise dissimilarities between sequences is that, once we have them, we can for instance:

  • measure the discrepancy among sequences, determine neighborhoods, find medoids, ...
  • run cluster algorithms, self-organizing maps, MDS, ...
  • perform ANOVA-like analyses of the sequences
  • grow regression trees for the sequences

Feeding a non-symmetric dissimilarity matrix into those procedures would most probably produce meaningless results.
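As a minimal sketch of the kind of sanity check one could run before handing a matrix to such a procedure, here is a plain Python helper. The name `is_symmetric` and the example matrices are hypothetical and are not part of TraMineR.

```python
def is_symmetric(matrix, tol=1e-9):
    """Return True if a square matrix (list of lists) equals its transpose.

    Hypothetical helper for illustration only; not a TraMineR function.
    """
    n = len(matrix)
    # A non-square matrix cannot be a pairwise dissimilarity matrix.
    if any(len(row) != n for row in matrix):
        return False
    # Compare each entry above the diagonal with its mirror below it.
    return all(
        abs(matrix[i][j] - matrix[j][i]) <= tol
        for i in range(n)
        for j in range(i + 1, n)
    )


# Made-up 3-by-3 examples for sequences a, b and c.
sym = [[0, 2, 4],
       [2, 0, 3],
       [4, 3, 0]]
asym = [[0, 2, 4],
        [1, 0, 3],   # d(b, a) differs from d(a, b)
        [4, 3, 0]]

print(is_symmetric(sym), is_symmetric(asym))
```

A matrix that fails this check does not satisfy the definition of a distance, so methods that assume one should not be expected to behave sensibly on it.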

It is because of this symmetry requirement that the substitution costs used for computing Optimal Matching distances MUST be symmetric. It is important not to interpret substitution costs as the cost of switching from one state to another, but to understand them for what they are, i.e., edit costs. When comparing two sequences, for example aabcc and aadcc, we can make them equal either by replacing b with d in the first one or by replacing d with b in the second one. Since the choice between the two is arbitrary, it would make no sense to give the two substitutions different costs.
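To illustrate the point, here is a rough Python sketch of an optimal-matching style edit distance. It is not TraMineR's implementation; the function names, the constant substitution cost of 2 and the indel cost of 1 are assumptions chosen for the example.

```python
def om_distance(x, y, sub_cost, indel=1.0):
    """Edit distance between sequences x and y by dynamic programming.

    sub_cost maps (state_in_x, state_in_y) to a substitution cost.
    Illustrative sketch only; not TraMineR code.
    """
    n, m = len(x), len(y)
    # d[i][j] = minimal cost of editing x[:i] into y[:j]
    d = [[0.0] * (m + 1) for _ in range(n + 1)]
    for i in range(1, n + 1):
        d[i][0] = i * indel
    for j in range(1, m + 1):
        d[0][j] = j * indel
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            a, b = x[i - 1], y[j - 1]
            sub = 0.0 if a == b else sub_cost[(a, b)]
            d[i][j] = min(
                d[i - 1][j] + indel,    # delete a from x
                d[i][j - 1] + indel,    # insert b into x
                d[i - 1][j - 1] + sub,  # substitute a with b (or match)
            )
    return d[n][m]


def constant_costs(states, cost=2.0):
    """Symmetric substitution costs: the same cost for every pair of distinct states."""
    return {(s, t): cost for s in states for t in states if s != t}


costs = constant_costs("abcd")
d_xy = om_distance("aabcc", "aadcc", costs)
d_yx = om_distance("aadcc", "aabcc", costs)

# Assumed asymmetric variant: make b -> d cheaper than d -> b.
asym_costs = dict(costs)
asym_costs[("b", "d")] = 0.5
a_xy = om_distance("aabcc", "aadcc", asym_costs)
a_yx = om_distance("aadcc", "aabcc", asym_costs)

print(d_xy, d_yx)  # equal with symmetric costs
print(a_xy, a_yx)  # differ once the costs are asymmetric
```

With symmetric costs the result does not depend on which sequence is edited. With the asymmetric variant it does, and the output is no longer a distance.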

Hope this helps.

Gilbert
  • Thank you very much for your great explanation! If my understanding is correct, however, I found a paper in which an asymmetric pairwise distance matrix is built and then used for cluster analysis. They somehow assign different weights to insertion and deletion, and use the Taylor-Butina clustering algorithm with the asymmetric distance matrix. The paper is "Incorporating sequential information into traditional classification models by using an element/position-sensitive SAM" by Anita Prinzie and Dirk Van den Poel. – POTENZA Feb 07 '13 at 21:41