What is the difference between condensed and redundant distance matrices?

Question

New to python and programming in general:

The documentation to squareform states the following:

Converts a vector-form distance vector to a square-form distance matrix, and vice-versa.

So it converts a 1D array into a square matrix?

Where the parameter X is:

Either a condensed or redundant distance matrix.

and returns:

If a condensed distance matrix is passed, a redundant one is returned, or if a redundant one is passed, a condensed distance matrix is returned.

What is the difference between condensed and redundant matrices?
What is the relationship between a condensed/redundant matrix and the vector/square form it takes?

The return value of pdist appears to be a condensed distance matrix:

Returns a condensed distance matrix Y. For each i and j (where i is less than j is less than n), the metric dist(u=X[i], v=X[j]) is computed and stored in entry ij.

Am I right in thinking that each element of Y stores the distance between one particular point and another point? Would an example with 3 observations mean a condensed matrix with 9 elements?

Will, does http://stackoverflow.com/questions/13079563/how-does-condensed-distance-matrix-work-pdist look like a duplicate of your question? — Warren Weckesser, Apr 21 '16 at 20:37
@WarrenWeckesser related but different: stackoverflow.com/questions/13079563/ takes the terms I am asking about for granted, and so begs the question? Unless I am missing something. — , Apr 21 '16 at 20:56
When we say: "If y is a 1d condensed distance matrix, then y must be a (n 2) sized vector where n is the number of original observations paired in the distance matrix." what does (n 2) mean? — akshit bhatia, Jan 06 '19 at 08:12

score 2 · Accepted Answer · answered Apr 21 '16 at 20:59

If you have an n x n matrix, then each pairwise combination of the n points appears twice, once in each order: ab and ba. So if you create a distance matrix from a set of n points, you can condense the data by storing each pair only once and dropping the comparisons between each point and itself (the zero diagonal).

For example, if we have the points a, b, and c, we would have the distance matrix

    a    b    c
a   0    ab   ac
b   ba   0    bc
c   ca   cb   0

and the condensed distance matrix,

    a    b    c
         ab   ac
              bc

Because distance measures are unsigned, the condensed table retains all the information.
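To make the two forms concrete, here is a rough sketch in plain Python. It is not SciPy's implementation: the coordinates and the helper names `condensed_distances` and `to_redundant` are made up for illustration, and it assumes Euclidean distance. It stores each pair i < j once, in the row order of the upper triangle, so 3 points give 3 entries (n(n-1)/2 in general), not 9.

```python
from itertools import combinations
from math import dist  # Euclidean distance, Python 3.8+


def condensed_distances(points):
    """One distance per pair i < j, in row order of the upper triangle."""
    return [dist(points[i], points[j])
            for i, j in combinations(range(len(points)), 2)]


def to_redundant(condensed, n):
    """Expand a condensed vector into the full n x n symmetric matrix."""
    square = [[0.0] * n for _ in range(n)]  # zero diagonal
    for k, (i, j) in enumerate(combinations(range(n), 2)):
        square[i][j] = square[j][i] = condensed[k]  # ab == ba
    return square


# Hypothetical coordinates for the points a, b, c.
points = [(0, 0), (3, 4), (6, 8)]

condensed = condensed_distances(points)           # [ab, ac, bc]
redundant = to_redundant(condensed, len(points))  # 3 x 3 square form

print(condensed)
for row in redundant:
    print(row)
```

As far as I know, `pdist` followed by `squareform` from `scipy.spatial.distance` uses the same pair ordering, so it should give the same vector and matrix for these points.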

kpie

@kpie "distance measures are unsigned" What does this mean? – Apr 21 '16 at 21:22
The vector from a to b is antiparallel and equal in magnitude to the vector from b to a. This means that they have opposite signs and the same measure, or magnitude, in the 1-dimensional vector space defined by their direction. – kpie Apr 22 '16 at 09:17
@Will the distance between 2 things is a measure that neglects the order of the pair. – kpie Apr 22 '16 at 09:19
Under what conditions is a distance matrix square? What meaning does a condensed distance matrix convey if the raw distance matrix is m x n (e.g., the output of `euclidean.distances(df1,df2)` where df1 and df2 have the same number of columns but different numbers of rows/observations)? – 3pitt Jan 11 '18 at 15:53
