
I wish to initialize a symmetric matrix in Python and populate it with zeros.

At the moment, I have initialized an array of known dimensions, but this is unsuitable for subsequent input into R as a distance matrix.

Are there any 'simple' methods in numpy to create a symmetric matrix?

Edit

I should clarify: creating the 'symmetric' matrix is fine. However, I am interested in generating only the lower triangular form, i.e.,

ar = numpy.zeros((3, 3))

array([[ 0.,  0.,  0.],
       [ 0.,  0.,  0.],
       [ 0.,  0.,  0.]])

I want:

array([[ 0],
       [ 0, 0 ],
       [ 0.,  0.,  0.]])

Is this possible?

Seanny123
Darren J. Fitzpatrick
  • Could you elaborate on what you mean by `unsuitable for subsequent input into R as a distance matrix`? You haven't tagged `R`, so is it relevant at all? Thanks – eat Jan 31 '11 at 21:40
  • Sorry, no, that is not really relevant. Distances between items (Euclidean, Manhattan, cosine, etc.) are computed in a pairwise fashion, hence a symmetric output. I am calculating custom matrices as R does not handle my data very well. That is another story though. Apologies for the confusion! – Darren J. Fitzpatrick Jan 31 '11 at 21:45
  • @Darren: Still unsure what you are looking for. Are you perhaps suggesting that you calculate only, for example, the upper triangular part, and somehow the lower triangular part would magically reflect that? Even if that is doable, I would expect performance penalties. Would you care to show the code you have? – eat Jan 31 '11 at 21:56
  • I think EOL's answer to a similar Q is right on target (http://stackoverflow.com/questions/2572916/numpy-smart-symmetric-matrix) – doug Jan 31 '11 at 21:58
  • If you want a fast distance matrix calculation, take a look at `scipy.spatial.pdist` http://docs.scipy.org/doc/scipy/reference/generated/scipy.spatial.distance.pdist.html#scipy.spatial.distance.pdist – Joe Kington Jan 31 '11 at 22:30
  • @Joe: +1, indeed these are really fast implementations. – eat Jan 31 '11 at 23:01

1 Answer


I don't think it's feasible to work with that kind of triangular array: a regular NumPy array is rectangular, so its rows cannot have different lengths.
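If the goal is only to avoid storing the upper triangle, one workaround is to keep the lower triangle in a flat 1-D array. The following is a minimal sketch, assuming row-major packing with the diagonal included; the helper names `packed_zeros` and `packed_index` are hypothetical and are not part of NumPy:

```python
import numpy as np

def packed_zeros(n):
    """Flat zero array holding the n*(n+1)//2 lower-triangular entries."""
    return np.zeros(n * (n + 1) // 2)

def packed_index(i, j):
    """Position of element (i, j) in the packed array; symmetric in i and j."""
    if i < j:
        i, j = j, i
    return i * (i + 1) // 2 + j

n = 3
tri = packed_zeros(n)
tri[packed_index(2, 1)] = 5.0  # set entry (2, 1), which is also entry (1, 2)

# Expand to a full symmetric matrix when one is needed.
full = np.zeros((n, n))
rows, cols = np.tril_indices(n)  # lower-triangle indices in row-major order
full[rows, cols] = tri
full = full + full.T - np.diag(np.diag(full))  # mirror without doubling the diagonal
```

This saves roughly half the memory, at the cost of computing the index by hand on every access.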

Here is, for example, a straightforward implementation of (squared) pairwise Euclidean distances that returns the full symmetric matrix:

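import numpy as np

def pdista(X):
    """Squared pairwise distances between all columns of X."""
    B = np.dot(X.T, X)       # Gram matrix: B[i, j] = x_i . x_j
    q = np.diag(B)[:, None]  # squared norms as a column vector
    return q + q.T - 2 * B   # |x_i|^2 + |x_j|^2 - 2 x_i . x_j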

Performance-wise, it is hard to beat this at the Python level. What would be the main advantage of not using this approach?
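As a small usage sketch (the sample points are made up for illustration), the function can be applied to three 2-D points stored as columns. If only the lower triangle is wanted afterwards, `np.tril` zeroes the upper part of the full result:

```python
import numpy as np

def pdista(X):
    """Squared pairwise Euclidean distances between all columns of X."""
    B = np.dot(X.T, X)
    q = np.diag(B)[:, None]
    return q + q.T - 2 * B

# Three 2-D points stored as columns: (0, 0), (3, 4), (6, 8).
X = np.array([[0.0, 3.0, 6.0],
              [0.0, 4.0, 8.0]])

D = pdista(X)       # full symmetric matrix of squared distances
lower = np.tril(D)  # same values with the upper triangle set to zero
```

Note that `lower` is still a full 3 x 3 array; the upper entries are stored as zeros rather than omitted.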

eat