How can I remove duplicates from a list of matrices based on some equivalence relation?

Question

Given some list of symmetric integer matrices, I want to remove all duplicates under the following equivalence relation:

Two k x k matrices M1, M2 are equivalent if there is some permutation s on {1,...,k} such that for all i and j in {1,...,k} we have M1_ij = M2_s(i)s(j), i.e. two matrices are "equal" if I can get one from the other by simultaneously permuting its rows and columns.

Unfortunately, my naive approach (while building the list, check if any of the permutations of the new matrix is already in the list) proves to be too slow.

Some probably faster alternative I could think of would be putting all matrices in the list, permute them to some "canonical permutation" and then remove the duplicates as described e.g. here. However, I am not sure how to achieve such a "canonical permutation" in code either.

To narrow this down some more: The matrices are relatively small (k <= 4), the list will contain some 5 or 6 digit number of matrices and the dtype of the matrices has to be some integer type (currently intc, but I could change that).

The order of the final list does not matter, neither does which representative of each equivalence class survives. The entire process may take some small number of hours if needed, but not days.

Is there some reasonably efficient way to achieve this? Did I (yet again) miss some cool NumPy or SciPy facility that would help me with this?

As requested, some small examples to demonstrate how that equivalence relation works:

The matrices {{1,1,1},{1,2,0},{1,0,3}} and {{1,1,1},{1,3,0},{1,0,2}} are equivalent, as the permutation {1,2,3}->{1,3,2} transforms one to the other.

The matrices {{1,1,1},{1,2,0},{1,0,3}} and {{1,1,0},{1,2,1},{0,1,3}} are not equivalent, you cannot change the position of the 1s without permuting the diagonal.

Just to clarify: The permutation of rows and columns is supposed to be the same? — jotasi, Aug 17 '17 at 13:23
@jotasi Yes, we use the same permutation on both the row and the column index for each entry; e.g. every diagonal entry will again become a diagonal entry. — Baum mit Augen, Aug 17 '17 at 13:26
Maybe for those of us who don't have that much mathematical background: Could you show a small example of two equivalent matrices and two not-equivalent matrices (given your equivalence relation). — MSeifert, Aug 17 '17 at 13:32
Just a quick remark: two matrix of the same class must have the same eigenvalues. That can make for a first test. It might be reasonably efficient given the small size of your matrices. — PAb, Aug 17 '17 at 14:08
@PAb Maybe partitioning the list based on eigenvalues would even get me away from that O(n^2) runtime? I'll check that out, thanks. — Baum mit Augen, Aug 17 '17 at 14:16
@PAb Eigenvalues are probably floats, and numerical errors may depend on the order of rows/columns of the matrix. — , Aug 17 '17 at 14:20
Speaking about quick to test necessary conditions, you could start with comparing `np.sort(np.diag(M))`. — jotasi, Aug 17 '17 at 14:21
Why not sort and compare all entries while you are at it `np.all(np.sort(M.ravel()), np.sort(N.ravel()))`?! — Paul Brodersen, Aug 17 '17 at 14:23
@Paul This looks like it does not respect the fact that we can only permute the rows and columns simultaneously. — Baum mit Augen, Aug 17 '17 at 14:25
I was just suggesting another sieve. As with the eigenvalue test, there will be a lot of false positives, of course, but it will cut down the number of pairs that you need to test for real. — Paul Brodersen, Aug 17 '17 at 14:30
The diag test might actually be quite valuable already, in a quick first test, the biggest bucket only contained 10% of the total matrices. Let's see how well the other criteria do. — Baum mit Augen, Aug 17 '17 at 14:42
Also, I am pretty sure that the brute force approach to graph isomorphism is `O(n!)`. How come you get away with `O(n^2)`? It might be a good idea to post your code .... ;-) — Paul Brodersen, Aug 17 '17 at 15:08
@Paul It's O(k!), and k is at most 4, so that's not too ridiculous. :) — Baum mit Augen, Aug 17 '17 at 15:14

score 4 · Answer 1 · answered Aug 17 '17 at 14:23

This is an algebraic answer. I suspect there should be a more satisfying combinatoric answer.

You say that two matrix M and M' are equivalent if there exists a permutation matrix P such that M' = P^{-1} M P.

Let's use the eigen decomposition of M and M' :

M = Q^{-1} D Q

M = Q'^{-1} D' Q'

Where D and D' are diagonal matrices containing the eigenvalues, and Q and Q' are orthogonal matrices.

We can rewrite the equality as:

D = D' up to permutation (i.e. the two matrices should have the same eigenvalues)

Q' = PQ

Testing for the second condition is easy. Given that Q is orthogonal, it amounts to checking whether the matrix dot(Q, Q'.T) is a permutation matrix, i.e. if it has only one "1" per row and column.

A draft for an algorithm is thus:

Take M and M'
Compute the eigen decomposition (Q, D) and (Q', D') of M and M' (using np.linalg.eigh)
If they do not have the same eigenvalues (up to numerical precision of course), they are not equivalent
Else, compute np.dot(Q, Q'.T) and test if it is a permutation matrix

I think that the bottleneck is the eigen decomposition, but you only have to do it once per matrix. Hopefully the first test will discard a lot of matrices quickly.

Hope this helps.

Thanks, I'll try this and the graph one, possible combined with some "bucket solution" to get away from the O(n^2). — Baum mit Augen, Aug 17 '17 at 14:31
Even if you have an approach that solves the equivalence classes within less than O(n^2), at some point you have to compare two matrices, that will always be O(n^2) (unless you say you don't need 100% certanity and will only compare few elements). — Jürg W. Spaak, Aug 18 '17 at 06:45

score 2 · Answer 2 · answered Aug 17 '17 at 13:52

You can think of your matrices as representing the adjacency/weight matrix of a graph, and then test if the two graphs are isomorphic to each other. networkx has a convenient function (and can be installed via pip).

import numpy as np
import networkx as nx
from networkx.algorithms.isomorphism import numerical_edge_match

# create matrices
n = 4
a = np.random.randint(0, 10, size=(n,n))
a = a + a.T # i.e. symmetric
b = np.rot90(a, k=2) # i.e. a rotated by 180 degrees
c = np.ones((n,n), dtype=np.int) # counter-example

# create graphs
ga = nx.from_numpy_matrix(a)
gb = nx.from_numpy_matrix(b)
gc = nx.from_numpy_matrix(c)

# test if isomorphic
print "a isomorphic with b:", nx.is_isomorphic(ga, gb, edge_match=numerical_edge_match('weight', 1)) # True
print "a isomorphic with c:", nx.is_isomorphic(ga, gc, edge_match=numerical_edge_match('weight', 1)) # False

Looks interesting, thanks. Let me try and find out if I can use this to get away from my current O(n^2) approach; if not, I'll have to measure if this is fast enough. — Baum mit Augen, Aug 17 '17 at 14:03
Also have a look at `graph-tool` (https://graph-tool.skewed.de/) and `igraph`, which may be faster by virtue of of the fact that their core functions are written in C and C++. — Paul Brodersen, Aug 17 '17 at 14:42

Jürg W. Spaak · Answer 3 · 2017-08-18T06:31:36.477

-1

Just use your canonical approach. Search for the largest entry in the matrix, put it on the top right corner. Sort then the first colum and first row according it's entries.

A = np.array([[1,2,3,5],
     [3,6,2,6],
     [3,5,7,2],
     [1,3,6,3]])
a = np.where(A == np.amax(A))
sort_colums = np.argsort(A[a[0]].ravel())[::-1]
sort_rows = np.argsort(A[:,a[1]].ravel())[::-1]
Col_sorted = A[:,sort_colums]
Equiv_class = Col_sorted[sort_rows]
#returns [[7, 5, 3, 2],
          [6, 3, 1, 3],
          [3, 2, 1, 5],
          [2, 6, 3, 6]]

As pointed out in the comments, this only works, when the entries of the matrix occur only once. In case the do occur several times, but not that often, then one could adapt this method by generating several equivalence classes.

edited Aug 18 '17 at 06:31

answered Aug 17 '17 at 13:38

Jürg W. Spaak

2,057
1
15
34

There need not be a single largest element in the matrix, I believe that does matter, doesn't it? (Not familiar with those functions yet, though, I'll check it out.) – Baum mit Augen Aug 17 '17 at 13:42
1

This does indeed not appear to work for matrices like `[[1,1,1],[1,1,0],[1,0,1]]`. "TypeError: only length-1 arrays can be converted to Python scalars" – Baum mit Augen Aug 17 '17 at 13:56
1

@BaummitAugen Indeed, it only works for matrices that have one specific maximal value. However for these matrices the algorithm is O(n log(n)) to convert the matrix into the equivalence class. I updated the code (I made a small mistake) – Jürg W. Spaak Aug 18 '17 at 06:18

How can I remove duplicates from a list of matrices based on some equivalence relation?

3 Answers3