Trying to put matrices into a set using python, but it still allows duplicates

Question

I have a simple piece of code which doesn't run as expected.

from numpy import *
from numpy.linalg import *
from sets import Set

W = matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')
E = matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')

matrices = Set([])
matrices.add(W)
matrices.add(E)
matrices

The matrices are identical, yet both appear separately when I print the contents of the set. If I instead assign one to the other as below, the duplicate does not appear.

W = matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')
E = W

Any idea what is happening? I need a way of avoiding duplicate matrices in a program I am writing, which generates a tonne of matrices.

EDIT: I want the following output

set([matrix([[ 1,  1,  1,  1],
        [ 1,  1, -1, -1],
        [ 1, -1,  2, -2],
        [ 1, -1, -2,  2]])])

but instead get the following:

set([matrix([[ 1,  1,  1,  1],
        [ 1,  1, -1, -1],
        [ 1, -1,  2, -2],
        [ 1, -1, -2,  2]]), matrix([[ 1,  1,  1,  1],
        [ 1,  1, -1, -1],
        [ 1, -1,  2, -2],
        [ 1, -1, -2,  2]])])

Possible duplicate of [How does a Python set([]) check if two objects are equal? What methods does an object need to define to customise this?](http://stackoverflow.com/questions/3942303/how-does-a-python-set-check-if-two-objects-are-equal-what-methods-does-an-o). — Andreas Florath, Jan 08 '13 at 21:17
Possible duplicate of "Constructing a python set from a numpy matrix" - http://stackoverflow.com/questions/1939228/constructing-a-python-set-from-a-numpy-matrix — forivall, Jan 08 '13 at 21:24

score 3 · Answer 1 · answered Jan 08 '13 at 21:18

This happens because sets use the __eq__ and __hash__ special methods to detect equal items (see http://docs.python.org/2/library/sets.html). Two separate matrix objects have different hashes, and their __eq__ method returns a matrix of booleans rather than a single True/False:

>>> W == E
matrix([[ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True],
        [ True,  True,  True,  True]], dtype=bool)
>>> W > E
matrix([[False, False, False, False],
        [False, False, False, False],
        [False, False, False, False],
        [False, False, False, False]], dtype=bool)
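
If a single yes/no answer is what you need, NumPy provides np.array_equal, which reduces the elementwise comparison to one boolean. A minimal sketch, using plain arrays with the same values as W and E (the names `elementwise` and `same` are only for illustration):

```python
import numpy as np

W = np.array([[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 2, -2], [1, -1, -2, 2]])
E = np.array([[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 2, -2], [1, -1, -2, 2]])

# == compares element by element and returns an array of booleans
elementwise = (W == E)

# np.array_equal collapses that comparison to a single True/False
same = np.array_equal(W, E)
print(same)
```

This does not make the matrices usable in a set by itself, but it is the comparison a custom __eq__ would want to call.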

Wilduck · Accepted Answer · 2013-01-08T21:34:53.413

You're running into how Python checks internally whether two objects are the same. Specifically, how objects considered "hashable" are compared.

The way that a Python set decides if two objects are the same is based on calling a magic method called __hash__ (and another called __eq__). Two objects are considered the same if calling __hash__ on them returns the same value (and calling __eq__ on them returns True). If calling __hash__ on the two objects gives different values, the set assumes they cannot be the same.

It is also worth noting that sets can only contain objects that are "hashable", that is, objects which implement the __hash__ method.

Let's see how this works:

In [73]: a = "one"
In [74]: b = "one"
In [75]: c = "two"

In [76]: a.__hash__()
Out[76]: -261223665

In [77]: b.__hash__()
Out[77]: -261223665

In [78]: c.__hash__()
Out[78]: 323309869

In [79]: set([a,b,c])
Out[79]: set(['two', 'one'])
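
The same two-step check applies to your own classes. Here is a small sketch with a hypothetical `Pair` class (not from the question) whose __hash__ and __eq__ are both based on its contents, so a set treats equal-valued instances as one item:

```python
class Pair(object):
    """Hypothetical value type used only to illustrate set membership."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __hash__(self):
        # Equal contents must produce equal hashes
        return hash((self.x, self.y))

    def __eq__(self, other):
        # Consulted once the hashes match, to confirm real equality
        return isinstance(other, Pair) and (self.x, self.y) == (other.x, other.y)

    def __ne__(self, other):
        # Only needed on Python 2; Python 3 derives it from __eq__
        return not self == other


items = set([Pair(1, 2), Pair(1, 2), Pair(3, 4)])
print(len(items))
```

The two `Pair(1, 2)` instances are distinct objects, but because both methods look at the contents, only one of them ends up in the set.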

Now, let's import numpy, and see what the hash values are for your matrices.

In [81]: import numpy as np
In [82]: W = np.matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')
In [83]: E = np.matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')

In [84]: W.__hash__()
Out[84]: 4879307

In [85]: E.__hash__()
Out[85]: 4879135

Notice that the hashes are different for E and W even though they contain the same values, so the hash evidently does not depend on the contents. Since their hashes are different, they show up as different objects in the set. When you do an assignment like E = W, the names W and E refer to the same object, so there is only one thing to add.
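
To see the difference between two names for one object and two objects with equal contents, here is a short sketch using plain arrays (the name `F` is made up for this example):

```python
import numpy as np

W = np.array([[1, 1], [1, -1]])
E = W          # a second name for the same object
F = W.copy()   # a new object with equal contents

print(E is W)                 # identity: one object, two names
print(F is W)                 # a copy is a different object
print(np.array_equal(W, F))   # yet the contents match
```

A set that relies on identity-style hashing can only collapse the first case, which matches what you observed.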

If you need a workaround for this, you could store the strings you're using to build the matrices:

In [86]: set(['1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2',
              '1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2'])
Out[86]: set(['1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2'])
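
If you don't have the original strings to hand, another option along the same lines is to derive a hashable key from the matrix contents. A rough sketch, assuming small 2-D integer matrices; `matrix_key` is a helper name invented here, not a NumPy function:

```python
import numpy as np

def matrix_key(m):
    """Turn a 2-D array or matrix into a tuple of row tuples (hashable)."""
    return tuple(tuple(row) for row in np.asarray(m).tolist())

W = np.array([[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 2, -2], [1, -1, -2, 2]])
E = np.array([[1, 1, 1, 1], [1, 1, -1, -1], [1, -1, 2, -2], [1, -1, -2, 2]])

seen = set()
seen.add(matrix_key(W))
seen.add(matrix_key(E))
print(len(seen))
```

Tuples hash and compare by content, so the two keys collapse into one entry. The original matrix can be rebuilt from a key with np.array(key) if needed.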

Not quite 100% accurate. Sets use `__eq__`, but assume `__hash__` is usable as a fast approximation; if the hashes are different, the objects cannot be equal, but if the hashes match, then the objects only *might* be equal. Full equality must then be checked. — Ben, Jan 08 '13 at 21:30
Thanks @Ben, I left it out originally because it isn't directly related to the problem, but I've added a couple parentheticals to my answer for completeness's sake. — Wilduck, Jan 08 '13 at 21:36
Thanks, still new to python. I ended up storing the string representations of the matrices into sets and it works. — user1429039, Jan 10 '13 at 03:10

score 2 · Answer 3 · answered Jan 08 '13 at 21:48

matrix doesn't have very well-behaved __eq__ and __hash__ methods when it comes to using them in a set. If you want to use a set to make them unique, you need to wrap the matrix in a helper class. Something simple like this should do:

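import hashlib
import numpy as np

class MatrixWrap(object):
    def __init__(self, matrix):
        self.matrix = matrix
    def __hash__(self):
        # Hash the raw bytes, so equal contents give equal hashes
        data = np.asarray(self.matrix).tobytes()
        return int(hashlib.sha1(data).hexdigest(), 16)
    def __eq__(self, x):
        # Compare the actual contents instead of trusting the hash alone
        return np.array_equal(self.matrix, x.matrix)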

Then you can just do:

from numpy import *
from numpy.linalg import *

W = matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')
E = matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')
X = matrix('2, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')

matrices = set()
matrices.add(MatrixWrap(W))
matrices.add(MatrixWrap(E))
matrices.add(MatrixWrap(X))

for a in matrices:
    print(a.matrix)

...to get your unique matrices listed.
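
One thing to keep in mind with byte-based hashing: the raw buffer carries no shape or dtype information, so differently shaped matrices can share the same bytes. A small sketch of the effect, with a hypothetical `content_key` helper (not part of the answer above) that folds the shape and dtype into the key:

```python
import hashlib
import numpy as np

def content_key(m):
    """Hypothetical helper: digest of the bytes plus shape and dtype."""
    arr = np.ascontiguousarray(m)
    digest = hashlib.sha1(arr.tobytes()).hexdigest()
    return (arr.shape, arr.dtype.str, digest)

a = np.arange(4).reshape(2, 2)
b = np.arange(4).reshape(1, 4)

# Same underlying bytes, different shapes
print(a.tobytes() == b.tobytes())
# Including the shape keeps the two apart
print(content_key(a) == content_key(b))
```

If all your matrices have the same shape and dtype this never comes up, but it is cheap to guard against.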

dhj · Answer 4 · 2013-01-09T15:20:06.160

All of the answers and comments have been good and have identified the problem, and @Joachim Isaksson has given a good solution. I wanted to point out that you can also serialize a regular array and dump/load the data into the set like this:

import numpy as np

def arrayToTuple(arr):
    # dtype string, shape and raw bytes together identify the array
    arrType = arr.dtype.str
    arrShape = arr.shape
    arrData = arr.tobytes()

    return (arrType, arrShape, arrData)

def tupleToArray(tupl):
    arrType, arrShape, arrData = tupl

    return np.matrix( np.frombuffer(arrData, dtype=arrType).reshape(arrShape) )
        # remove the matrix( ) wrap to return arrays instead of matrices
        # (those arrays are read-only; call .copy() if you need to modify them)

Then your code would look like this:

W = np.matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')
E = np.matrix('1, 1, 1, 1; 1, 1, -1, -1; 1, -1, 2, -2; 1, -1, -2, 2')

matrixTuples = set()

matrixTuples.add(arrayToTuple(W))
matrixTuples.add(arrayToTuple(E))

for mTupl in matrixTuples:
    print(tupleToArray(mTupl))

This will also work with regular bool, integer and float arrays (but not object or string arrays) -- just remove the matrix( ) wrapper on the tupleToArray return. The functions might be better named matrixToTuple and tupleToMatrix, but the code is nearly identical whether you are working with matrices or arrays.
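
Building on the same idea, a long stream of generated matrices can be filtered as it goes. This is only a sketch: `unique_arrays` is a name made up here, and it assumes a NumPy release that provides `tobytes`.

```python
import numpy as np

def unique_arrays(arrays):
    """Return the arrays with duplicates dropped, keeping first-seen order."""
    seen = set()
    result = []
    for arr in arrays:
        arr = np.asarray(arr)
        # dtype, shape and raw bytes form a hashable, content-based key
        key = (arr.dtype.str, arr.shape, arr.tobytes())
        if key not in seen:
            seen.add(key)
            result.append(arr)
    return result

W = np.array([[1, 1], [1, -1]])
E = np.array([[1, 1], [1, -1]])
X = np.array([[2, 1], [1, -1]])

unique = unique_arrays([W, E, X])
print(len(unique))
```

Keeping the arrays in a list next to the set of keys avoids having to rebuild them from bytes afterwards, at the cost of holding both in memory.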
