Count number of distinct columns in a matrix

Question

I have some simple code and I just want to count the number of distinct columns in the product of two matrices. The code is

import numpy as np
import itertools

n = 5
h = 2
M =  np.random.randint(2, size=(h,n))
F = np.matrix(list(itertools.product([0,1],repeat = 5))).transpose()
product = M*F
setofcols = set()
for column in product.T:
    setofcols.add(column)
print len(setofcols)

However, this gives the wrong answer: all the elements of setofcols are treated as different, even when the columns are the same. What is the right thing to do?

I will be running this with bigger values of n and h, so a solution that uses as little memory as possible would be great.

Related: http://stackoverflow.com/questions/12983067/how-to-find-unique-vectors-of-a-2d-array-over-a-particular-axis-in-a-vectorized — Warren Weckesser, Dec 11 '13 at 21:14
@alko I would have to change my code to use arrays for this. I tried F = np.array(list(itertools.product([0,1],repeat = 5))).transpose() but then I get TypeError: unhashable type: 'numpy.ndarray' — marshall, Dec 11 '13 at 21:19
@marshall why? You can use the solution in the thread I provided for matrices as well. Or, in case you doubt it, use np.array(M*F) — alko, Dec 11 '13 at 21:23
@dawg without specific seed? I got 6 with `len(set(map(repr, (M*F).T)))` — alko, Dec 11 '13 at 21:25
@alko Oh I see, thanks. dawg's solution works for me as well. — marshall, Dec 11 '13 at 21:26

dawg · Accepted Answer · 2013-12-11T21:56:48.073

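You can make yours work by using repr. Each column you iterate over is a separate matrix object, and the set does not compare those objects by content, so equal columns end up as distinct elements. The repr string of a column does compare by content: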

import numpy as np
import itertools

n = 5
h = 2
M = np.random.randint(2, size=(h, n))
# Columns of F are all 2**n binary vectors of length n.
F = np.matrix(list(itertools.product([0, 1], repeat=n))).transpose()
product = M * F
setofcols = set()
for column in product.T:
    # repr returns a string, which hashes and compares by content.
    setofcols.add(repr(column))
print(len(setofcols))
print(setofcols)
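
As a quick check of why the repr keys work, here is a minimal sketch on a small fixed matrix. The matrix P and its values are made up for illustration (they are not from the question); columns 0 and 2 are deliberately identical.

```python
import numpy as np

# Hypothetical example data: columns 0 and 2 of P are identical.
P = np.matrix([[1, 0, 1],
               [2, 3, 2]])

# Iterating over P.T yields one 1x2 matrix per column of P.
keys = [repr(col) for col in P.T]

# Equal columns produce equal strings, so the set collapses them.
distinct = set(keys)
print(len(distinct))
```

One caveat, as far as I know: NumPy abbreviates the repr of large arrays with "..." once they exceed the print threshold (1000 elements by default), so for very long columns two different columns could share a repr. For columns that long, tuple keys are the safer choice.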

You can also do this:

setofcols = {tuple(e.A1) for e in product.T}

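Here the A1 attribute of a matrix returns its contents as a flattened 1-D array, which tuple can consume as a sequence. Tuples are hashable and compare by value, so equal columns collapse to a single set element.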
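
Since the question asks for low memory use with larger n and h, here is a rough sketch of the same set-based idea that never builds the full matrix F. It is an assumption-laden variant, not the code above: it targets Python 3, uses plain arrays with the @ operator instead of np.matrix, and the function name count_distinct_columns and the chunk_size parameter are hypothetical names introduced for illustration. Each column is stored as raw bytes, which works as a key here because every column has the same dtype and length.

```python
import itertools

import numpy as np


def count_distinct_columns(M, chunk_size=4096):
    """Count distinct columns of M @ F, where the columns of F are all
    0/1 vectors of length n = M.shape[1].

    Hypothetical helper: F is generated lazily in chunks, so only one
    chunk and the set of distinct columns are held in memory.
    """
    n = M.shape[1]
    seen = set()
    vectors = itertools.product([0, 1], repeat=n)  # lazy iterator, 2**n items
    while True:
        chunk = list(itertools.islice(vectors, chunk_size))
        if not chunk:
            break
        # Each row of `rows` is one column of M @ F for this chunk.
        rows = np.array(chunk, dtype=M.dtype) @ M.T
        # Bytes are hashable and compare by content.
        seen.update(row.tobytes() for row in rows)
    return len(seen)


# Usage with the sizes from the question (the result depends on the random M).
rng = np.random.default_rng()
M = rng.integers(2, size=(2, 5))
print(count_distinct_columns(M))
```

If the whole product fits in memory, np.unique(np.asarray(product), axis=1).shape[1] should give the same count in one call (the axis argument needs NumPy 1.13 or later), at the cost of materializing every column first.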
