
I want to count the number of equal matrices that I encounter after splitting a large matrix.

mat1 = np.zeros((4, 8))

split4x4 = np.split(mat1, 4)

Now I want to know how many equal matrices are in split4x4, but collections.Counter(split4x4) throws an error. Is there a built-in way in numpy to do this?

andandandand
  • I am an amateur, so this may sound silly, but np.split() by default splits the array into the number of equal pieces you specify (e.g. 4 in the example above), and if it can't, it throws an error. So why do you need to find out that information? Wouldn't it just be 4? – Siraj S. Aug 23 '16 at 18:54

2 Answers


This can be done in a fully vectorized manner using the numpy_indexed package (disclaimer: I am its author):

import numpy_indexed as npi
unique_rows, row_counts = npi.count(mat1)

This should be substantially faster than using collections.Counter.
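For comparison, collections.Counter can be made to work on the split arrays by giving it an immutable key for each one instead of the array itself. The following is only a minimal sketch of that baseline, assuming every sub-array has the same dtype; the helper name count_equal_arrays is hypothetical and not part of NumPy or numpy_indexed:

```python
import collections
import numpy as np

def count_equal_arrays(arrays):
    """Count identical arrays by hashing each one's shape and raw bytes."""
    # ndarrays are unhashable, so Counter needs an immutable stand-in key.
    keys = [(arr.shape, np.ascontiguousarray(arr).tobytes()) for arr in arrays]
    return collections.Counter(keys)

mat1 = np.zeros((4, 8))
split4x4 = np.split(mat1, 4)
counts = count_equal_arrays(split4x4)
print(list(counts.values()))
```

Note that this compares raw bytes, so values that are numerically equal but stored differently (for example 0.0 and -0.0) count as different arrays.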

Eelco Hoogendoorn

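Maybe the easiest way is to flatten each split array into a single row and let np.unique count the identical rows:

import numpy as np
# Generate some sample data:
a = np.random.uniform(size=(8,3))
# With repetition:
a = np.r_[a,a]
# Split a into 4 arrays
s = np.asarray(np.split(a, 4))
# Flatten each sub-array into one row
s = s.reshape(len(s), -1)
# axis=0 compares whole rows; without it, np.unique flattens
# its input and counts individual values instead
unique_rows, counts = np.unique(s, axis=0, return_counts=True)

Remark: the return_counts argument of np.unique is new in version 1.9.0, and the axis argument is new in version 1.13.0.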

Another pure NumPy solution, inspired by that post:

# Generate some sample data:
In: a = np.random.uniform(size=(8,3))
# With some repetition
In: a = np.r_[a,a]
In: a.shape
Out: (16, 3)
# Split a in 4 arrays
In: s = np.asarray(np.split(a, 4))
In: print(s)
Out: [[[ 0.78284847  0.28883662  0.53369866]
       [ 0.48249722  0.02922249  0.0355066 ]
       [ 0.05346797  0.35640319  0.91879326]
       [ 0.1645498   0.15131476  0.1717498 ]]

      [[ 0.98696629  0.8102581   0.84696276]
       [ 0.12612661  0.45144896  0.34802173]
       [ 0.33667377  0.79371788  0.81511075]
       [ 0.81892789  0.41917167  0.81450135]]

      [[ 0.78284847  0.28883662  0.53369866]
       [ 0.48249722  0.02922249  0.0355066 ]
       [ 0.05346797  0.35640319  0.91879326]
       [ 0.1645498   0.15131476  0.1717498 ]]

      [[ 0.98696629  0.8102581   0.84696276]
       [ 0.12612661  0.45144896  0.34802173]
       [ 0.33667377  0.79371788  0.81511075]
       [ 0.81892789  0.41917167  0.81450135]]]
In: s.shape
Out: (4, 4, 3)
# Flatten the array:
In: s = np.asarray([e.flatten() for e in s])
In: s.shape
Out: (4, 12)
# Sort the rows using lexsort:
In: idx = np.lexsort(s.T)
In: s_sorted = s[idx]
# Create a mask to get unique rows
In: row_mask = np.append([True],np.any(np.diff(s_sorted,axis=0),1))
# Get unique rows:
In: out = s_sorted[row_mask]
# and count:
In: for e in out:
        # Count how many flattened rows match this unique row exactly
        count = (e == s).all(axis=1).sum()
        print(e.reshape(4,3), count)
Out:[[ 0.78284847  0.28883662  0.53369866]
     [ 0.48249722  0.02922249  0.0355066 ]
     [ 0.05346797  0.35640319  0.91879326]
     [ 0.1645498   0.15131476  0.1717498 ]] 2
    [[ 0.98696629  0.8102581   0.84696276]
     [ 0.12612661  0.45144896  0.34802173]
     [ 0.33667377  0.79371788  0.81511075]
     [ 0.81892789  0.41917167  0.81450135]] 2
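The counting loop above can also be avoided, since the mask already marks where each group of equal rows begins in the sorted array. The following is a sketch of that idea rather than part of the original answer; the function name count_unique_rows is hypothetical, and it assumes a 2-D input with at least one row:

```python
import numpy as np

def count_unique_rows(rows):
    """Return (unique_rows, counts) for a non-empty 2-D array using lexsort."""
    rows = np.asarray(rows)
    order = np.lexsort(rows.T)
    rows_sorted = rows[order]
    # True wherever a sorted row differs from the one before it.
    is_first = np.append([True], np.any(np.diff(rows_sorted, axis=0), axis=1))
    starts = np.flatnonzero(is_first)
    # The gap between consecutive start positions is each group's size.
    counts = np.diff(np.append(starts, len(rows_sorted)))
    return rows_sorted[is_first], counts

# Deterministic sample data: 8 rows, repeated once, split into 4 blocks.
a = np.arange(24.0).reshape(8, 3)
a = np.r_[a, a]
s = np.asarray(np.split(a, 4)).reshape(4, -1)
unique_rows, counts = count_unique_rows(s)
print(counts)
```

Each row of unique_rows can be turned back into a block with reshape(4, 3), as in the loop above.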
bougui
  • Are you using Python 3 in the first example? I ask because `a = r_[a,a]` gives me `NameError: name 'r_' is not defined`. – andandandand Aug 25 '16 at 19:48
  • @andandandand No, I'm not. That was my mistake: I forgot the `np.` before `r_`, which is NumPy's shorthand for building arrays quickly (see: http://docs.scipy.org/doc/numpy/reference/generated/numpy.r_.html). I've just corrected my answer. – bougui Aug 26 '16 at 08:49