Efficient way to count unique elements in array in numpy/scipy in Python

Question

I have a scipy array, e.g.

a = array([[0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]])

I want to count the number of occurrences of each unique element in the array. For example, for the above array a, I want to get out that there is 1 occurrence of [0, 0, 1], 2 occurrences of [1, 1, 1] and 1 occurrence of [1, 0, 1].

One way I thought of doing it is:

from collections import defaultdict
d = defaultdict(int)

for elt in a:
    # NumPy rows are unhashable, so use a tuple of the row as the key
    d[tuple(elt)] += 1

Is there a better/more efficient way?

Thanks.

Where is the usage of Numpy / Scipy in your example code? Or is this only supposed to get the idea across, wanting to have a Numpy / Scipy function to solve this? — Zelphir Kaltstahl, Apr 21 '16 at 15:35

Horst Gutmann · Accepted Answer · 2010-10-27T20:31:11.363

8

If you can use Python 2.7 or 3.1 (or later), the new collections.Counter may be what you need, as long as you stick to hashable elements such as tuples:

>>> from collections import Counter
>>> c = Counter([(0,0,1), (1,1,1), (1,1,1), (1,0,1)])
>>> c
Counter({(1, 1, 1): 2, (0, 0, 1): 1, (1, 0, 1): 1})

I haven't done any performance testing on these two approaches, though.
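The example above starts from a literal list of tuples. As a sketch (not part of the original answer), the same Counter can be built directly from the question's NumPy array by turning each row into a tuple first, since NumPy rows themselves are not hashable:

```python
from collections import Counter

import numpy as np

a = np.array([[0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]])

# a.tolist() gives nested lists of plain Python ints;
# map(tuple, ...) turns each row into a hashable tuple key.
row_counts = Counter(map(tuple, a.tolist()))

print(row_counts)
```

Going through tolist() keeps the keys as ordinary Python ints, so they compare and print like the literal tuples in the answer.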

edited Oct 27 '10 at 20:31

answered Oct 27 '10 at 20:24

Horst Gutmann



defaultdict will be faster. John Machin showed this with timings in an answer earlier today (http://stackoverflow.com/questions/4036474/add-new-keys-to-a-dictionary-while-incrementing-existing-values). – Steven Rumbalski Oct 27 '10 at 21:48

Doesn't use Numpy / Scipy though, as requested by the title of the OP. Also advocates usage of outdated versions of Python. Not sure this is a good answer. – Zelphir Kaltstahl Apr 21 '16 at 15:36

chuck · Answer 2 · 2010-10-30T05:37:25.197

You can sort the array lexicographically by rows and then look for points where the rows change:

In [1]: a = array([[0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]])

In [2]: b = a[lexsort(a.T)]

In [3]: b
Out[3]: 
array([[0, 0, 1],
       [1, 0, 1],
       [1, 1, 1],
       [1, 1, 1]])

...


In [5]: (b[1:] - b[:-1]).any(-1)
Out[5]: array([ True,  True, False], dtype=bool)

The last array shows that the first three rows of b are all different from each other, while the third row occurs twice (the fourth row repeats it).
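The session above stops at the change mask (and elides one step). One possible way to finish the job, sketched here rather than taken from the answer, is to treat every True in the mask as the start of a new run of identical rows and measure the run lengths. It uses != instead of subtraction so that it also works on boolean arrays, where NumPy refuses the - operator:

```python
import numpy as np

a = np.array([[0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]])

# Sort rows so identical rows become adjacent.
# np.lexsort treats the LAST key (here the last column) as the primary key.
b = a[np.lexsort(a.T)]

# True where a row differs from the row before it.
changed = (b[1:] != b[:-1]).any(axis=1)

# Index of the first row of each run; row 0 always starts a run.
starts = np.concatenate(([0], np.flatnonzero(changed) + 1))

unique_rows = b[starts]
# Run lengths: gaps between consecutive starts, with len(b) closing the last run.
counts = np.diff(np.append(starts, len(b)))

for row, n in zip(unique_rows.tolist(), counts.tolist()):
    print(row, n)
```

For the example array this pairs [0, 0, 1] and [1, 0, 1] with a count of 1 and [1, 1, 1] with a count of 2.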

For arrays of ones and zeros you can encode each row as a binary number and count the codes:

In [6]: bincount(dot(a, array([4,2,1])))
Out[6]: array([0, 1, 0, 0, 0, 1, 0, 2])
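The bincount result is indexed by code, so code 7 (binary 111) occurs twice. A sketch of how the codes could be decoded back into rows follows. It assumes, like the answer, that every entry is 0 or 1, and it generalizes the weights [4, 2, 1] to any number of columns. This is only practical while 2**n_columns stays small, because bincount returns an array of length max(code) + 1:

```python
import numpy as np

a = np.array([[0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]])

# Assumes all entries are 0 or 1: read each row as a binary number.
shifts = np.arange(a.shape[1] - 1, -1, -1)   # [2, 1, 0] for three columns
weights = 1 << shifts                        # [4, 2, 1]
codes = a.dot(weights)                       # one integer code per row
tally = np.bincount(codes)                   # tally[k] = number of rows with code k

present = np.flatnonzero(tally)              # codes that actually occur
# Recover each code's bits: shift right and keep the lowest bit.
rows = (present[:, None] >> shifts) & 1
counts = tally[present]

for row, n in zip(rows.tolist(), counts.tolist()):
    print(row, n)
```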

Dictionaries can also be used. Which method is fastest will depend on the kind of arrays you are actually working with.
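This answer predates it, but later NumPy releases (1.13 and up) let np.unique treat whole rows as elements via its axis argument, and return_counts gives the number of occurrences of each unique row. A minimal sketch on the question's array:

```python
import numpy as np

a = np.array([[0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]])

# axis=0: compare whole rows; return_counts=True: also report how often each occurs.
unique_rows, counts = np.unique(a, axis=0, return_counts=True)

for row, n in zip(unique_rows.tolist(), counts.tolist()):
    print(row, n)
```

The unique rows come back sorted lexicographically, so the output pairs up the same way as the sort-based approach above.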

score 1 · Answer 3 · answered Oct 27 '10 at 21:01


For Python 2.6 and earlier (where collections.Counter is not available):

import itertools

data_array = [[0, 0, 1], [1, 1, 1], [1, 1, 1], [1, 0, 1]]

dict_ = {}

# groupby only groups *consecutive* equal items, so sort the rows first
for key, group in itertools.groupby(sorted(data_array)):
    dict_[tuple(key)] = len(list(group))
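itertools.groupby starts a new group every time the key changes between neighbouring items, so unsorted input can split equal rows into separate groups. In the question's sample the two [1, 1, 1] rows happen to be adjacent, which hides the problem. The sketch below uses hypothetical data where they are not adjacent, and a hypothetical helper, group_lengths, to show the difference sorting makes:

```python
import itertools

# Hypothetical data: the duplicate rows are NOT next to each other.
rows = [[1, 1, 1], [0, 0, 1], [1, 1, 1]]


def group_lengths(items):
    """Return (row, run length) pairs for each run of equal adjacent items."""
    return [(tuple(key), len(list(group))) for key, group in itertools.groupby(items)]


unsorted_runs = group_lengths(rows)         # (1, 1, 1) shows up as two runs of 1
sorted_runs = group_lengths(sorted(rows))   # sorting merges them into one run of 2

print(unsorted_runs)
print(sorted_runs)
```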


mouad


Eelco Hoogendoorn · Answer 4 · 2016-04-02T20:46:26.140

0

The numpy_indexed package (disclaimer: I am its author) provides a nicely vectorized solution similar to the one posted by chuck, but with tests, a nice interface, and many more related useful functions:

import numpy_indexed as npi
npi.count(a)

edited Apr 02 '16 at 20:46

answered Apr 02 '16 at 15:06

Eelco Hoogendoorn

