Efficiently get unique elements + count - NumPy / Python

Question

I'm looking for an extension to the question "Efficiently counting number of unique elements - NumPy / Python" that can also return the count of each unique element (i.e. how many times it occurs in the array).

```
import numpy as np
max_a = 20_000
a = np.random.randint(max_a, size=10_000).astype(np.int32)

# Using np.unique
u,counts = np.unique(a,return_counts=True)

# Faster alternative suggested in the post above
q = np.zeros(max_a, dtype=int)
q[a] = 1
v = np.nonzero(q)[0]
```

I can verify that u and v are the same, and the method using q is definitely faster. However, I also want the counts that are returned by the np.unique() call. Is there a way to modify the example here to obtain those?

NB: the elements of a will always be of type np.int32, so they can be used for indexing.
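One way to keep the q-based approach and also get the counts is to accumulate into q instead of assigning 1. Note that `q[a] += 1` would not work: with repeated indices, buffered fancy indexing writes each index only once. `np.add.at` performs an unbuffered in-place add, so duplicates accumulate. This is a sketch assuming, as in the question, that every value lies in [0, max_a); the seeded `default_rng` generator is an assumption for reproducibility. `np.add.at` has historically been considerably slower than `np.bincount`, so this is shown mainly to illustrate the modification.

```python
import numpy as np

max_a = 20_000
rng = np.random.default_rng(0)  # seeded generator (assumption, for reproducibility)
a = rng.integers(max_a, size=10_000).astype(np.int32)

# q[a] += 1 would NOT count duplicates: buffered fancy indexing writes each index once.
# np.add.at does an unbuffered in-place add, so repeated indices accumulate.
q = np.zeros(max_a, dtype=np.int64)
np.add.at(q, a, 1)

v = np.flatnonzero(q)   # unique values, sorted like np.unique's output
counts_v = q[v]         # number of occurrences of each unique value

# Cross-check against np.unique
u, counts = np.unique(a, return_counts=True)
print(np.array_equal(u, v), np.array_equal(counts, counts_v))
```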

Is the faster method still faster if you don't return `counts`? — Giovanni Tardini, May 24 '22 at 09:18
Did you read all the answers in the provided link? `bincount` or Numba should do the job pretty well in your case. — Jérôme Richard, May 24 '22 at 09:55

Accepted Answer · answered May 24 '22 at 09:54

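If the values in a are non-negative integers (as in the question, where np.random.randint(max_a, ...) only returns values from 0 to max_a - 1), you can use np.bincount, which offers a fast way to get the counts; the dtype does not need to be unsigned. If you do have negative integers, you can add an offset (e.g. subtract a.min()) so that all values become non-negative, then shift the unique values back afterwards.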

For example:

```
counts = np.bincount(a) # includes 0-counts
unique = np.flatnonzero(counts)
counts = counts[unique] # remove 0-counts
```
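The offset idea for negative integers can be sketched as below: subtract the minimum so the smallest value maps to bin 0, count, then add the minimum back to the unique values. The helper name `unique_counts_bincount` is hypothetical, and the sketch assumes a non-empty 1-D integer array whose value range (max minus min) is small enough for a dense count array to fit in memory.

```python
import numpy as np

def unique_counts_bincount(a):
    """Hypothetical helper: same result as np.unique(a, return_counts=True), via np.bincount.

    Assumes a is a non-empty 1-D integer array with a modest value range.
    """
    a = np.asarray(a)
    offset = a.min()                  # shift so the smallest value lands in bin 0
    counts = np.bincount(a - offset)  # dense counts, including zeros for absent values
    idx = np.flatnonzero(counts)      # bins that actually occur
    return idx + offset, counts[idx]  # undo the shift and drop the zero counts

a = np.array([-3, 5, -3, 0, 5, 5], dtype=np.int32)
u, c = unique_counts_bincount(a)
print(u, c)  # [-3  0  5] [2 1 3]
```

If the value range is very large relative to the array size, the dense `bincount` array dominates memory and `np.unique` may be the better choice.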

Rutger Kassies

This works! Here's a comparison of timing:

```
def np_unique(a):
    return np.unique(a, return_counts=True)

def fast_unique_counts(a):
    counts = np.bincount(a)
    u = np.flatnonzero(counts)
    return u, counts[u]

def fast_unique(a, max_a):
    q = np.zeros(max_a, dtype=int)
    q[a.ravel()] = 1
    return np.nonzero(q)[0]

%timeit u1, counts1 = np_unique(a)
674 µs ± 29.4 µs per loop
%timeit u2, counts2 = fast_unique_counts(a)
318 µs ± 123 µs per loop
%timeit u3 = fast_unique(a, max_a)
293 µs ± 6.36 µs per loop
```

– robyna, May 25 '22 at 02:17
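`%timeit` is an IPython magic, so the timing comparison in the comment does not run as a plain Python script. The sketch below reproduces it with the standard-library `timeit` module and checks that the three functions agree before timing them. The function names follow the comment; the seeded generator and the `number`/`repeat` values are assumptions, and absolute timings will vary with the machine and NumPy version.

```python
import timeit
import numpy as np

max_a = 20_000
a = np.random.default_rng(0).integers(max_a, size=10_000).astype(np.int32)

def np_unique(a):
    return np.unique(a, return_counts=True)

def fast_unique_counts(a):
    counts = np.bincount(a)      # dense counts for values 0..a.max()
    u = np.flatnonzero(counts)   # values that actually occur
    return u, counts[u]

def fast_unique(a, max_a):
    q = np.zeros(max_a, dtype=int)
    q[a.ravel()] = 1             # presence flags only, no counts
    return np.nonzero(q)[0]

# Sanity check: all three methods agree before timing anything
u1, c1 = np_unique(a)
u2, c2 = fast_unique_counts(a)
u3 = fast_unique(a, max_a)

for name, fn in [
    ("np_unique", lambda: np_unique(a)),
    ("fast_unique_counts", lambda: fast_unique_counts(a)),
    ("fast_unique", lambda: fast_unique(a, max_a)),
]:
    # Best of 3 repeats, each running the call 100 times
    best = min(timeit.repeat(fn, number=100, repeat=3)) / 100
    print(f"{name:>20}: {best * 1e6:.1f} µs per call")
```

Taking the minimum over repeats is a common way to reduce noise from other processes, though it reports a best case rather than a typical one.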
