I'm looking for an extension of the question "Efficiently counting number of unique elements - NumPy / Python" that also returns the count of each unique element (i.e. how many times it occurs in the array).

import numpy as np
max_a = 20_000
a = np.random.randint(max_a, size=10_000).astype(np.int32)

# Using np.unique
u, counts = np.unique(a, return_counts=True)

# Faster alternative suggested in the post above
q = np.zeros(max_a, dtype=int)
q[a] = 1
v = np.nonzero(q)[0]

I can verify that u and v are the same, and the method using q is noticeably faster than np.unique(). However, I also want the counts that the np.unique() call returns. Is there a way to modify the example here to obtain those?

NB: the elements of a will always be of type np.int32, so they can be used for indexing.

robyna

1 Answer

When a contains only non-negative integers (an unsigned type, or a signed type such as np.int32 with no negative values), you can use np.bincount, which offers a fast way to get the counts. If you do have negative ints, perhaps you can add an offset to make all values non-negative?

For example:

counts = np.bincount(a) # includes 0-counts
unique = np.flatnonzero(counts)
counts = counts[unique] # remove 0-counts
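
The offset idea mentioned above could be sketched as follows. This is only a minimal illustration, not something tested in the original post: the helper name unique_counts_with_offset is hypothetical, and it assumes a is a 1-D integer array whose value range (max minus min) is small enough that a count array of that length fits in memory.

```python
import numpy as np

def unique_counts_with_offset(a):
    """Hypothetical helper: unique values and counts for a 1-D integer
    array that may contain negative values, using np.bincount."""
    a = np.asarray(a)
    if a.size == 0:
        # np.min would fail on an empty array, so handle it up front
        return a.copy(), np.zeros(0, dtype=np.intp)
    offset = int(a.min())
    # Widen to int64 first so subtracting the offset cannot overflow int32
    shifted = a.astype(np.int64) - offset
    counts = np.bincount(shifted)      # includes 0-counts
    idx = np.flatnonzero(counts)       # positions with at least one hit
    # Undo the shift to recover the original values
    return (idx + offset).astype(a.dtype), counts[idx]

# Compare against np.unique on data that includes negative values
rng = np.random.default_rng(0)
a = rng.integers(-10_000, 10_000, size=10_000).astype(np.int32)
u, c = unique_counts_with_offset(a)
u_ref, c_ref = np.unique(a, return_counts=True)
assert np.array_equal(u, u_ref) and np.array_equal(c, c_ref)
```

Because np.flatnonzero returns indices in increasing order, the unique values come back sorted, matching the order of np.unique.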
Rutger Kassies
This works! Here's a comparison of timing:
```
def np_unique(a):
    return np.unique(a, return_counts=True)

def fast_unique_counts(a):
    counts = np.bincount(a)
    u = np.flatnonzero(counts)
    return u, counts[u]

def fast_unique(a, max_a):
    q = np.zeros(max_a, dtype=int)
    q[a.ravel()] = 1
    return np.nonzero(q)[0]

%timeit u1, counts1 = np_unique(a)
674 µs ± 29.4 µs per loop

%timeit u2, counts2 = fast_unique_counts(a)
318 µs ± 123 µs per loop

%timeit u3 = fast_unique(a, max_a)
293 µs ± 6.36 µs per loop
```
– robyna May 25 '22 at 02:17