
I am looking to count the number of unique rows in a 3D NumPy array. Take the following array:

a = np.array([[[1, 2], [1, 2], [2, 3]], [[2, 3], [2, 3], [3, 4]], [[1, 2], [1, 2], [1, 2]]])

My desired output is a 1-D array with the same length as axis 0 of the 3-D array: array([2, 2, 1]).

In this example, the output would be 2, 2, 1 because in the first grouping [1, 2] and [2, 3] are the unique rows, in the second grouping [2, 3] and [3, 4] are the unique rows, and in the third grouping [1, 2] is the only "unique" row. Perhaps I'm using unique incorrectly in this context, but that is what I'm looking to calculate.

The difficulty I'm having is that the number of unique rows differs from one 2-D sub-array to the next. If I use np.unique with axis=1, the result keeps the same number of rows for every sub-array, as shown below:

>>> np.unique(a, axis=1)
array([[[1, 2],
        [2, 3]],

       [[2, 3],
        [3, 4]],

       [[1, 2],
        [1, 2]]])

I know I can loop over each 2-D array and use np.apply_along_axis, as described in this answer.

However, I am dealing with arrays as large as (1 000 000, 256, 2), so I would prefer to avoid loops if possible.
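
To be concrete, the per-slice baseline I am trying to speed up looks like this (np.unique with axis=0 applied to each 2-D slice of a in a list comprehension):

# Count the unique rows of each 2-D slice along axis 0 of `a`
counts = np.array([np.unique(block, axis=0).shape[0] for block in a])
# counts -> array([2, 2, 1])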

Eric Truett
  • Does this answer your question? [Find unique rows in numpy.array](https://stackoverflow.com/questions/16970982/find-unique-rows-in-numpy-array) (or also [this answer](https://stackoverflow.com/a/53928532)) – Jérôme Richard Aug 28 '21 at 22:36
  • I took a look at those and they work for 2D but not 3D arrays. I could do an apply or a list comprehension and count the unique rows in the 2D arrays, but I'm trying to figure out if there is a faster way of doing so. – Eric Truett Aug 28 '21 at 23:05
  • Indeed. Note that the example is unclear to me: isn't the result the number of distinct unique rows? If not, why is the last value 1? Besides this, is the last dimension always 2, and are the values of `a` always integers bounded in a specific known range? – Jérôme Richard Aug 29 '21 at 10:16
  • They are always integers and the range is between 0 and a variable that is known at the time the counting is done (it would typically be somewhere between 10 and 40). – Eric Truett Aug 29 '21 at 13:44
  • I also edited the question to try to make it clearer. – Eric Truett Aug 29 '21 at 13:52

1 Answer


Calling np.unique on each 2D slice appears to be extremely slow. In fact, it is np.unique itself that is slow, not the pure Python loop.

A better approach is to do that manually with Numba (using a dict). While this strategy is faster, it is not a silver bullet, since dict accesses are not very fast. However, the implementation can easily be parallelized so it runs significantly faster. Here is the implementation:

import numpy as np
import numba as nb

@nb.njit('i4[::1](i4[:,:,::1])', parallel=True)
def compute_unique_count(data):
    n, m, o = data.shape
    assert o == 2
    res = np.empty(n, dtype=np.int32)
    # Each 2D slice is processed by a different thread
    for i in nb.prange(n):
        # Use a dict as a set: one key per distinct (x, y) pair
        tmp = dict()
        for j in range(m):
            tmp[(data[i, j, 0], data[i, j, 1])] = True
        res[i] = len(tmp)
    return res
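
As a quick sanity check, the example array from the question can be passed directly (the explicit signature above requires a C-contiguous int32 array, so the input is cast first):

a = np.array([[[1, 2], [1, 2], [2, 3]],
              [[2, 3], [2, 3], [3, 4]],
              [[1, 2], [1, 2], [1, 2]]], dtype=np.int32)
print(compute_unique_count(a))  # [2 2 1]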

Since the numbers are small bounded integers, there is a more efficient method that makes the computation much faster. Indeed, an associative table can be used to flag the pairs already found: the flattened pair is the index into the table, and the value stored at that index indicates whether the pair has been seen so far. Each pair can be flattened into a single index with the expression id = point[0] * maxi + point[1], where maxi is the bound (all values are assumed to be strictly smaller than it). For the sake of performance, branches are avoided and maxi should be a power of two, typically >= 64 (due to low-level cache effects like cache-line conflicts, cache thrashing and false sharing). The resulting implementation is very fast.

@nb.njit('int32[::1](int32[:,:,::1])', parallel=True)
def compute_unique_count_fastest(data):
    n, m, o = data.shape
    assert o == 2
    maxi = 64  # Exclusive upper bound of the values (power of two)
    threadCount = nb.get_num_threads()
    res = np.empty(n, dtype=np.int32)
    # One flat table of maxi*maxi flags per thread (reused across iterations)
    globalUniqueVals = np.zeros((threadCount, maxi * maxi), dtype=np.uint8)
    for i in nb.prange(n):
        threadId = nb.np.ufunc.parallel._get_thread_id()
        uniqueVals = globalUniqueVals[threadId]
        uniqueVals.fill(0) # Reset the associative table
        uniqueCount = 0
        for j in range(m):
            # Flatten the (x, y) pair into a single table index
            idx = data[i, j, 0] * maxi + data[i, j, 1]
            # Branchless counting: increment only the first time the pair is seen
            uniqueCount += uniqueVals[idx] == 0
            uniqueVals[idx] = 1
        res[i] = uniqueCount
    return res
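
Note that maxi = 64 assumes every input value is strictly smaller than 64 (which covers the 0 to 40 range mentioned in the comments); for a larger bound, maxi has to be increased accordingly (ideally keeping it a power of two). For comparison, the same flattening trick can also be sketched in pure NumPy without Numba, by sorting the flattened ids of each slice and counting value changes. This is only a sketch (arr is an illustrative name) and it is likely slower than the Numba version because of the sort, but it avoids any explicit Python loop:

# arr: int32 array of shape (n, m, 2) with values in [0, maxi)
maxi = 64
ids = arr[:, :, 0].astype(np.int64) * maxi + arr[:, :, 1]
ids.sort(axis=1)                                       # sort the flattened pairs of each slice
counts = (np.diff(ids, axis=1) != 0).sum(axis=1) + 1   # number of distinct pairs per slice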

Here are the timings on my machine (i5-9600KF with 6 cores) with an array of size (1_000_000, 256, 2) containing random 32-bit integers from 0 to 40:

np.unique in a list comprehension:  78 000 ms
compute_unique_count:                2 050 ms 
compute_unique_count_fastest:           57 ms

The last implementation is 1370 times faster than the naive implementation.
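
For reference, a benchmark along these lines can be reproduced with a sketch like the following (exact timings will of course depend on the machine):

import time

arr = np.random.randint(0, 41, size=(1_000_000, 256, 2), dtype=np.int32)

start = time.time()
res = compute_unique_count_fastest(arr)
print(time.time() - start)

Since the functions are declared with explicit signatures, they are compiled eagerly at definition time, so the measured time does not include the JIT compilation.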

Jérôme Richard
  • I guess the naive version would be to use a set instead of a dict. This is also two times faster than the dict approach. But anyway your second version is still much faster... – max9111 Aug 30 '21 at 07:27