
Is there a better way to count how many times a given row appears in a numpy 2D array than this loop:

import numpy as np

def get_count(array_2d, row):
    count = 0
    # iterate over the rows and compare each one with `row`
    for r in array_2d:
        if np.equal(r, row).all():
            count += 1
    return count

# let's make sure it works

array_2d = np.array([[1, 2], [3, 4]])
row = np.array([1, 2])

count = get_count(array_2d, row)
assert count == 1
Nucular
  • If this code works, it should be on Code Review, not here. – Carcigenicate Jul 31 '16 at 18:47
  • Also related: [Count how many times each row is present in numpy.array](http://stackoverflow.com/questions/27000092/count-how-many-times-each-row-is-present-in-numpy-array) – Alex Riley Jul 31 '16 at 19:10
  • @Carcigenicate, questions like this, which (implicitly) ask for ways to replace loops with faster numpy methods, are quite common on SO. It's very much a 'how to' kind of question. These questions do get asked on CR, but that forum is pickier about presentation, and the `numpy` community there is much smaller. CR is better for code style review. I like working code on SO; it makes it easier to test my answer. – hpaulj Jul 31 '16 at 19:39

1 Answer


One simple way would be with broadcasting -

(array_2d == row).all(-1).sum()
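To see what each step of that one-liner does, here is a small sketch on a made-up 3x2 array (the data is illustrative, not from the question). The `==` comparison broadcasts `row` against every row of `array_2d`, `all(-1)` reduces along the last axis (the columns) so each row collapses to a single True/False, and summing those booleans counts the matching rows.

```python
import numpy as np

# Illustrative data: the first and last rows equal `row`
array_2d = np.array([[1, 2], [3, 4], [1, 2]])
row = np.array([1, 2])

# Broadcasting: shape (3, 2) compared with shape (2,) -> (3, 2) booleans
matches = array_2d == row
# all(-1): True only where every element in a row matched -> shape (3,)
row_matches = matches.all(-1)
# Summing booleans counts the True values
count = row_matches.sum()

print(matches.shape, row_matches, count)
```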

For better memory efficiency, here's an approach that treats each row of array_2d as an index tuple into an n-dimensional grid, assuming the inputs are non-negative integers -

dims = np.maximum(array_2d.max(0), row) + 1          # grid extent along each column
array_1d = np.ravel_multi_index(array_2d.T, dims)    # one linear index per row
row_scalar = np.ravel_multi_index(row, dims)         # linear index of the query row
count = (array_1d == row_scalar).sum()

Here's a post discussing the various aspects related to it.
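The linear-index approach needs non-negative coordinates because np.ravel_multi_index rejects negative entries in its default mode='raise'. As a sketch, assuming integer inputs, one way to lift that restriction is to shift every column by its minimum (taken over both the array and the query row) before raveling. `count_row_raveled` is a hypothetical helper, not part of NumPy. The product of the grid dimensions still has to fit in the platform's index type, otherwise np.ravel_multi_index should raise an error, so this suits inputs with a modest value range.

```python
import numpy as np

def count_row_raveled(array_2d, row):
    """Count how many rows of `array_2d` equal `row` (hypothetical helper).

    Assumes integer inputs. Each column is shifted by its minimum so that
    negative values become valid grid coordinates for np.ravel_multi_index.
    """
    array_2d = np.asarray(array_2d)
    row = np.asarray(row)
    # Shift so every coordinate is >= 0; include `row` so both use the same offset
    offset = np.minimum(array_2d.min(0), row)
    grid = array_2d - offset
    target = row - offset
    # Grid extent along each column must cover both the array and the query row
    dims = tuple(np.maximum(grid.max(0), target) + 1)
    # One linear index per row of the array, and one for the query row
    linear = np.ravel_multi_index(grid.T, dims)
    target_linear = np.ravel_multi_index(target, dims)
    return np.count_nonzero(linear == target_linear)


a = np.array([[-1, 2], [3, -4], [-1, 2]])
print(count_row_raveled(a, [-1, 2]))  # 2
print(count_row_raveled(a, [0, 0]))   # 0
```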

Note: np.count_nonzero can be much faster than .sum() for counting the True values in a boolean array, so consider using it in both of the approaches above.
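Applying that note, here is a sketch of both approaches counting with np.count_nonzero, cross-checked against the loop from the question. The function names and the random test data (small non-negative integers, so repeated rows are likely and the ravel approach's assumption holds) are illustrative choices, not from the original post; np.random.default_rng requires NumPy 1.17 or later.

```python
import numpy as np

def count_row_loop(array_2d, row):
    # The question's loop, kept as a reference implementation
    count = 0
    for r in array_2d:
        if np.equal(r, row).all():
            count += 1
    return count

def count_row_broadcast(array_2d, row):
    # Broadcasting approach, counting with np.count_nonzero instead of .sum()
    return np.count_nonzero((array_2d == row).all(-1))

def count_row_ravel(array_2d, row):
    # Linear-index approach; assumes non-negative integer inputs
    dims = tuple(np.maximum(array_2d.max(0), row) + 1)
    array_1d = np.ravel_multi_index(array_2d.T, dims)
    row_scalar = np.ravel_multi_index(row, dims)
    return np.count_nonzero(array_1d == row_scalar)

rng = np.random.default_rng(0)
# Values in {0, 1, 2} across 3 columns, so many rows repeat
array_2d = rng.integers(0, 3, size=(1000, 3))
row = array_2d[0]

expected = count_row_loop(array_2d, row)
assert count_row_broadcast(array_2d, row) == expected
assert count_row_ravel(array_2d, row) == expected
```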

Here's a quick runtime test -

In [74]: arr = np.random.rand(10000)>0.5

In [75]: %timeit arr.sum()
10000 loops, best of 3: 29.6 µs per loop

In [76]: %timeit np.count_nonzero(arr)
1000000 loops, best of 3: 1.21 µs per loop
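The session above uses IPython's %timeit magic. Outside IPython, roughly the same comparison can be made with the standard-library timeit module, taking the best of 3 repeats as the old %timeit output did. This is only a sketch: absolute numbers depend on the machine and the NumPy version, so no particular result is implied here.

```python
import timeit

import numpy as np

arr = np.random.rand(10000) > 0.5

# Both calls must agree on the number of True values
assert arr.sum() == np.count_nonzero(arr)

n = 1000
# timeit.repeat accepts a zero-argument callable and returns total seconds per repeat
t_sum = min(timeit.repeat(lambda: arr.sum(), repeat=3, number=n))
t_nonzero = min(timeit.repeat(lambda: np.count_nonzero(arr), repeat=3, number=n))

print(f"arr.sum():          {t_sum / n * 1e6:.2f} µs per loop")
print(f"np.count_nonzero(): {t_nonzero / n * 1e6:.2f} µs per loop")
```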
Divakar
  • `(array_2d == row).all(-1).sum()` is _exactly_ what I was looking for. Wasn't aware of `all()` params. – Nucular Aug 01 '16 at 11:25