How do I count the occurrence of a certain item in an ndarray?

Question

How do I count the number of 0s and 1s in the following array?

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

y.count(0) gives:

numpy.ndarray object has no attribute count

In this case, it is also possible to simply use `numpy.count_nonzero`. — Mong H. Ng, Mar 31 '19 at 17:50

score 1032 · Accepted Answer · edited Jun 20 '22 at 02:17


Using numpy.unique:

import numpy
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
unique, counts = numpy.unique(a, return_counts=True)

>>> dict(zip(unique, counts))
{0: 7, 1: 4, 2: 1, 3: 2, 4: 1}

Non-numpy method using collections.Counter:

import collections, numpy
a = numpy.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
counter = collections.Counter(a)

>>> counter
Counter({0: 7, 1: 4, 3: 2, 2: 1, 4: 1})
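If only one particular value is of interest, the counts returned by numpy.unique can be looked up by value. A minimal sketch, assuming the same array as above; the helper name count_of is made up for illustration and is not part of NumPy:

```python
import numpy as np

a = np.array([0, 3, 0, 1, 0, 1, 2, 1, 0, 0, 0, 0, 1, 3, 4])
values, counts = np.unique(a, return_counts=True)

# Map each distinct value to its count (converted to plain Python ints)
lookup = dict(zip(values.tolist(), counts.tolist()))


def count_of(item):
    """Hypothetical helper: occurrences of item, or 0 if it never appears."""
    return lookup.get(item, 0)


print(count_of(0))  # 7
print(count_of(9))  # 0
```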

edited Jun 20 '22 at 02:17

Mateen Ulhaq


answered Feb 22 '15 at 22:10

Ozgur Vatansever


3

What if I want to access the number of occurrences of each unique element of the array without assigning to the variable counts? Any hints on that? – sajis997 Dec 24 '16 at 23:08
1

I have the same goal as @sajis997. I want to use 'count' as an aggregating function in a groupby – p_sutherland Mar 15 '18 at 16:34
1

@sajis997 if you do a groupby on the desired level of aggregation and use np.count_nonzero as the aggregate function, it will return the number of occurrences of each unique value – p_sutherland Mar 15 '18 at 16:52
1

Does this do anything weird if the array contains `NaN`? – jpmc26 May 12 '18 at 06:05
7

this is a hack. Numpy has functions for this called bincount() or histogram() – john k Oct 22 '18 at 03:01
1

for reference, you can use np.size in groupby: df.groupby('client')['value'].agg([np.size, np.mean]) # describe already does that, but if you need to customize you can use in this way – J. Ceron Jul 22 '19 at 09:54
6

Tried using both methods for a very large array (~30Gb). Numpy method ran out of memory whereas the `collections.Counter` worked just fine – Ivan Novikov Nov 26 '19 at 13:10
2

For those wondering, this answer works for any type of np array (e.g. it works for floats), unlike some of the answers provided. – waykiki Aug 10 '21 at 10:27
1

Worth mentioning that this solution is much faster than pd.DataFrame(x).value_counts() – RunTheGauntlet Jan 19 '22 at 09:10
1

This does not count occurrences of a "certain item" – user3329732 Jun 16 '22 at 06:01

Aziz Alto · Answer 2 · 2017-10-12T14:33:02.540

439

What about using numpy.count_nonzero, something like

>>> import numpy as np
>>> y = np.array([1, 2, 2, 2, 2, 0, 2, 3, 3, 3, 0, 0, 2, 2, 0])

>>> np.count_nonzero(y == 1)
1
>>> np.count_nonzero(y == 2)
7
>>> np.count_nonzero(y == 3)
3
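The same call also works on multi-dimensional arrays, and its axis argument gives per-row or per-column counts. A small sketch; the array grid is invented for illustration:

```python
import numpy as np

grid = np.array([[1, 2, 2],
                 [0, 2, 3],
                 [2, 2, 0]])

# Count the 2s in the whole array, then along each axis
total = np.count_nonzero(grid == 2)               # 5
per_column = np.count_nonzero(grid == 2, axis=0)  # [1 3 1]
per_row = np.count_nonzero(grid == 2, axis=1)     # [2 1 2]

print(total, per_column, per_row)
```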

edited Oct 12 '17 at 14:33

answered Feb 22 '16 at 09:14

Aziz Alto


55

This answer seems better than the one with the most upvotes. – Alex Dec 31 '17 at 17:16
3

I don't think this would work for `numpy.ndarray` as OP originally asked. – LYu Jul 28 '18 at 20:55
10

@LYu - the y is an np.ndarray in this answer. Also - most if not all np.something functions work on ndarrays without problem. – mmagnuski Jul 29 '18 at 19:34

score 193 · Answer 3 · answered May 05 '16 at 20:51


Personally, I'd go for: (y == 0).sum() and (y == 1).sum()

E.g.

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
num_zeros = (y == 0).sum()
num_ones = (y == 1).sum()
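The comparison-and-sum idea is not limited to 0 and 1, or even to numbers. A short sketch with a made-up array of string labels:

```python
import numpy as np

labels = np.array(["A", "B", "A", "C", "A"])

# labels == "A" is a boolean array; True counts as 1 when summed
num_a = (labels == "A").sum()  # 3
num_z = (labels == "Z").sum()  # 0, the value does not occur

print(num_a, num_z)
```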

answered May 05 '16 at 20:51

Gus Hecht


2

It's definitely the easiest to read. The question is which is fastest, and most space efficient – Nathan majicvr.com May 30 '18 at 19:02
1

Might be less space efficient than numpy.count_nonzero(y==0), since it evaluates the vector (y==0) – Sridhar Thiagarajan Oct 27 '18 at 22:21
1

I like this because is similar to matlab/octave `sum( vector==value )` – ePi272314 Aug 17 '19 at 17:01
This is also going to work for other values in the array - not just 0 and 1; they don't even have to be numbers. `(y == "A")` returns an array of `bool` values. Since booleans are equal to 0 and 1 in Python, they can be summed: `(y == "A").sum()` will return the count of `A`s in the array `y`. – natka_m Dec 17 '20 at 15:00

score 68 · Answer 4 · edited Oct 27 '22 at 19:48


For your case you could also look into numpy.bincount

In [56]: a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

In [57]: np.bincount(a)
Out[57]: array([8, 4])  #count of zeros is at index 0, i.e. 8
                        #count of ones is at index 1, i.e. 4
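numpy.bincount expects a 1-D array of non-negative integers, so a multi-dimensional array has to be flattened first. A sketch under that assumption; the array and the minlength value are chosen only for illustration:

```python
import numpy as np

a = np.array([[0, 2, 2],
              [0, 0, 2]])

# ravel() flattens to 1-D; minlength pads the result so that
# values which never occur (here 1 and 3) still get a 0 entry
counts = np.bincount(a.ravel(), minlength=4)

print(counts)     # [3 0 3 0]
print(counts[2])  # 3, the number of 2s
```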

edited Oct 27 '22 at 19:48

nalzok


answered Feb 22 '15 at 23:45

Akavall


3

This code may be one of the fastest solutions for larger arrays I experimented. Getting the result as a list is a bonus, too. Thanx! – Youngsup Kim Oct 24 '18 at 22:56
And if 'a' is an n-dimensional array, we can just use: np.bincount(np.reshape(a, a.size)) – Ari Jan 15 '20 at 09:01
be aware: (1) this rounds down non-integers. e.g. `np.bincount([0, 0.5, 1.1]) == array([2, 1])` (2) if you have an array with large integers, you will get a long output, e.g. `len(np.bincount([1000])) == 1001`. – icemtel Jul 02 '21 at 09:29
It is helpful, too, when we need to show zero for values between the min and the max, e.g. if `a = np.array([0, 0, 0, 2, 0, 2, 2, 0, 0, 0, 0, 2])` it will give `[8 0 4]`. So, for `1` it puts `0` in the result. – Ali_Sh Feb 06 '22 at 02:42

Joel · Answer 5 · 2015-02-22T22:15:52.547

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

If you know that they are just 0 and 1:

np.sum(y)

gives you the number of ones. np.sum(1-y) gives the zeroes.

For slight generality, if you want to count zeros versus nonzero values (where the nonzero values may also be 2 or 3):

np.count_nonzero(y)

gives the number of nonzero.

But if you need something more complicated, I don't think numpy will provide a nice count option. In that case, go to collections:

import collections
collections.Counter(y)
> Counter({0: 8, 1: 4})

This behaves like a dict

collections.Counter(y)[0]
> 8

Milind Dumbare · Answer 6 · 2015-02-22T22:19:15.780

25

Convert your array y to list l and then do l.count(1) and l.count(0)

>>> y = numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>> l = list(y)
>>> l.count(1)
4
>>> l.count(0)
8

edited Feb 22 '15 at 22:19

answered Feb 22 '15 at 22:12

Milind Dumbare


score 17 · Answer 7 · answered Nov 22 '17 at 10:31


If you know exactly which number you're looking for, you can use the following:

lst = np.array([1,1,2,3,3,6,6,6,3,2,1])
(lst == 2).sum()

This returns how many times 2 occurs in your array.

answered Nov 22 '17 at 10:31

CanCeylan


score 16 · Answer 8 · edited Aug 19 '21 at 21:46


Filter and use `len`

Using len could be another option.

A = np.array([1,0,1,0,1,0,1])

Say we want the number of occurrences of 0.

A[A==0]  # Return the array where item is 0, array([0, 0, 0])

Now, wrap it around with len.

len(A[A==0])  # 3
len(A[A==1])  # 4
len(A[A==7])  # 0, because there isn't such item.

edited Aug 19 '21 at 21:46

Ibrahim Berber


answered Mar 11 '16 at 18:29

Anas


wordsforthewise · Answer 9 · 2017-09-11T17:41:51.960

12

Honestly I find it easiest to convert to a pandas Series or DataFrame:

import pandas as pd
import numpy as np

df = pd.DataFrame({'data':np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])})
print(df['data'].value_counts())

Or this nice one-liner suggested by Robert Muil:

pd.Series([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]).value_counts()

edited Sep 11 '17 at 17:41

answered Oct 28 '16 at 17:58

wordsforthewise


6

Just a note: don't need the DataFrame or numpy, can go directly from a list to a Series: `pd.Series([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]).value_counts()` – Robert Muil Feb 02 '17 at 20:14
Awesome, that's a nice one-liner. Big up – wordsforthewise Sep 11 '17 at 17:43

Næreen · Answer 10 · 2018-10-24T08:19:26.923

11

No one suggested using numpy.bincount(input, minlength) with minlength = np.size(input), but it seems to be a good solution, and definitely the fastest:

In [1]: choices = np.random.randint(0, 100, 10000)

In [2]: %timeit [ np.sum(choices == k) for k in range(min(choices), max(choices)+1) ]
100 loops, best of 3: 2.67 ms per loop

In [3]: %timeit np.unique(choices, return_counts=True)
1000 loops, best of 3: 388 µs per loop

In [4]: %timeit np.bincount(choices, minlength=np.size(choices))
100000 loops, best of 3: 16.3 µs per loop

That's a crazy speedup between numpy.unique(x, return_counts=True) and numpy.bincount(x, minlength=np.size(x))!
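Outside IPython, a comparison along these lines can be reproduced with the standard timeit module. This is only a rough sketch: the seed, the repeat count of 200 and the variable names are arbitrary choices, and the timings will differ between machines and NumPy versions.

```python
import timeit

import numpy as np

rng = np.random.default_rng(0)          # seeded only so the sketch is repeatable
choices = rng.integers(0, 100, 10_000)  # 10,000 integers in [0, 100)

values, counts = np.unique(choices, return_counts=True)
binned = np.bincount(choices)

# Sanity check: both approaches agree for every value that occurs
same = np.array_equal(binned[values], counts)

t_unique = timeit.timeit(
    lambda: np.unique(choices, return_counts=True), number=200)
t_bincount = timeit.timeit(lambda: np.bincount(choices), number=200)

print(same)
print(f"unique:   {t_unique:.4f} s for 200 runs")
print(f"bincount: {t_bincount:.4f} s for 200 runs")
```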

edited Oct 24 '18 at 08:19

answered Mar 17 '17 at 16:19

Næreen


hows it compare to histogram? – john k Oct 22 '18 at 03:02
@johnktejik `np.histogram` does not compute the same thing. No point comparing the three approaches I propose with the `histogram` function, sorry. – Næreen Oct 24 '18 at 08:20
3

@Næreen `bincount` only works for integers though, so it works for the OP's problem, but maybe not for the generic problem described in the title. Also have you tried using `bincount` with arrays with very big ints? – Imperishable Night Oct 27 '18 at 13:19
@ImperishableNight no I haven't tried with large ints, but anyone is welcome to do so and post their own benchmark :-) – Næreen Oct 30 '18 at 17:47
Thank you for this underappreciated trick! On my machine `bincount` is about four times faster than `unique`. – Björn Lindqvist Oct 29 '19 at 20:47
I've tried a shape of(74402,) arr, `unique` takes `5.84 ms ± 33.6 µs per loop`; while `bincount` takes `231 µs ± 16.3 µs`; and as @ImperishableNight mentioned, for bincount `.astype(int)` is needed – CN_Cabbage Oct 10 '22 at 09:22

norok2 · Answer 11 · 2022-09-20T09:21:02.120

If you are interested in the fastest execution, you know in advance which value(s) to look for, and your array is 1D, or you are otherwise interested in the result on the flattened array (in which case the input of the function should be np.ravel(arr) rather than just arr), then Numba is your friend:

import numba as nb


@nb.jit
def count_nb(arr, value):
    result = 0
    for x in arr:
        if x == value:
            result += 1
    return result

or, for very large arrays where parallelization may be beneficial:

@nb.jit(parallel=True)
def count_nbp(arr, value):
    result = 0
    for i in nb.prange(arr.size):
        if arr[i] == value:
            result += 1
    return result

These can be benchmarked against np.count_nonzero() (which also has the problem of creating a temporary array, something the Numba solutions avoid) and an np.unique()-based solution (which actually counts all unique values, unlike the other solutions).

import numpy as np


def count_np(arr, value):
    return np.count_nonzero(arr == value)

def count_np_uniq(arr, value):
    # np.unique counts every distinct value; only the requested one is returned
    uniques, counts = np.unique(arr, return_counts=True)
    counter = dict(zip(uniques, counts))
    return counter[value] if value in counter else 0

Since Numba supports "typed" dicts, it is also possible to write a function that counts all occurrences of all elements. This competes more directly with np.unique() because it can count all values in a single run. The version proposed here only returns the count for a single value in the end (for comparison purposes, similar to what is done in count_np_uniq()):

@nb.jit
def count_nb_dict(arr, value):
    counter = {arr[0]: 0}  # seed with a zero count so the dict type can be inferred; arr[0] is counted in the loop
    for x in arr:
        if x not in counter:
            counter[x] = 1
        else:
            counter[x] += 1
    return counter[value] if value in counter else 0

The input is generated with:

def gen_input(n, a=0, b=100):
    return np.random.randint(a, b, n)

The timings are reported in the following plots (the second row of plots is a zoom on the faster approaches):

The plots show that the simple Numba-based solution is fastest for smaller inputs and the parallelized version is fastest for larger inputs. The NumPy version is reasonably fast at all scales.

When one wants to count all values in an array, np.unique() is more performant than a solution implemented manually with Numba for sufficiently large arrays.

EDIT: It seems that the NumPy solution has become faster in recent versions. In a previous iteration, the simple Numba solution was outperforming NumPy's approach for any input size.

Full code available here.

kmario23 · Answer 12 · 2021-03-18T17:28:38.320

8

To count the number of occurrences, you can use np.unique(array, return_counts=True):

In [75]: boo = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
 
# use bool value `True` or equivalently `1`
In [77]: uniq, cnts = np.unique(boo, return_counts=1)
In [81]: uniq
Out[81]: array([0, 1])   #unique elements in input array are: 0, 1

In [82]: cnts
Out[82]: array([8, 4])   # 0 occurs 8 times, 1 occurs 4 times

edited Mar 18 '21 at 17:28

answered Dec 23 '16 at 19:57

kmario23


score 7 · Answer 13 · answered Oct 19 '15 at 14:15


I'd use np.where:

how_many_0 = len(np.where(a==0.)[0])
how_many_1 = len(np.where(a==1.)[0])
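With float data, an exact == comparison can miss values that differ only by rounding error. One possible workaround, shown here as a sketch rather than as part of the answer, is to compare with numpy.isclose; the sample array is invented for illustration:

```python
import numpy as np

a = np.array([0.1 + 0.2, 0.3, 0.3, 1.0])

# 0.1 + 0.2 is 0.30000000000000004, so the exact test skips it
exact = len(np.where(a == 0.3)[0])            # 2

# isclose compares within a small default tolerance instead
close = np.count_nonzero(np.isclose(a, 0.3))  # 3

print(exact, close)
```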

answered Oct 19 '15 at 14:15

MaxG


score 7 · Answer 14 · answered May 19 '16 at 19:14


y.tolist().count(val)

with val 0 or 1

Since a python list has a native function count, converting to list before using that function is a simple solution.
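For reference, y.tolist() and list(y) both give a list that supports count, but the elements differ: tolist() converts them to plain Python ints, while list(y) keeps NumPy scalars. A small sketch of the difference (the variable names are arbitrary):

```python
import numpy as np

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

as_python = y.tolist()  # elements are built-in Python ints
as_scalars = list(y)    # elements are NumPy integer scalars

print(type(as_python[0]))
print(type(as_scalars[0]))

# Both lists give the same counts
print(as_python.count(1), as_scalars.count(1))  # 4 4
print(as_python.count(0))                       # 8
```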

answered May 19 '16 at 19:14

michael


score 7 · Answer 15 · answered Nov 21 '19 at 02:12


Try this:

a = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
list(a).count(1)

answered Nov 21 '19 at 02:12

jarh1992


score 6 · Answer 16 · answered Oct 04 '16 at 09:30

Yet another simple solution might be to use numpy.count_nonzero():

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y_nonzero_num = np.count_nonzero(y==1)
y_zero_num = np.count_nonzero(y==0)
y_nonzero_num
4
y_zero_num
8

Don't let the name mislead you: if you use it with a boolean array, as in the example, it will do the trick.

Sébastien Wieckowski · Answer 17 · 2020-06-10T13:59:14.163

6

Take advantage of the methods offered by a pandas Series:

>>> import pandas as pd
>>> y = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
>>> pd.Series(y).value_counts()
0    8
1    4
dtype: int64

edited Jun 10 '20 at 13:59

answered Jun 13 '19 at 10:33

Sébastien Wieckowski


CB Madsen · Answer 18 · 2022-08-02T09:07:34.153

You can use dictionary comprehension to create a neat one-liner. More about dictionary comprehension can be found here

>>> counts = {int(value): list(y).count(value) for value in set(y)}
>>> print(counts)
{0: 8, 1: 4}

This will create a dictionary with the values in your ndarray as keys, and the counts of the values as the values for the keys respectively.

This will work whenever you want to count occurrences of values in a 1-D array like this one.

score 3 · Answer 19 · answered May 06 '19 at 16:28


You have a special array with only 1 and 0 here. So a trick is to use

np.mean(x)

which gives you the fraction of 1s in your array. Alternatively,

np.sum(x)
np.sum(1-x)

give you the absolute number of 1s and 0s in your array, respectively.
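A quick check of these three expressions, assuming x is the 0/1 array from the question:

```python
import numpy as np

x = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])

fraction_ones = np.mean(x)  # 4 of 12 entries are 1, i.e. one third
num_ones = np.sum(x)        # 4
num_zeros = np.sum(1 - x)   # 8

print(fraction_ones, num_ones, num_zeros)
```

Note that this only works because the array holds nothing but 0 and 1; any other value would distort both sums.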

answered May 06 '19 at 16:28

CathyQian


score 3 · Answer 20 · answered Sep 17 '19 at 10:48


dict(zip(*numpy.unique(y, return_counts=True)))

Just copied Seppo Enarvi's comment here which deserves to be a proper answer

answered Sep 17 '19 at 10:48

Code42


score 2 · Answer 21 · answered Dec 24 '15 at 22:35

It involves one more step, but a more flexible solution which would also work for 2d arrays and more complicated filters is to create a boolean mask and then use .sum() on the mask.

>>> y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
>>> mask = y == 0
>>> mask.sum()
8
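To illustrate the "2d arrays and more complicated filters" part, here is a sketch with an invented 2-D array and a compound condition (values strictly between 0 and 3):

```python
import numpy as np

grid = np.array([[0, 1, 2],
                 [3, 0, 1],
                 [2, 2, 0]])

# Combine conditions with & (each side needs its own parentheses)
mask = (grid > 0) & (grid < 3)

in_range = mask.sum()       # 5 elements satisfy the filter
per_row = mask.sum(axis=1)  # [2 1 2]

print(in_range, per_row)
```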

sol · Answer 22 · 2016-11-18T16:19:22.287

A general and simple answer would be:

numpy.sum(MyArray==x)   # sum of a boolean array marking where x occurs in MyArray

which gives this full code as an example:

import numpy
MyArray=numpy.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])  # array we want to search in
x=0   # the value I want to count (can be iterator, in a list, etc.)
numpy.sum(MyArray==x)   # sum of a boolean array marking where x occurs in MyArray

Now if MyArray has multiple dimensions and you want to count the occurrences of a row of values (called a pattern below):

MyArray=numpy.array([[6, 1],[4, 5],[0, 7],[5, 1],[2, 5],[1, 2],[3, 2],[0, 2],[2, 5],[5, 1],[3, 0]])
x=numpy.array([5,1])   # the value I want to count (can be iterator, in a list, etc.)
temp = numpy.ascontiguousarray(MyArray).view(numpy.dtype((numpy.void, MyArray.dtype.itemsize * MyArray.shape[1])))  # convert the 2d-array into an array of analyzable patterns
xt=numpy.ascontiguousarray(x).view(numpy.dtype((numpy.void, x.dtype.itemsize * x.shape[0])))  # convert what you search into one analyzable pattern
numpy.sum(temp==xt)  # count of the searched pattern in the list of patterns
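A simpler alternative for the row-pattern case, which is a different technique from the void-dtype view above, is to compare every row with the pattern by broadcasting and keep the rows where all columns match. A sketch using the same data:

```python
import numpy as np

MyArray = np.array([[6, 1], [4, 5], [0, 7], [5, 1], [2, 5], [1, 2],
                    [3, 2], [0, 2], [2, 5], [5, 1], [3, 0]])
x = np.array([5, 1])

# MyArray == x broadcasts x against each row;
# all(axis=1) is True only where every column matches
row_matches = (MyArray == x).all(axis=1)
pattern_count = row_matches.sum()

print(pattern_count)  # 2
```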

deckard · Answer 23 · 2018-11-06T11:21:50.490

2

For generic entries:

x = np.array([11, 2, 3, 5, 3, 2, 16, 10, 10, 3, 11, 4, 5, 16, 3, 11, 4])
n = {i:len([j for j in np.where(x==i)[0]]) for i in set(x)}
ix = {i:[j for j in np.where(x==i)[0]] for i in set(x)}

Will output a count:

{2: 2, 3: 4, 4: 2, 5: 2, 10: 2, 11: 3, 16: 2}

And indices:

{2: [1, 5],
3: [2, 4, 9, 14],
4: [11, 16],
5: [3, 12],
10: [7, 8],
11: [0, 10, 15],
16: [6, 13]}

edited Nov 06 '18 at 11:21

answered Nov 06 '18 at 11:12

deckard


score 1 · Answer 24 · edited Oct 22 '16 at 00:25


This can be done easily in the following method

y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
y.tolist().count(1)

edited Oct 22 '16 at 00:25

Eli Sadoff


answered Oct 21 '16 at 21:37

user7055304


score 1 · Answer 25 · answered Jan 12 '17 at 17:40


Since your ndarray contains only 0 and 1, you can use sum() to get the occurrence of 1s and len()-sum() to get the occurrence of 0s.

num_of_ones = sum(array)
num_of_zeros = len(array)-sum(array)

answered Jan 12 '17 at 17:40

Sabeer Ebrahim


score 1 · Answer 26 · edited Jul 21 '22 at 12:56


Here is a way to count the occurrences of a particular number, based on your code:

count_of_zero=list(y[y==0]).count(0)

print(count_of_zero)

# y==0 gives a boolean array; y[y==0] keeps only the elements where it is True,
# so counting the zeros in that filtered list gives the number of zeros in y.

edited Jul 21 '22 at 12:56

Francisco Maria Calisto


answered Jan 04 '20 at 21:32

The Guy


score 1 · Answer 27 · answered May 05 '20 at 10:53

If you are dealing with very large arrays, using generators could be an option. The nice thing here is that this approach works fine for both arrays and lists, and you don't need any additional package. Additionally, you are not using that much memory.

my_array = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
sum(1 for val in my_array if val==0)
Out: 8

score 1 · Answer 28 · answered Jun 26 '20 at 21:53

This function returns the number of occurrences of a variable in a 2-D array:

def count(array,variable):
    number = 0
    for i in range(array.shape[0]):
        for j in range(array.shape[1]):
            if array[i,j] == variable:
                number += 1
    return number
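The function above indexes exactly two dimensions. A possible variant that accepts any shape iterates over array.flat instead; the name count_any_shape and the test array are made up for illustration:

```python
import numpy as np


def count_any_shape(array, variable):
    """Hypothetical variant of count() for arrays of any dimension."""
    number = 0
    for element in array.flat:  # .flat visits every element, whatever the shape
        if element == variable:
            number += 1
    return number


cube = np.zeros((2, 3, 4), dtype=int)
cube[0, 1, 2] = 7
cube[1, 2, 3] = 7

print(count_any_shape(cube, 7))  # 2
print(count_any_shape(cube, 0))  # 22
```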

score 0 · Answer 29 · answered Jul 08 '16 at 14:41

If you don't want to use numpy or a collections module you can use a dictionary:

d = dict()
a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]
for item in a:
    try:
        d[item]+=1
    except KeyError:
        d[item]=1

result:

>>>d
{0: 8, 1: 4}

Of course you can also use an if/else statement. I think the Counter function does almost the same thing but this is more transparent.
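Another way to write the same loop without try/except or if/else is dict.get with a default of 0. A minimal sketch using the same list:

```python
a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]

d = {}
for item in a:
    # get() returns 0 the first time an item is seen
    d[item] = d.get(item, 0) + 1

print(d)  # {0: 8, 1: 4}
```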

Haribk · Answer 30 · 2023-01-05T10:01:01.300

0

The simplest approach; do comment if it is unnecessary:

import numpy as np
y = np.array([0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1])
count_0, count_1 = 0, 0
for i in y:
    if i == 0:
        count_0 += 1
    if i == 1:
        count_1 += 1
count_0, count_1

edited Jan 05 '23 at 10:01

answered Jan 05 '23 at 07:59

Haribk


score -1 · Answer 31 · answered Apr 26 '17 at 10:37


Numpy has a module for this. Just a small hack. Put your input array as bins.

numpy.histogram(y, bins=y)

The output are 2 arrays. One with the values itself, other with the corresponding frequencies.

answered Apr 26 '17 at 10:37

Ishan Tomar


isn't 'bins' supposed to be a number? – john k Oct 22 '18 at 03:07
1

Yes @johnktejik you're right. This answer does *not* work. – Næreen Oct 24 '18 at 08:21

score -5 · Answer 32 · edited Aug 19 '19 at 09:48


using numpy.count

$ a = [0, 0, 0, 1, 0, 1, 1, 0, 0, 0, 0, 1]

$ np.count(a, 1)

edited Aug 19 '19 at 09:48

Suraj Rao


answered Aug 19 '19 at 09:45

Anggi Permana Harianja


1

What's with the dollar signs here? – tripleee Apr 01 '20 at 13:50
1

No such function – PigSpider Jul 25 '20 at 22:59
