Counting elements matching a pattern in a tuple of tuples

Question

I have a matrix m where I would like to calculate the number of zeros.

m=((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0))

My current code is as follows:

def zeroCount(M):
    return [item for row in M for item in row].count(0)
    # list of lists is flattened to form single list, and number of 0 are counted

Is there any way to do this quicker? Currently, I'm taking 0.4s to execute the function 20,000 times on 4 by 4 matrices, where the matrices are equally likely to contain zeros as they are not.

Some possible places to start (but which I could not make to work faster than my code) are these other questions: counting non-zero elements in numpy array, finding the indices of non-zero elements, and counting non-zero elements in iterable.

related: [Python 3: How to check lists within lists with .count()](http://stackoverflow.com/q/23299905/4279) — jfs, Apr 27 '14 at 02:14

score 4 · Accepted Answer · edited May 23 '17 at 11:50

The fastest so far:

def count_zeros(matrix):
    total = 0
    for row in matrix:
        total += row.count(0)
    return total

For 2D tuple you could use a generator expression:

def count_zeros_gen(matrix):
    return sum(row.count(0) for row in matrix)

Time comparison:

%timeit [item for row in m for item in row].count(0) # OP
1000000 loops, best of 3: 1.15 µs per loop

%timeit len([item for row in m for item in row if item == 0]) # @thefourtheye
1000000 loops, best of 3: 913 ns per loop

%timeit sum(row.count(0) for row in m) 
1000000 loops, best of 3: 1 µs per loop

%timeit count_zeros(m)
1000000 loops, best of 3: 775 ns per loop

For the baseline:

def f(m): pass
%timeit f(m)
10000000 loops, best of 3: 110 ns per loop

@J.F. your code looks like it's going to be the fastest, but I'll leave this question open for a while more just to let other people have a try. — Vincent Tjeng, Apr 28 '14 at 15:07

Kei Minagawa · Answer 2 · 2014-04-27T10:13:19.827

3

Here is my answer.

reduce(lambda a, b: a + b, m).count(0)

Time:

%timeit count_zeros(m)                                        #@J.F. Sebastian
1000000 loops, best of 3: 813 ns per loop

%timeit len([item for row in m for item in row if item == 0]) #@thefourtheye
1000000 loops, best of 3: 974 ns per loop

%timeit reduce(lambda a, b: a + b, m).count(0)                #Mine
1000000 loops, best of 3: 1.02 us per loop

%timeit countzeros(m)                                         #@frostnational
1000000 loops, best of 3: 1.07 us per loop

%timeit sum(row.count(0) for row in m)                        #@J.F. Sebastian
1000000 loops, best of 3: 1.28 us per loop

%timeit [item for row in m for item in row].count(0)          #OP
1000000 loops, best of 3: 1.53 us per loop

~~@thefourtheye's is the fastest. This is because of few function call.~~

@J.F. Sebastian's is the fastest in my environment. I don't know why...

edited Apr 27 '14 at 10:13

answered Apr 27 '14 at 09:28

Kei Minagawa

4,395
3
25
43

Still slower than mine :P See the tests – vaultah Apr 27 '14 at 09:34
Tests in independent environment: [Python3](https://ideone.com/3onYmW), [Python2](https://ideone.com/Q7LKvb) – vaultah Apr 27 '14 at 09:50
I've added [faster solution with explicit loops instead of the generator expression](http://stackoverflow.com/a/23318776/4279) – jfs Apr 27 '14 at 10:16
@J.F. Sebastian:Added. Ordinary code is very fast!! – Kei Minagawa Apr 27 '14 at 10:22

score 2 · Answer 3 · answered Apr 26 '14 at 12:14

2

The problem with your solution is that, you have to iterate the list again to get the count O(N). But the len function can get the count in O(1).

You can make this a lot quicker with this

def zeroCount(M):
    return len([item for row in M for item in row if item == 0])

answered Apr 26 '14 at 12:14

thefourtheye

233,700
52
457
497

1

On my machine the result are similar: 913 for `len([])` vs. 1150 for `[].count` for the `m` tuple from the question. For comparison: `def f(m): pass` is 110 on my machine – jfs Apr 27 '14 at 02:34
I would have suggested **sum(item == 0 for row in m for item in row)**, but it turned out as much slower :( – volcano Apr 27 '14 at 11:58
@volcano Before suggesting this, I even timed that along with other options :) – thefourtheye Apr 27 '14 at 12:01

vaultah · Answer 4 · 2014-04-27T11:45:10.060

Check this out:

from itertools import chain, filterfalse # ifilterfalse for Python 2
def zeroCount(m):
    total = 0
    for x in filterfalse(bool, chain(*m)): 
        total += 1
    return total

Perfomance tests on Python 3.3.3:

from timeit import timeit
from itertools import chain, filterfalse
import functools

m = ((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0))

def zeroCountOP():
    return [item for row in m for item in row].count(0)

def zeroCountTFE():
    return len([item for row in m for item in row if item == 0])

def zeroCountJFS():
    return sum(row.count(0) for row in m)

def zeroCountuser2931409():
    # `reduce` is in `functools` in Py3k
    return functools.reduce(lambda a, b: a + b, m).count(0)

def zeroCount():
    total = 0
    for x in filterfalse(bool, chain(*m)): 
        total += 1
    return total

print('Original code     ', timeit(zeroCountOP, number=100000))
print('@J.F.Sebastian    ', timeit(zeroCountJFS, number=100000))
print('@thefourtheye     ', timeit(zeroCountTFE, number=100000))
print('@user2931409      ', timeit(zeroCountuser2931409, number=100000))
print('@frostnational    ', timeit(zeroCount, number=100000))

The above give me these results:

Original code      0.244224319984056
@thefourtheye      0.22169152169497108
@user2931409       0.19247795242092186
@frostnational     0.18846473728790825
@J.F.Sebastian     0.1439318853410907

@J.F.Sebastian's solution is a winner, mine is a runner-up (about 20% slower).

Comprehensive solution for both Python 2 and Python 3:

import sys
import itertools

if sys.version_info < (3, 0, 0):
    filterfalse = getattr(itertools, 'ifilterfalse')
else:
    filterfalse = getattr(itertools, 'filterfalse')


def countzeros(matrix):
    ''' Make a good use of `itertools.filterfalse`
        (`itertools.ifilterfalse` in case of Python 2) to count 
        all 0s in `matrix`. '''
    counter = 0
    for _ in filterfalse(bool, itertools.chain(*matrix)):
        counter += 1
    return counter


if __name__ == '__main__':
    # Benchmark
    from timeit import repeat
    print(repeat('countzeros(((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0)))',
                 'from __main__ import countzeros',
                 repeat=10,
                 number=100000))

For your approach, can't you just take `counter = len(itertools.filterfalse(...))` — smci, Apr 27 '14 at 09:53
I've added [solution with explicit loop](http://stackoverflow.com/a/23318776/4279). It is the fastest so far. — jfs, Apr 27 '14 at 10:10
The standard idiom for length of an iterator is `len(_ for _ in iter(...))`. Use the generator expression directly, no need to increment counter, may save you a few cycles. — smci, Apr 27 '14 at 10:20
@frostnational thanks for writing code to allow everyone to benchmark their code. I definitely should have included that in my initial question. — Vincent Tjeng, Apr 28 '14 at 15:10

jrennie · Answer 5 · 2014-04-26T12:38:18.370

1

Use numpy:

import numpy

m=((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0))
numpy_m = numpy.array(m)
print numpy.sum(numpy_m == 0)

How does the above work? First, your "matrix" is converted to a numpy array (numpy.array(m)). Then, each entry is checked for equality with zero (numpy_m == 0). This yields a binary array. Summing over this binary array gives the number of zero elements in the original array.

Note that numpy will be clearly efficient for larger matrices. 4x4 might be too small to see a large performance difference vs. ordinary python code, esp. if you are initializing a python "matrix" like above.

edited Apr 26 '14 at 12:38

answered Apr 26 '14 at 12:31

jrennie

1,937
12
16

2

it is very slow for 4x4 matrix. – jfs Apr 27 '14 at 02:30
@jrennie - perhaps you will see performance gains only for very large matrices, but not so much in my situation, where I am dealing with small matrices. I think the overhead from using `numpy` is quite high. – Vincent Tjeng Apr 28 '14 at 15:09

smci · Answer 6 · 2014-04-27T09:48:54.990

0

One numpy solution is:

import numpy as np

m = ((2,0,2,2),(4,4,5,4),(0,9,4,8),(2,2,0,0))
mm = np.array(m)

def zeroCountSmci():
    return (mm==0).sum() # sums across all axes, by default

edited Apr 27 '14 at 09:48

answered Apr 27 '14 at 09:39

smci

32,567
20
113
146

Even if the input is numpy array. It is several times slower for the given `m`. – jfs Apr 27 '14 at 10:19

Counting elements matching a pattern in a tuple of tuples

6 Answers6

Linked