62

I have a list of integers and I need to count how many of them are > 0.
I'm currently doing it with a list comprehension that looks like this:

sum([1 for x in frequencies if x > 0])

It seems like a decent comprehension but I don't really like the "1"; it seems like a bit of a magic number. Is there a more Pythonish way to do this?

smci
fairfieldt

8 Answers

98

If you want to reduce memory usage, you can avoid building a temporary list by using a generator expression:

sum(x > 0 for x in frequencies)

This works because bool is a subclass of int:

>>> isinstance(True,int)
True

and True's value is 1:

>>> True==1
True

However, as Joe Golton points out in the comments, summing booleans with a generator is not very fast. If you have enough memory to use an intermediate temporary list, then sth's solution may be faster. Here are some timings comparing various solutions:

>>> frequencies = [random.randint(0,2) for i in range(10**5)]

>>> %timeit len([x for x in frequencies if x > 0])   # sth
100 loops, best of 3: 3.93 ms per loop

>>> %timeit sum([1 for x in frequencies if x > 0])
100 loops, best of 3: 4.45 ms per loop

>>> %timeit sum(1 for x in frequencies if x > 0)
100 loops, best of 3: 6.17 ms per loop

>>> %timeit sum(x > 0 for x in frequencies)
100 loops, best of 3: 8.57 ms per loop

Beware that timeit results may vary depending on version of Python, OS, or hardware.
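
Since `%timeit` is an IPython magic, the comparison can also be rerun with the standard library's `timeit` module. The following is only a sketch: the helper `time_candidates`, the labels, the list size, and the repeat count are assumptions chosen for illustration, and no particular timings are implied.

```python
import random
import timeit


def time_candidates(frequencies, number=20):
    """Return {label: seconds for `number` runs} for each counting approach."""
    candidates = {
        "len of list comprehension": lambda: len([x for x in frequencies if x > 0]),
        "sum of list of 1s": lambda: sum([1 for x in frequencies if x > 0]),
        "sum of generator of 1s": lambda: sum(1 for x in frequencies if x > 0),
        "sum of booleans": lambda: sum(x > 0 for x in frequencies),
    }
    # All approaches must agree before their speeds are worth comparing.
    assert len({func() for func in candidates.values()}) == 1
    return {
        label: timeit.timeit(func, number=number)
        for label, func in candidates.items()
    }


if __name__ == "__main__":
    data = [random.randint(0, 2) for _ in range(10**5)]
    for label, seconds in time_candidates(data).items():
        print(f"{label}: {seconds:.4f} s")
```

Running it on your own machine and Python version is the point; the ordering of the results may differ from the figures quoted above.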

Of course, if you are doing math on a large list of numbers, you should probably be using NumPy:

>>> frequencies = np.random.randint(3, size=10**5)
>>> %timeit (frequencies > 0).sum()
1000 loops, best of 3: 669 us per loop

The NumPy array requires less memory than the equivalent Python list, and the calculation can be performed much faster than any pure Python solution.

unutbu
  • 2
    A variation: [x > 0 for x in frequencies].count(True) – Peter Jaric May 24 '10 at 20:38
  • 3
    @Peter: note that your suggestion loops twice over the data: once to build the output list, and once more to count the True values. – tzot Jun 24 '10 at 19:16
  • Relying on the boolean evaluation to be interpreted as 1 is a) arguably poor practice, and b) much slower. – Adam Parkin Jul 18 '12 at 23:07
  • +1 for slightly more readable. However, I found it takes about 52% longer (the function I tested counted the number of factors in large numbers). So only use for comprehensions with few iterations ( < 10,000? ). – Joe Golton Jul 09 '13 at 15:08
  • @JoeGolton: Thanks for the comment. Indeed there are faster solutions, such as sth's, or by using NumPy. – unutbu Jul 09 '13 at 15:45
  • I'm surprised that list comprehension is faster than generator expression - it didn't even occur to me to try a list comprehension. Why is it so much faster? – Joe Golton Jul 09 '13 at 18:35
  • @JoeGolton: There are so many factors here that have an impact on speed that it is hard to make any general statement about why one is faster than another. `len` being faster than `sum` is one such factor. My experience has been that with Python2 list comprehensions are often faster than generator expressions *when you have enough memory*. – unutbu Jul 09 '13 at 19:54
  • @Joe Golton: But every version of Python may be different -- In Python3 [Guido van Rossum writes](http://python-history.blogspot.com/2010/06/from-list-comprehensions-to-generator.html) that "there is no longer a speed difference between the two". Though for me using Python3.1, the timeit results above remain roughly unchanged. The only surefire way I know to decide what is faster is to benchmark on a case-by-case basis. – unutbu Jul 09 '13 at 19:55
  • Thanks - it turns out that in my application the difference was minor, as the counts were low (as opposed to your example where the counts were high). So you're right - benchmarking case by case is the way to go. – Joe Golton Jul 09 '13 at 20:57
35

A slightly more Pythonic way would be to use a generator instead:

sum(1 for x in frequencies if x > 0)

This avoids generating the whole list before calling sum().

Greg Hewgill
10

You could use len() on the filtered list:

len([x for x in frequencies if x > 0])
sth
  • 3
    even better, to use a generator (strip [ and ]) – Valentin Golev May 24 '10 at 20:34
  • 1
    You could use filter with this to make it look more clear. len(filter(lambda x: x > 0, frequencies)) – Jonathan Sternberg May 24 '10 at 20:35
  • @Jonathan: I'd say it's a matter of taste if you prefer `filter()` or a list comprehension, but usually list comprehensions are preferred to functional programming style. (And the OP asked for a list comprehension.) – sth May 24 '10 at 20:53
  • the OP actually only said (s)he is using a decent list comprehension right now, but didn't specifically ask for one. But your main point still holds, of course. – Peter Jaric May 24 '10 at 21:02
  • 1
    @JonathanSternberg: in Python 3, that syntax won't work (you can't do a `len()` on a filter object). – Adam Parkin Jul 18 '12 at 23:01
  • @AdamParkin Not nearly as good, but you can just add "list(filter(...))" and len works again. Not nearly as good looking as a list comprehension though that would work in both languages (and wouldn't copy the list). But you're right, it won't work in Python 3. – Jonathan Sternberg Jul 19 '12 at 03:41
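
The Python 3 behaviour discussed in these comments can be checked directly. A small sketch with made-up sample data (the helper `has_len` is hypothetical): `len()` is rejected by both a `filter` object and a generator expression, while wrapping the filter in `list()` or summing over a generator works.

```python
frequencies = [2, 0, 1, 0, 5]  # hypothetical sample data


def has_len(obj):
    """Return True if len() accepts obj, False if it raises TypeError."""
    try:
        len(obj)
    except TypeError:
        return False
    return True


# In Python 3, filter() returns a lazy iterator, which has no length.
print(has_len(filter(lambda x: x > 0, frequencies)))  # False
# A generator expression has no length either.
print(has_len(x for x in frequencies if x > 0))  # False

# Materializing the filter into a list makes len() work again.
print(len(list(filter(lambda x: x > 0, frequencies))))  # 3
# To stay lazy, count by summing instead of calling len().
print(sum(1 for x in frequencies if x > 0))  # 3
```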
4

This works, but adding bools as ints may be dangerous. Please take this code with a grain of salt (maintainability goes first):

sum(k>0 for k in x)
Escualo
  • 2
    Adding booleans as integers is guaranteed to work in Python 2 and 3: http://stackoverflow.com/questions/2764017/is-false-0-and-true-1-in-python-an-implementation-detail-or-guaranteed-by-t – Eric O. Lebigot May 26 '10 at 07:30
4

If the array only contains elements >= 0 (i.e. every element is either 0 or a positive integer), then you could just count the zeros and subtract that number from the length of the array:

len(arr) - arr.count(0)
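
The non-negativity condition matters. A quick sketch with made-up lists (both function names are hypothetical) showing that the shortcut agrees with a direct count only when no element is negative:

```python
def count_by_subtracting_zeros(arr):
    """Count positives as len - zeros; only valid if no element is negative."""
    return len(arr) - arr.count(0)


def count_positive(arr):
    """Direct count of elements greater than zero."""
    return sum(1 for x in arr if x > 0)


non_negative = [0, 3, 0, 7, 1]    # hypothetical data, no negatives
with_negative = [0, 3, -2, 7, 1]  # hypothetical data, one negative

print(count_by_subtracting_zeros(non_negative), count_positive(non_negative))    # 3 3
print(count_by_subtracting_zeros(with_negative), count_positive(with_negative))  # 4 3
```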
ben_nuttall
2

How about this?

reduce(lambda x, y: x+1 if y > 0 else x, frequencies, 0)

EDIT: With inspiration from the accepted answer from @~unutbu:

reduce(lambda x, y: x + (y > 0), frequencies, 0)
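
A note on the initial value: when no initializer is passed, `reduce` uses the first element as the starting accumulator, so a count needs an explicit 0 as the third argument. A minimal sketch (the helper name `count_positive_reduce` and the sample data are made up for illustration):

```python
from functools import reduce  # reduce is no longer a builtin in Python 3


def count_positive_reduce(values):
    """Count items > 0 with reduce, starting the accumulator at 0."""
    return reduce(lambda count, y: count + (y > 0), values, 0)


frequencies = [5, 0, 3]  # hypothetical sample data

# Without an initializer the first element (5) becomes the starting count.
print(reduce(lambda count, y: count + (y > 0), frequencies))  # 6
# With the initializer, every element is tested and the count starts at 0.
print(count_positive_reduce(frequencies))  # 2
```

The initializer also makes the empty-list case return 0 instead of raising a TypeError.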

Peter Jaric
  • I wish I had got a comment to go with that down vote to learn by my mistakes. Please? – Peter Jaric May 24 '10 at 20:39
  • There seems to be a trend away from lambda functions toward list comprehensions. – fairfieldt May 28 '10 at 23:31
  • 1
    I wasn't one to downvote you; however I would gather that people tend to frown upon `reduce`, it being phased out etc (by Guido proclamation). I like `reduce`, but I too frown upon its use in this case, since the `sum(x > 0…)` variant seems more straightforward to me. – tzot Jun 24 '10 at 19:20
0

I would like to point out that everything said so far applies to lists. If we have a NumPy array, there are solutions that will be at least forty times faster...

Summing up all the solutions given and testing them for efficiency, plus a few more (I had to modify the reduce code to run it in Python 3). Note that the last result is in microseconds, not milliseconds. [Image: timing results]

Code in copy-pastable format:

import functools
import random
from collections import Counter

import numpy as np

frequencies = [random.randint(0, 2) for i in range(10**5)]

%timeit len([x for x in frequencies if x > 0])   # sth
%timeit sum([1 for x in frequencies if x > 0])
%timeit sum(1 for x in frequencies if x > 0)
%timeit sum(x > 0 for x in frequencies)
# The initial value 0 keeps the first element from being used as the accumulator.
%timeit functools.reduce(lambda x, y: x + (y > 0), frequencies, 0)
%timeit Counter(frequencies)   # tallies every distinct value, not just those > 0

# ------- NumPy, including the list-to-array conversion -------
%timeit (np.array(frequencies) > 0).sum()

# ------- NumPy without conversion -------
npf = np.array(frequencies)
%timeit (npf > 0).sum()
ntg
0

You can also use numpy.count_nonzero, which counts every nonzero element (so it matches the > 0 count only when there are no negative values):

import numpy as np
xs = [1,0,4,0,7]
print(np.count_nonzero(xs)) #3
KyleMit