
I am trying to find a simple and fast way of counting the number of objects in a list that match a given criterion, e.g.:

class Person:
    def __init__(self, Name, Age, Gender):
        self.Name = Name
        self.Age = Age
        self.Gender = Gender

# List of People
PeopleList = [Person("Joan", 15, "F"), 
              Person("Henry", 18, "M"), 
              Person("Marg", 21, "F")]

Now, what's the simplest function for counting the objects in this list that match a condition based on their attributes? E.g., it should return 2 for Person.Gender == "F" or for Person.Age < 20.

FacesOfMu

5 Answers

class Person:
    def __init__(self, Name, Age, Gender):
        self.Name = Name
        self.Age = Age
        self.Gender = Gender


>>> PeopleList = [Person("Joan", 15, "F"),
...               Person("Henry", 18, "M"),
...               Person("Marg", 21, "F")]
>>> sum(p.Gender == "F" for p in PeopleList)
2
>>> sum(p.Age < 20 for p in PeopleList)
2
jamylak
  • I prefer `sum(1 for p in PeopleList if p.Gender == "F")` because it doesn't abuse the fact that bool subclasses int. – wim May 09 '13 at 06:40
  • @wim http://stackoverflow.com/questions/3174392/is-it-pythonic-to-use-bools-as-ints – Ashwini Chaudhary May 09 '13 at 06:44
  • Yes, I am aware of that post, and already have my -1 vote on Alex's answer. ;) See also http://stackoverflow.com/a/8169049/674039 – wim May 09 '13 at 06:45
  • I have to agree with @wim here. Even as a long-time Python programmer, it took me a while to work out what the sum statement was doing. I find `sum(1 for p in PeopleList if p.Gender == "F")` more explicit. – monkut May 09 '13 at 07:12
  • @monkut My final word on the issue: read this answer by Guido http://stackoverflow.com/a/6865824/1219006 – jamylak May 09 '13 at 07:14
  • Actually this quote sums it up perfectly: *"There was quite some pushback at the time since many of us feared that the new type and constants would be used by Python newbies to restrict the language's abilities, but Guido was adamant that we were just being pessimistic: nobody would ever understand Python so badly, for example, as to avoid the perfectly natural use of False and True as list indices, or **in a summation**, or other such **perfectly clear and useful idioms.**"* – jamylak May 09 '13 at 07:21
  • But it is not perfectly clear. Perhaps it's clear to computer science folks and those who were using Python before bools even existed in the language, but it's not obvious to someone with a mathematical mind for whom "True" and "the number 1" are conceptually very different objects. The sum over the condition is a mental stumbling block, and it requires an extra moment of thinking compared to reading the direct comprehension. – wim May 09 '13 at 08:18
  • @jamylak keep in mind that question is in the context of using it for a `bool_to_str()` function, and even in that case Guido suggests a different approach (which I agree with): `def bool_to_str(value): return 'Yes' if value else 'No'` – monkut May 09 '13 at 08:28
  • @monkut Guido states that using `False` and `True` as list indices is perfectly fine. He also mentions other such clear idioms, and this is also a clear use of `False` and `True`. – jamylak May 23 '15 at 09:13
  • It's not just about what the programming language can do; it's also about semantics. I'd consider the original answer "idiomatic", though: the meaning is clear, even if you don't immediately know why. Of course, the PEP 285 approach does have some potential pitfalls for more mathematically minded people. For example, negation is arithmetic, not boolean: you need to write "not True" instead of "-True", because "-True" is -1, which is neither True nor False but still evaluates as true. – Klaws Jun 22 '20 at 09:32
  • I found that `sum(a[n]==b[n] for n in L)` is about 5 times slower than `for n in L: if a[n]==b[n]: cnt += 1`. Can anyone reproduce that, and explain why? – Max Jun 30 '21 at 04:47
  • @Max that would be because `sum` uses a generator expression and the other one is more basic code but better optimised. – jamylak Jul 01 '21 at 04:19
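The debate above hinges on `bool` being a subclass of `int`, so each `True` contributes 1 to a sum. A minimal sketch, using the question's `Person` class, that checks this and shows the bool-summing form, the `sum(1 ... if ...)` form preferred in the comments, and the explicit loop from Max's comment all give the same count:

```python
class Person:
    def __init__(self, Name, Age, Gender):
        self.Name = Name
        self.Age = Age
        self.Gender = Gender

PeopleList = [Person("Joan", 15, "F"),
              Person("Henry", 18, "M"),
              Person("Marg", 21, "F")]

# bool is a subclass of int: True behaves as 1 and False as 0 in arithmetic
assert issubclass(bool, int)
assert True + True == 2

# 1) Sum the booleans directly (relies on True == 1)
count_bools = sum(p.Gender == "F" for p in PeopleList)

# 2) Sum an explicit 1 per match (no reliance on bool arithmetic)
count_ones = sum(1 for p in PeopleList if p.Gender == "F")

# 3) Plain loop with a counter
count_loop = 0
for p in PeopleList:
    if p.Gender == "F":
        count_loop += 1

print(count_bools, count_ones, count_loop)  # 2 2 2
```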

I know this is an old question, but these days one standard-library way to do this would be:

from collections import Counter

c = Counter(getattr(person, 'Gender') for person in PeopleList)
# c maps each Gender value to its count, e.g. c['F'] == 2
lonetwin
  • Why are you using `getattr(person, 'gender')` instead of simply `person.gender`? It's unnecessary. – jamylak Feb 25 '19 at 01:20
  • @jamylak good point. I don't really remember. I was perhaps thinking in terms of dynamically selected attributes. For instance: `c = { attr: Counter(getattr(person, attr) for person in PeopleList) for attr in ['Name', 'Age', 'Gender'] }`, so `c` is now a map from each attribute to a Counter of its values. Edit: I do remember now. It is that way because it [evolved](https://stackoverflow.com/posts/40789844/revisions) that way :) – lonetwin Feb 26 '19 at 20:58
  • Thanks for the clarification. I can see how that makes sense if we are simply counting each attribute, but the question has examples, e.g. `Person.Age < 20`, which aren't suited to this and are better handled with a simple `if` condition. – jamylak Feb 27 '19 at 01:10
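As the last comment points out, `Counter` tallies attribute values rather than conditions. A sketch, assuming the question's `Person` class, showing that a condition can still be counted by using its boolean result as the key, plus the per-attribute dictionary described in lonetwin's reply:

```python
from collections import Counter

class Person:
    def __init__(self, Name, Age, Gender):
        self.Name = Name
        self.Age = Age
        self.Gender = Gender

PeopleList = [Person("Joan", 15, "F"),
              Person("Henry", 18, "M"),
              Person("Marg", 21, "F")]

# Tally every distinct Gender value in one pass
by_gender = Counter(p.Gender for p in PeopleList)
print(by_gender["F"])  # 2

# A condition also works: the keys are simply True and False
under_20 = Counter(p.Age < 20 for p in PeopleList)
print(under_20[True])  # 2

# One Counter per attribute, with attributes selected dynamically via getattr
per_attr = {attr: Counter(getattr(p, attr) for p in PeopleList)
            for attr in ["Name", "Age", "Gender"]}
print(per_attr["Age"][21])  # 1
```

Looking up a missing key in a `Counter` returns 0 rather than raising `KeyError`, so `by_gender["X"]` is simply 0.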

I found that using a list comprehension and getting its length was faster than using sum().

According to my tests...

len([p for p in PeopleList if p.Gender == 'F'])

...runs 1.59 times as fast as...

sum(p.Gender == "F" for p in PeopleList)
Webucator
  • Not a fair test unless you also test `sum([p.Gender == "F" for p in PeopleList])` and post results for tiny, medium and gigantic data. – jamylak Feb 02 '18 at 05:48
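Following the comment above, a rough benchmark could time all three variants on small, medium and large inputs. This is only a sketch: the generated data (`make_people`) is hypothetical, the sizes and repetition count are arbitrary, and the ratios will depend on the Python version and machine, so no results are claimed here:

```python
import timeit

class Person:
    def __init__(self, Name, Age, Gender):
        self.Name = Name
        self.Age = Age
        self.Gender = Gender

def make_people(n):
    # Hypothetical test data: odd indices are "F", ages cycle through 0..99
    return [Person("P%d" % i, i % 100, "F" if i % 2 else "M") for i in range(n)]

def len_listcomp(people):
    return len([p for p in people if p.Gender == "F"])

def sum_genexpr(people):
    return sum(p.Gender == "F" for p in people)

def sum_listcomp(people):
    return sum([p.Gender == "F" for p in people])

if __name__ == "__main__":
    for n in (10, 1_000, 100_000):
        people = make_people(n)
        for fn in (len_listcomp, sum_genexpr, sum_listcomp):
            # Time 10 calls of each variant on the same list
            t = timeit.timeit(lambda: fn(people), number=10)
            print(f"n={n:>7} {fn.__name__:<13} {t:.4f}s")
```

All three functions return the same count; only their speed and memory use differ, since the list-comprehension variants build an intermediate list first.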

I prefer this:

def count(iterable):
    return sum(1 for _ in iterable)

Then you can use it like this:

femaleCount = count(p for p in PeopleList if p.Gender == "F")

which is cheap (it doesn't create useless lists, etc.) and perfectly readable (I'd say more readable than both sum(1 for … if …) and sum(p.Gender == "F" for …)).
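Because this `count` accepts any iterable, it also combines with the built-in `filter`. A short sketch, assuming the question's `Person` class; note that a generator can only be consumed once, so counting the same generator a second time gives 0:

```python
class Person:
    def __init__(self, Name, Age, Gender):
        self.Name = Name
        self.Age = Age
        self.Gender = Gender

PeopleList = [Person("Joan", 15, "F"),
              Person("Henry", 18, "M"),
              Person("Marg", 21, "F")]

def count(iterable):
    # Consume the iterable, adding 1 per item, without building a list
    return sum(1 for _ in iterable)

female_count = count(p for p in PeopleList if p.Gender == "F")
under_20_count = count(filter(lambda p: p.Age < 20, PeopleList))

gen = (p for p in PeopleList if p.Gender == "F")
first = count(gen)   # 2
second = count(gen)  # 0: the generator is already exhausted
```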

Alfe

Personally, I think defining a function is simpler when you need this more than once:

def count(seq, pred):
    return sum(1 for v in seq if pred(v))

print(count(PeopleList, lambda p: p.Gender == "F"))
print(count(PeopleList, lambda p: p.Age < 20))

This is particularly useful if you want to reuse a query.
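To make queries reusable, predicates can be given names and combined. The helper `attr_equals` below is hypothetical (it is not part of any library); it only relies on `operator.attrgetter` from the standard library. A sketch, assuming the question's `Person` class:

```python
from operator import attrgetter

class Person:
    def __init__(self, Name, Age, Gender):
        self.Name = Name
        self.Age = Age
        self.Gender = Gender

PeopleList = [Person("Joan", 15, "F"),
              Person("Henry", 18, "M"),
              Person("Marg", 21, "F")]

def count(seq, pred):
    return sum(1 for v in seq if pred(v))

def attr_equals(name, value):
    # Hypothetical helper: build a predicate testing getattr(obj, name) == value
    get = attrgetter(name)
    return lambda obj: get(obj) == value

def is_minor(p):
    return p.Age < 18

is_female = attr_equals("Gender", "F")

females = count(PeopleList, is_female)                                     # 2
female_minors = count(PeopleList, lambda p: is_female(p) and is_minor(p))  # 1
```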

kampu