33

The Zen of Python says that explicit is better than implicit.

Yet the Pythonic way of checking a collection c for emptiness is:

if not c:
    # ...

and checking if a collection is not empty is done like:

if c:
    # ...

Ditto for anything that can have "zeroness" or "emptiness" (tuples, integers, strings, None, etc.).
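To make the list above concrete, here is a minimal sketch of which built-in values count as false in a boolean context. The names `falsy_values` and `truthy_values` are only for this illustration.

```python
# Built-in values that are treated as false in a boolean context.
falsy_values = [(), [], {}, set(), "", 0, 0.0, None, False]

# Their non-empty / non-zero counterparts are treated as true.
truthy_values = [(0,), [0], {"k": 0}, {0}, " ", 1, -0.5, True]

for value in falsy_values:
    assert not value   # equivalent to: bool(value) is False

for value in truthy_values:
    assert value       # equivalent to: bool(value) is True
```

Note that a container holding a single falsy element, such as `[0]`, is still true: only the container's own emptiness matters.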

What is the purpose of this? Will my code be buggier if I don't do this? Or does it enable more use cases (e.g., some kind of polymorphism), since people can override these boolean coercions?

Yuval Adam
GNUnit

8 Answers

22

This best practice is not without reason.

When you test `if object:`, you are essentially calling the object's `__bool__` method, which can be overridden and implemented according to the object's behavior.

For example, the __bool__ method on collections (__nonzero__ in Python 2) will return a boolean value based on whether the collection is empty or not.

(Reference: http://docs.python.org/reference/datamodel.html)
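As a rough sketch of what overriding this looks like, assume a hypothetical `Account` class (not from any library) that should count as true only while it holds money:

```python
class Account:
    """Hypothetical class: truthy while it still holds money."""

    def __init__(self, balance):
        self.balance = balance

    def __bool__(self):
        # Called by `if account:` and by bool(account) in Python 3.
        return self.balance > 0

    # Python 2 looks for __nonzero__ instead, so alias it.
    __nonzero__ = __bool__


def describe(account):
    # The `if` below dispatches to Account.__bool__.
    if account:
        return "has funds"
    return "empty"
```

With this, `describe(Account(10))` gives `"has funds"` and `describe(Account(0))` gives `"empty"`, without the caller knowing how "emptiness" is defined for an account.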

Nathaniel Jones
Yuval Adam
  • 6
    +1; This is exactly the "some kind of polymorphism" the OP asked about. – Merlyn Morgan-Graham Jun 23 '11 at 19:29
  • 2
    You just explained how my example code works, not why `if x` is better than `if x == 0`. Why can't collections just have a .length (.len if it makes you happy) method? It would still be polymorphism, but more explicit. The type of polymorphism here seems unsound to me; integers implement the same interface as collections ("falsyness"). The only use case I could see for this is implementing a function `f(x)` where x can be an integer or collection, so the function can see whether x is true/false, but in that case, why not just pass in a boolean? – GNUnit Jun 23 '11 at 19:49
  • @GNUnit: `x == 0` would compare it to an integer, and only that. You can't do that for the same reasons [`is Empty`](http://stackoverflow.com/questions/3488470/reason-why-there-is-no-if-empty-in-python) is a horrible idea. More acceptable would be `if len(collection) == 0:`, but that assumes the collection has a well-defined length, still has pretty much no advantage (using either a number or a collection is asking for trouble anyway) and is explicitly forbidden in PEP 8, and style guides shouldn't be violated without *good* reasons. –  Jun 23 '11 at 19:57
  • 1
    @delman, `x` is an integer in the comment you're replying to. Moreover, the use case I mentioned was showing that I could not think of a valid use case for this type of polymorphism. I created a function that only can know that its type has an emptiness property, but it can't possibly do anything else, because there's not that many other sane things that can be done on both an integer and a collection, aside from obvious things such as hashing. – GNUnit Jun 23 '11 at 20:03
  • 1
    the choice to define `__nonzero__` to return `len(self) != 0` on collection types was a deliberate choice, and it certainly wasn't accidental. It's explicit (the behavior would not be present if the method were not defined), it's intuitive (empty collections are also known as null sets, null being another name for false), and it's useful (less to type in the most common type of predicate on collections). Also note that `obj.__nonzero__()` doesn't really mean `obj != 0`, but rather, `bool(obj)`, which is exactly what the builtin `bool` does on new style classes. – SingleNegationElimination Jun 23 '11 at 20:20
  • 1
    Well technically there's an isomorphism between lists (length) and integers, but I see no reason to make an entire language revolve around the fact. – GNUnit Jun 23 '11 at 20:32
7

"Simple is better than complex."

"Readability counts."

Take, for example, `if users`: it is more readable than `if len(users) == 0`.

Gabi Purcaru
  • 1
    Uhh, the whole reason an "if statement" exists is to execute a piece of code if the argument to the statement was True. So this answer has nothing to do with what I'm talking about here. – GNUnit Jun 23 '11 at 19:28
  • lol oh you edited that out of your answer – GNUnit Jun 23 '11 at 19:29
  • @GNUnit yes, sorry, that was a little dumb ><. I've put a legitimate example – Gabi Purcaru Jun 23 '11 at 19:29
  • 3
    `if users` looks pretty ambiguous to me. If you know Python you should know what a list is and that all lists have length function. If you don't know Python, I'd say `if users` is less readable because it doesn't necessarily convey anything about what question is being asked about `users`. – GNUnit Jun 23 '11 at 19:38
  • 7
    @GNUnit: *any* language will be less readable until you learn its idioms. – Mark Ransom Jun 23 '11 at 19:45
  • @Mark Ransom: Gabi said that `if users` is more readable than `if len(users) == 0`. As someone who's been coding Python for years, I see no difference, and the latter seems a lot more sound. It looks to me that he's implying the former case is more readable to people who don't know Python. – GNUnit Jun 23 '11 at 20:06
7

Possibly trumped by the first and third items:

  • Beautiful is better than ugly.
  • Simple is better than complex.
Mark Ransom
5
if c:
    # ...

is explicit. The "if" explicitly states that the expression following will be evaluated in boolean context.

It is fair to argue that

if c == 0:
    # ...

is clearer, and a programming novice might agree.

But to an experienced programmer (this one at least) the latter is not clearer, it is redundant. I already know c will be compared with 0 because of the "if"; I don't need to be told twice.
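One caveat worth spelling out: `c == 0` and a plain truth test only agree when `c` is a number. A small sketch, using the hypothetical helpers `is_zero` and `is_falsy`:

```python
def is_zero(c):
    # Explicit comparison against the integer 0.
    return c == 0


def is_falsy(c):
    # Truth test, the same check that `if not c:` performs.
    return not c


samples = [0, 0.0, [], "", None, [0]]

# Each pair is (c == 0, not c) for the corresponding sample.
results = [(is_zero(s), is_falsy(s)) for s in samples]
```

For `0` and `0.0` both tests are true, and for `[0]` both are false. For `[]`, `""`, and `None`, the comparison with `0` is false while the truth test is true, so the two spellings are not interchangeable for collections.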

  • Explicit isn't an absolute thing, but I'm pretty sure most would agree that coercing something to boolean in certain contexts is implicit. Of course you're right that it can be redundant to do the explicit thing; that's because the entire language was designed around this idiom. In real world cases, the explicit case is not redundant at all and actually saves time because people don't have to go see how x object coerces to bool. Not to mention there are different ways it can be done (`__not__, __len__, __nonzero__, etc`) the complexity stacks up. – GNUnit Jun 23 '11 at 20:21
  • @GNUnit: agree, the coercion mechanism is implicit. But "c == 0" doesn't make it any more clear. "len(c) == 0" merits argument: less brief, but more explicit. –  Jun 26 '11 at 13:17
  • `c` is an integer here, unless I'm missing something... – GNUnit Jun 27 '11 at 13:52
  • I actually like comparing to `0`. For lists, the simple syntax makes it cleaner, but I dislike this syntax for numbers. But it's probably just me. – Konrad Borowski Apr 26 '14 at 19:34
4

If you prefer you can type if len(lst) > 0, but what would be the point? Python does use some conventions, one of which is that empty containers are considered falseish.

You can customize this with object.__nonzero__(self) (renamed object.__bool__(self) in Python 3):

Called to implement truth value testing and the built-in operation bool(); should return False or True, or their integer equivalents 0 or 1. When this method is not defined, __len__() is called, if it is defined, and the object is considered true if its result is nonzero. If a class defines neither __len__() nor __nonzero__(), all its instances are considered true.
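The lookup order in that quote can be sketched with three hypothetical classes. The example is written for Python 3, where the method is spelled `__bool__`; the `__nonzero__` alias is there only so the same code behaves identically on Python 2.

```python
class OnlyLen:
    """Defines __len__ only, so truthiness follows the length."""

    def __init__(self, n):
        self.n = n

    def __len__(self):
        return self.n


class BoolWins:
    """Defines both; __bool__ takes priority over __len__."""

    def __len__(self):
        return 0

    def __bool__(self):
        return True

    __nonzero__ = __bool__  # Python 2 spelling


class Neither:
    """Defines neither method, so every instance is true."""
    pass


checks = {
    "OnlyLen(0)": bool(OnlyLen(0)),    # False: length is zero
    "OnlyLen(3)": bool(OnlyLen(3)),    # True: length is nonzero
    "BoolWins()": bool(BoolWins()),    # True, although len() is 0
    "Neither()": bool(Neither()),      # True by default
}
```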

Nathaniel Jones
Jochen Ritzel
4

Implicit boolean polymorphism is a part of many languages. In fact, the idiom

if msg:
    print(str(len(msg)) + ' char message: ' + msg)

arguably comes from C!

#include <stdio.h>   /* printf */
#include <string.h>  /* strlen */

void print_msg(char * msg) {
    if (msg) {
        printf("%d char message: %s", (int) strlen(msg), msg);
    }
}

A pointer that does not point to anything should be set to NULL, which conveniently evaluates to false, so the check lets us avoid dereferencing it (and the segmentation fault that would likely follow).

This is such a fundamental concept that it would hamper Python to reject it. In fact, given the fairly clumsy version of boolean polymorphism available in C, it's immediately clear (to me at least) that Python has made this behavior much more explicit by allowing any type to have a customizable, well-defined boolean conversion via __nonzero__ (__bool__ in Python 3).
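For comparison, here is a sketch of the Python side in which one truth test covers both a missing message (`None`) and an empty one, whereas the C check above only guards against the null pointer. `format_msg` is a hypothetical helper that returns the text instead of printing it, so the result is easy to inspect.

```python
def format_msg(msg):
    """Return a description of msg, or None if there is nothing to show."""
    if msg:  # false for both None and the empty string
        return str(len(msg)) + ' char message: ' + msg
    return None
```

So `format_msg("hello")` yields `'5 char message: hello'`, while `format_msg("")` and `format_msg(None)` both yield `None`.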

senderle
  • I don't do much C, but are you sure your statement is checking against null? I'd bet it's just checking if the binary representation is non-zero. i.e: if null is 0xdeadbeef on some architecture, this code is broken (or depends on implementation defined behavior of a certain environment). – GNUnit Jun 27 '11 at 14:00
  • Even Scala's implicit coercion makes more sense than this. This is a one off language feature to save literally 5-10 characters each instance. Haskell has none of this and gets away just fine, typically producing more terse and robust code than that of Python. Ruby gets away fine without this feature too IIRC. This happens in C because it's weakly typed (just like PHP, Perl, JS, shell, etc). Weak typing should universally be acknowledged as a harmful thing because it's completely arbitrary and you need to keep a giant map of (originalType,func,morestuff) -> coerced type in your head. – GNUnit Jun 27 '11 at 14:03
  • One case of weak typing in Python that's surprising is True == 1 and False == 0, but those are probably for backwards or forwards compatibility reasons. Now that I think of it, this idiom is most likely because of backwards compatibility (Python used to have no booleans). – GNUnit Jun 27 '11 at 14:04
  • @GNUnit, first, no. See [here](http://c-faq.com/null/ptrtest.html); `NULL` is always false in c. Second, your sentence beginning "Weak typing should..." offers a _single disadvantage_ of weak typing as justification for a statement of _universal harm_. I think that's poor reasoning. Certainly, all things being equal, stronger typing is better, but all things are rarely equal. Third, Python's type system is much stronger than c's, because every variable has a type, unlike in c, and furthermore the effect of every legal conversion is explicitly defined, unlike in c. – senderle Jun 27 '11 at 15:51
  • @GNUnit, more generally, duck typing isn't weak typing. Even Haskell supports polymorphism, so to say "Haskell has none of this," you're really going to have to define what you think "this" is. – senderle Jun 27 '11 at 15:59
  • @GNUnit, actually, I see that I am not quite correct about Python, in that if an object does not define `__nonzero__` or `__len__`, it is considered `True` by default. I actually think that's a (minor) flaw. – senderle Jun 27 '11 at 16:20
  • By "none of this" I mean implicit coercions (i.e: Haskell's "if" is just syntactic sugar for a function (Bool -> a -> a -> a), and passing an integer to such a function's first parameter fails to type check). I fully agree that C's type system is a lot weaker than Python's. Your self-correction kind of proves my point that implicit semantics are tricky. – GNUnit Jun 27 '11 at 17:33
  • But in precisely what sense does using `show` on a member of the `Show` typeclass not do implicit coercion? – senderle Jun 27 '11 at 18:30
  • Well this is more on the level of duck typing vs structural subtype polymorphism vs typeclasses. A typeclass is a mapping of constraints to a set of function signatures. An instance is a mapping of a type to a set of functions that satisfy that set of function signatures. These mappings unfortunately have no names in Haskell, they are imported no matter what when you import some module. The function (show :: (Show a) => a -> String) can be seen as having a compile time parameter that must be an instance of Show. Once again, unfortunately these have no names... – GNUnit Jun 27 '11 at 21:33
  • ...Now imagine you have a type T in module MT, an instance of Show for T in module TShow1, and an instance of Show for T in a module TShow2. Code that you use can import either TShow1 or TShow2 depending on what version of Show they want to use with T. When you call show t for some construction t of T, you are just telling it to use whatever instance you imported (TShow1 or TShow2). Unfortunately, this is a bit implicit, and for this sort of thing, newtypes are usually used instead which seems like a hack to me. In a visual language like subtext, this problem could be completely avoided. – GNUnit Jun 27 '11 at 21:36
  • But the reason this isn't very implicit is because when you use something that takes in an a type with an instance of Show, you are just telling it to use whatever implementation of show you have in scope for that type. – GNUnit Jun 27 '11 at 21:38
  • So to put it straightforwardly, something implicit happens when you call `show` on something of a type with a defined `Show` instance. Similarly in Python, something implicit happens when you use `if` on something with a defined `__nonzero__` (`__bool__` in python 3) method. So far, I don't see a problem in either case, because the implicit action is well-defined according to the general rules and idioms of either language. However, I admit that there's a valid quibble about Python _also_ trying to use `__len__`, and not throwing an error when neither method is defined. – senderle Jun 28 '11 at 15:28
  • @GNUnit, on the other hand, there remains one relevant but unmentioned line from the zen: "Although practicality beats purity." The behavior in question was there way back in 1.4, and it's a small wrinkle in the language, not a glaring fault, so it's not worth breaking a lot of old code to fix. – senderle Jun 28 '11 at 15:32
  • Oh, also I checked about Ruby. Actually, not only does Ruby's `if` accept any value without complaint, it considers `0` to be __true__! – senderle Jun 28 '11 at 16:17
  • Typeclasses don't need to be implicit, the way Haskell does them is implicit. Haskell chooses typeclasses implicitly as a tradeoff of modularity for being able to use a text based language more conveniently. i.e: every type T that derives Show is coupled to Show, which is ridiculous and should only be the case if there is an instance of show that relies on internal details of T. @ruby: ouch. I agree though that it's not worth fixing now. It just seems like there is way too much Kool-Aid about this unsound idiom. This SO question confirmed my thoughts about the idiom. – GNUnit Jun 29 '11 at 15:05
2

Others have already noted that this is a logical extension of the tradition (at least as early as C, perhaps earlier) that zero is false in a Boolean context, and nonzero is true. Others have also already noted that in Python you can achieve some special effects with customized magic methods and such. (And this does directly answer your question "does it enable more use cases?") Others have already noted that experienced Python programmers will have learned this convention and be used to the convention, so it is perfectly obvious and explicit to them.

But honestly, if you don't like that convention, and you are just programming for your own enjoyment and benefit, simply don't follow it. It is much, much better to be explicit, clear, and redundant than it is to be obfuscated, and it's virtually always better to err on the side of redundancy than obfuscation.

That said, when it comes down to it, it's not a difficult convention to learn, and if you are coding as part of a group project, it is best to stick with the project's conventions, even if you disagree with them.

John Y
1

Empty lists evaluating to False and non-empty lists evaluating to True is such a central Python convention that it is explicit enough. If the context of the if-statement even communicates the fact that c is a list, len(c) == 0 wouldn't be much more explicit.

I think len(c) == 0 is only more explicit if you really want to check the length rather than test for magic zeroness or emptiness, because the result of the truth test may differ from that of the length check.
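Two cases where the results differ, as a minimal sketch (the helper names `is_empty_by_len` and `is_empty_by_truth` are made up for this example):

```python
def is_empty_by_len(c):
    return len(c) == 0   # requires c to support len()


def is_empty_by_truth(c):
    return not c         # accepts any object


# Case 1: None has no length, but it is falsy.
try:
    is_empty_by_len(None)
    len_accepts_none = True
except TypeError:
    len_accepts_none = False
truth_says_none_is_empty = is_empty_by_truth(None)

# Case 2: a generator that yields nothing is still truthy,
# and it has no length either.
empty_gen = (x for x in [])
truth_says_gen_is_empty = is_empty_by_truth(empty_gen)
```

Here the length check rejects `None` with a `TypeError` while the truth test reports it as empty, and the truth test reports the exhausted-from-the-start generator as non-empty. Which behavior is wanted depends on what the surrounding code is supposed to accept.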

Oben Sonne
  • `len(c) == 0` is explicit for complex objects because complex objects may have different notions of length, for example, in django, on a `QuerySet`, `len()` causes the entire query to run and be dumped to memory, and then counted, whereas `.count()` does a `COUNT()` query in SQL. There is no way for you to know what `not some_query_set` does without investigating, so the reader of your code now has to do extra research because you were too lazy to type `if len(queryset) == 0`. – GNUnit Jun 23 '11 at 19:58
  • Then again I suppose the idiom should be that "trueness" does whatever `__len__` does, but as you see, the complexity here is starting to stack up. – GNUnit Jun 23 '11 at 19:59
  • As for `QuerySet`s, which are more an interface to collections rather than actual collections, I agree that `not queryset` isn't very explicit. The docs recommend to use `queryset.exists()` to check if the collection provided by `queryset` is empty - so here that expression would be most explicit. But, as long as the docs of some class clearly communicate how instances are evaluated in a boolean context, I think `not someobject` is explicit enough. – Oben Sonne Jun 23 '11 at 20:23
  • Exactly, you have to go around reading docs to find out whether you can use some idiom that was supposed to save you 2.5 seconds. In lots of cases people don't even document whether you can use the idiom or not. One more reason why you can never know if your code is correct or not. – GNUnit Jun 23 '11 at 20:28
  • For regular Python types, the idiom is just perfect, everyone (should) know what it means. Outside standard Python, where you don't know how objects behave on magic methods, you either shouldn't use them or read the docs -- easy, isn't it? No code with a certain level of complexity can be so explicit that you don't need any docs. – Oben Sonne Jun 23 '11 at 20:43
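The cost concern raised in this thread can be illustrated with a toy class. This is a hypothetical stand-in, not Django's `QuerySet` and not a description of how Django implements it; `LazyResults`, `fetch_all`, and `fetch_one` are invented names, and `exists()` only mirrors the method name mentioned in the comments.

```python
class LazyResults:
    """Hypothetical lazily evaluated result set (NOT Django's QuerySet)."""

    def __init__(self, fetch_all, fetch_one):
        self._fetch_all = fetch_all   # expensive: loads every row
        self._fetch_one = fetch_one   # cheap: loads at most one row
        self.full_loads = 0           # counts the expensive calls

    def __len__(self):
        self.full_loads += 1
        return len(self._fetch_all())

    def exists(self):
        return bool(self._fetch_one())


rows = list(range(1000))
results = LazyResults(lambda: rows, lambda: rows[:1])

if results:   # no __bool__ defined, so this falls back to __len__
    pass
loads_after_truth_test = results.full_loads

results.exists()   # never touches fetch_all
loads_after_exists = results.full_loads
```

In this sketch the truth test triggers one full load because it goes through `__len__`, while `exists()` does not. That is the kind of hidden cost a reader cannot see from `if results:` alone.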