
Why does Python 3 allow "00" as a literal for 0 but not allow "01" as a literal for 1? Is there a good reason? This inconsistency baffles me. (And we're talking about Python 3, which purposely broke backward compatibility in order to achieve goals like consistency.)

For example:

>>> from datetime import time
>>> time(16, 00)
datetime.time(16, 0)
>>> time(16, 01)
  File "<stdin>", line 1
    time(16, 01)
              ^
SyntaxError: invalid token
>>>
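You can reproduce this without the `datetime` call by passing candidate literals to `ast.literal_eval`. It parses its argument as Python source, so the same tokenizer rules apply. The helper name `literal_value` is made up for this sketch. The wording of the `SyntaxError` itself differs between Python 3 releases, so the sketch only checks whether parsing fails:

```python
import ast

def literal_value(source):
    """Return the value of an integer literal, or None if Python 3 rejects it."""
    try:
        # literal_eval parses `source` as Python code, so the tokenizer's
        # integer-literal rules apply exactly as they would in a script.
        return ast.literal_eval(source)
    except SyntaxError:
        return None

for src in ["0", "00", "0000", "16", "01", "001", "0o1"]:
    print(f"{src:>5} -> {literal_value(src)}")
```

On Python 3 this prints 0 for every run of zeros and None for `01` and `001`.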
smci
walrus



Per https://docs.python.org/3/reference/lexical_analysis.html#integer-literals:

Integer literals are described by the following lexical definitions:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"
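The `decimalinteger` rule explains the behaviour. A literal with a nonzero first digit can continue with any digits, but a literal that starts with `0` must consist only of zeros. Below is a hand transcription of that grammar into Python regular expressions. The names `DECIMALINTEGER`, `INTEGER` and `is_integer_literal` are hypothetical. The sketch ignores the digit-grouping underscores that Python 3.6 added (PEP 515):

```python
import re

# Hypothetical regex transcription of the integer-literal grammar quoted above.
DECIMALINTEGER = r"[1-9][0-9]*|0+"        # nonzerodigit digit* | "0"+
OCTINTEGER = r"0[oO][0-7]+"               # "0" ("o" | "O") octdigit+
HEXINTEGER = r"0[xX][0-9a-fA-F]+"         # "0" ("x" | "X") hexdigit+
BININTEGER = r"0[bB][01]+"                # "0" ("b" | "B") bindigit+
INTEGER = re.compile(
    rf"(?:{DECIMALINTEGER}|{OCTINTEGER}|{HEXINTEGER}|{BININTEGER})"
)

def is_integer_literal(text):
    """True if the whole of `text` matches the `integer` production."""
    return INTEGER.fullmatch(text) is not None

for text in ["0", "00", "000", "10", "01", "007", "0o7", "0x1F", "0b101"]:
    print(text, is_integer_literal(text))
```

Running it reports `0`, `00` and `000` as valid and `01` and `007` as invalid. No alternative can match a `0` followed by a nonzero digit.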

There is no limit for the length of integer literals apart from what can be stored in available memory.

Note that leading zeros in a non-zero decimal number are not allowed. This is for disambiguation with C-style octal literals, which Python used before version 3.0.

As noted here, leading zeros in a non-zero decimal number are not allowed. "0"+ is legal as a very special case, which wasn't present in Python 2:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+
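For contrast, here is a similar hypothetical regex transcription of the Python 2 rules just quoted. It shows why `00` needed no rule of its own in Python 2. Any `0` followed by octal digits was an `octinteger`, so `00` was octal zero and `010` was eight. Hex, binary and the `L` long suffix are omitted to keep it short:

```python
import re

# Hypothetical regex transcription of the Python 2 rules quoted above
# (hex, binary and the L suffix are omitted).
PY2_DECIMAL = re.compile(r"[1-9][0-9]*|0")        # nonzerodigit digit* | "0"
PY2_OCTAL = re.compile(r"0[oO][0-7]+|0[0-7]+")    # new 0o form | old C-style form

def classify_py2(text):
    """Return (production, value) for a Python 2 decimal/octal literal, else None."""
    if PY2_DECIMAL.fullmatch(text):
        return ("decimalinteger", int(text))
    if PY2_OCTAL.fullmatch(text):
        # Drop the "0o" or the lone leading "0", then read the rest as base 8.
        digits = text[2:] if text[1] in "oO" else text[1:]
        return ("octinteger", int(digits, 8))
    return None

for text in ["0", "00", "01", "010", "0o17", "08"]:
    print(text, classify_py2(text))
```

Here `00` is classified as octinteger 0 and `010` as octinteger 8. `08` is rejected because 8 is not an octal digit.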

SVN commit r55866 implemented PEP 3127 in the tokenizer, which forbids the old 0<octal> numbers. However, curiously, it also adds this note:

/* in any case, allow '0' as a literal */

with a special nonzero flag that only throws a SyntaxError if the following sequence of digits contains a nonzero digit.
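A rough Python rendering of that check follows. It is not CPython's C code. It only sketches the behaviour described above, using a made-up function name. After the leading `0`, it consumes the remaining digits, records whether any of them was nonzero, and rejects the literal only if that flag is set. Floats such as `01.5`, which Python 3 accepts, are deliberately out of scope:

```python
DIGITS = "0123456789"

def scan_zero_prefixed(text):
    """Sketch of the zero-prefix check for a plain integer literal that starts
    with '0' and has no 0x/0o/0b prefix. Returns the value or raises."""
    if not text.startswith("0") or any(ch not in DIGITS for ch in text):
        raise ValueError(f"not a zero-prefixed digit string: {text!r}")
    nonzero = False              # set once any digit after the first '0' is not '0'
    for ch in text[1:]:
        if ch != "0":
            nonzero = True
    if nonzero:
        # Mirrors the SyntaxError Python 3 raises for 01, 007, 0100, ...
        raise SyntaxError(f"invalid token: {text!r}")
    return 0                     # "0", "00", "000", ... all denote zero
```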

This is odd because PEP 3127 does not allow this case:

This PEP proposes that the ability to specify an octal number by using a leading zero will be removed from the language in Python 3.0 (and the Python 3.0 preview mode of 2.6), and that a SyntaxError will be raised whenever a leading "0" is immediately followed by another digit.

(emphasis mine)

So the fact that multiple zeros are allowed technically violates the PEP, and it was implemented as a special case by Georg Brandl. He made the corresponding documentation change to note that "0"+ was a valid case for decimalinteger (previously that had been covered under octinteger).

We'll probably never know exactly why Georg chose to make "0"+ valid - it may forever remain an odd corner case in Python.


UPDATE [28 Jul 2015]: This question led to a lively discussion thread on python-ideas in which Georg chimed in:

Steven D'Aprano wrote:

Why was it defined that way? [...] Why would we write 0000 to get zero?

I could tell you, but then I'd have to kill you.

Georg

Later on, the thread spawned this bug report (https://bugs.python.org/issue24668) aiming to get rid of this special case. Here, Georg says:

I don't recall the reason for this deliberate change (as seen from the docs change).

I'm unable to come up with a good reason for this change now [...]

and thus we have it: the precise reason behind this inconsistency is lost to time.

Finally, note that the bug report was rejected: leading zeros will continue to be accepted only on zero integers for the rest of Python 3.x.

nneonneo
  • Why do you say "We'll probably never know exactly why Georg chose to..."? If someone that knows him sees this thread and informs him about it, then he might come give his answer! (unless you know he's forevermore refusing to discuss his past Python work, or some similar circumstance) – walrus Jul 16 '15 at 09:29
  • I don't understand why they didn't just make the second Python 2 `octinteger` case `"0" octdigit*`. `0` is an octal literal in C/C++. – Random832 Jul 16 '15 at 12:40
  • Actually English is a bit ambiguous in this regard. The word "another" can mean "one more" or it can mean "a different one." One valid English interpretation of the bolded quote from PEP 3127 is to mean "a SyntaxError will be raised whenever a leading '0' is immediately followed by a digit other than '0'." I'm not sure if that's what was actually intended (although that interpretation does appear to be supported by the actual code), but in any case I don't think it's accurate to say that the PEP is technically violated without additional clarification of that sentence. – GrandOpener Jul 16 '15 at 23:40
  • @GrandOpener: Note that `001` is illegal, whereas your interpretation would render that legal (since the meaning of "immediately" should be quite unambiguous). – nneonneo Jul 17 '15 at 02:33
  • Good point. So the PEP is definitely violated; what is ambiguous is the exact nature in which it is violated. :) – GrandOpener Jul 28 '15 at 21:20
  • @walrus: Georg [spoke up](https://bugs.python.org/issue24668#msg246945)! Unfortunately, he only said that he does not recall the reason for the change, nor can he come up with a good reason for the change now. So I guess we will never know why he did it. – nneonneo Jul 28 '15 at 22:07

It's a special case ("0"+)

2.4.4. Integer literals

Integer literals are described by the following lexical definitions:

integer        ::=  decimalinteger | octinteger | hexinteger | bininteger
decimalinteger ::=  nonzerodigit digit* | "0"+
nonzerodigit   ::=  "1"..."9"
digit          ::=  "0"..."9"
octinteger     ::=  "0" ("o" | "O") octdigit+
hexinteger     ::=  "0" ("x" | "X") hexdigit+
bininteger     ::=  "0" ("b" | "B") bindigit+
octdigit       ::=  "0"..."7"
hexdigit       ::=  digit | "a"..."f" | "A"..."F"
bindigit       ::=  "0" | "1"

If you look at the grammar, it's easy to see that 0 needs a special case. I'm not sure why the '+' is considered necessary there, though. Time to dig through the dev mailing list...


Interestingly, in Python 2 more than one 0 was parsed as an octinteger (the end result is still 0, though):

decimalinteger ::=  nonzerodigit digit* | "0"
octinteger     ::=  "0" ("o" | "O") octdigit+ | "0" octdigit+
John La Rooy

Python2 used the leading zero to specify octal numbers:

>>> 010
8

To avoid this (misleading?) behaviour, Python3 requires explicit prefixes 0b, 0o, 0x:

>>> 0o10
8
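The leading-zero rule applies only to source code. `int()` parses strings, and with its default base of 10 it accepts leading zeros. The Python docs describe `base=0` as interpreting the string exactly as a code literal. Only that mode applies the same restriction, including the zeros-only exception:

```python
# int() parses strings, so leading zeros are fine unless base=0 asks it
# to follow the source-literal rules.
print(int("010"))       # 10: base 10 by default
print(int("010", 8))    # 8: explicit octal
print(int("0o10", 0))   # 8: base 0 understands the 0o prefix
print(int("000", 0))    # 0: a run of zeros is still allowed

try:
    int("010", 0)       # rejected, just like the source literal 010
except ValueError as exc:
    print("ValueError:", exc)
```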
dlask
  • The question remains: why is `00` allowed? (And `000`, `0000`, etc.) – Michael Geary Jul 16 '15 at 07:29
  • @MichaelGeary: possibly because it can't be ambiguous (00000000 is 0 regardless of the base) and removing it would needlessly break code? Still strange. – RemcoGerlich Jul 16 '15 at 07:55
  • @RemcoGerlich If I'm not wrong, `01` is also `1` regardless of the base. – Holt Jul 16 '15 at 07:58
  • @Holt: but allowing "0"+"1"? as a special case would probably be even more confusing. – RemcoGerlich Jul 16 '15 at 08:05
  • @RemcoGerlich Never said it wouldn't ;) I was just saying that the `can't be ambiguous` is not an argument since `01` can't be ambiguous either. IMO, the `00` case is just a special case because it is `0` which should not be. – Holt Jul 16 '15 at 08:11
  • @Holt `0` is a special case too. If `00` weren't allowed it would be even more special. – Random832 Jul 16 '15 at 12:39
  • @Holt "01" might be unambiguous regardless of base but "011" or "010" is not. Basically, any number of zeroes will always mean a null quantity. Any other number will be ambiguous. The special case of "01" probably doesn't warrant an exception. "0"+ might. – Zac Crites Jul 16 '15 at 16:54
  • Any number of zeroes followed by a single other digit still means the same in any number system which has this other digit. So they even could allow `000007` without introducing any ambiguity. – Paŭlo Ebermann Jul 16 '15 at 20:42
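The base-independence argument in these comments is easy to check mechanically. In positional notation every leading zero contributes 0 × base^k, so a run of zeros followed by one digit d means d in every base that has that digit. The helper below, whose name is made up for this sketch, parses a string in each base from 2 to 36 that `int()` accepts and reports whether all the readings agree:

```python
def same_in_every_base(text):
    """True if int(text, base) gives a single value across every base in
    2..36 whose digit set can represent `text`."""
    values = set()
    for base in range(2, 37):
        try:
            values.add(int(text, base))
        except ValueError:      # text uses a digit this base lacks
            continue
    return len(values) == 1

print(same_in_every_base("00000000"))  # True: always 0
print(same_in_every_base("000007"))    # True: 7 in bases 8..36
print(same_in_every_base("01"))        # True: 1 in every base
print(same_in_every_base("010"))       # False: 2 in base 2, 8 in base 8, ...
```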