195

Evaluating the expression 0xbin() in the Python interpreter returns False:

>>> 0xbin()
False

Why does that happen? This syntax should have no meaning whatsoever: function names cannot start with 0, there is no "i" or "n" in hex, and the bin function requires an argument.

2023 update: this is soon expected to be a syntax error.

MattS
  • 16
    It takes arguments! `0xbin(013,37)` – nneonneo Jul 25 '18 at 15:34
  • 13
    @nneonneo and if you want to get `True` you can try `0xbin(11,)` with a single argument – MattS Jul 25 '18 at 16:09
  • 10
    `0xbin(013,37)` will also give you True ;) (in Python 2.7) – nneonneo Jul 26 '18 at 02:25
  • 2
    This is simply because the implementor of the lexer and parser is only concerned with obtaining the desired behaviors over nicely formatted code. The juxtaposition of `0xb` and `in` should be treated as an invalid token. – Kaz Jul 26 '18 at 17:20
  • 4
    Compare and contrast with `0xband()`. [Tokenizer](https://docs.python.org/3/library/tokenize.html) is greedy and takes 0xba as a token. – wim Jul 26 '18 at 22:40
  • 8
    @people who are voting to reopen: Please explain why this is not a dupe. If you convince me, I'll dupehammer reopen it. – Kevin Jul 27 '18 at 16:42
  • 2
    @Kevin Maybe the dupe target is like "Why isn't whitespace required sometimes between tokens?" and this question is like "What even are the tokens?" (I could re-open myself, but since I answered and there were many votes I won't, in case it is viewed as a COI) – Chris_Rands Aug 07 '18 at 09:42

4 Answers

231

Python seems to interpret 0xbin() as 0xb in (), which asks whether eleven is in the empty tuple. The answer is no, hence False.
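To make the reading concrete, here is a minimal sketch with the token boundaries written out explicitly. The variable names are only illustrative, and the one-element tuple is the case mentioned in the question's comments.

```python
# The expression with the implicit token boundaries made explicit.
eleven = 0xb                      # hexadecimal literal for 11

empty_result = eleven in ()       # membership test against the empty tuple
single_result = eleven in (11,)   # the one-element tuple from the comments

print(empty_result, single_result)
```

Writing the spaces out does not change the meaning; it only shows where the tokenizer places the boundaries.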

YSelf
  • 19
    So apparently "in", "is" etc. don't require spaces? This is the first time I've encountered this, but it makes sense, as "<" and "==" don't require them either. – MattS Jul 25 '18 at 14:22
  • 46
    Apparently yes. The [Python Reference](https://docs.python.org/3.6/reference/lexical_analysis.html#whitespace-between-tokens) says whitespace between tokens is only needed "if their concatenation could be interpreted as a different token". But I have only ever seen such code in [Code Golf](https://codegolf.stackexchange.com/). – YSelf Jul 25 '18 at 14:29
  • 7
    @MattS This is why valid python identifiers (and many other languages) only accept alpha or underscore for the first letter of the identifier then allow numeric afterwards. The actual implementation is fairly complicated because of full Unicode support, but the pure ASCII regex for an identifier would be: `r'[_a-zA-Z][_a-zA-Z0-9]*'` – Aaron Jul 25 '18 at 14:30
  • 8
    @Aaron: `[_[:alpha:]][_[:alnum:]]*` in regular expression languages that allow (Unicode) character classes, i.e. not Python’s. ;-] – David Foerster Jul 25 '18 at 16:20
  • 8
    Wow, I thought this kind of parsing was only done in Fortran and BASIC. I can't believe a modern language does it. – Barmar Jul 25 '18 at 18:28
  • 1
    One of the reasons for the peculiar definition of a "preprocessing number" in the C language family is to prevent things like this; in those languages `0xbin` would be treated as a single token, even though it cannot be interpreted as a valid numeric literal. – zwol Jul 25 '18 at 23:44
  • 1
    @DavidFoerster You can just use [`regex`](https://pypi.org/project/regex/) instead of the built-in `re`. It provides the same API but adds a lot of features, including matching unicode properties etc. I hope that in a few releases that will replace the standard `re` module... (and I believe the author has this hope too, hence the high level of compatibility between the two). – Giacomo Alzetta Jul 26 '18 at 07:57
  • 4
    @Barmar python is pretty old. – RonJohn Jul 26 '18 at 13:46
  • 2
    @RonJohn I've been programming for 40 years, Python is less than 30 years old. As far as I'm concerned, it's a young whipper-snapper. – Barmar Jul 26 '18 at 15:06
  • 2
    @Barmar: Python [wants to remain](https://www.python.org/dev/peps/pep-3099) an [LL(1) language](https://en.wikipedia.org/wiki/LL_grammar). That doesn't really have anything to do with this example in particular, but it illustrates their ongoing desire for a "dumb" parser (and also provides a little in the way of explanation. TL;DR: They don't want Python to end up like Perl.). – Kevin Jul 26 '18 at 22:20
143

If you disassemble the code, you'll see that YSelf's answer, which says that 0xbin() is interpreted as 0xb in (), is confirmed:

>>> import dis
>>> dis.dis('0xbin()')
  1           0 LOAD_CONST               0 (11)
              2 BUILD_TUPLE              0
              4 COMPARE_OP               6 (in)
              6 RETURN_VALUE
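The exact listing varies between Python versions (for example, newer releases use a dedicated CONTAINS_OP instruction for membership tests), so as a rough sketch one can check the opcode names programmatically instead of reading the listing. The spaced-out source string and the variable names below are assumptions for illustration, not part of the answer above.

```python
import dis

# Compile the spaced-out form in "eval" mode so we get a single expression.
code = compile("0xb in ()", "<demo>", "eval")

# Collect opcode names rather than relying on the exact listing layout.
opnames = [instr.opname for instr in dis.get_instructions(code)]

# A function call would show up as some CALL* instruction.
has_call = any(name.startswith("CALL") for name in opnames)
# Membership is COMPARE_OP on older versions and CONTAINS_OP on newer ones.
has_membership = any(name in ("CONTAINS_OP", "COMPARE_OP") for name in opnames)

print(opnames)
print("call instruction present:", has_call)
print("membership/comparison instruction present:", has_membership)
```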
Bhargav Rao
Chris_Rands
64

You can use Python's own tokenizer to check!

import tokenize
import io
line = b'0xbin()'
# tokenize.ENCODING is the type of the leading encoding token, which we skip
print(' '.join(token.string for token in tokenize.tokenize(io.BytesIO(line).readline) if token.type != tokenize.ENCODING))

This prints the tokens in your string, separated by spaces. In this case, the result will be:

0xb in ( ) 

In other words, it returns False because the number 11 (0xb) is not in the empty tuple (()).

(Thanks to Roman Odaisky for suggesting the use of tokenize in the comments!)

EDIT: To explain the code a bit more thoroughly: the tokenize function expects a readline-style callable that returns one line of bytes per call, so io.BytesIO(line).readline wraps the byte string to provide exactly that. tokenize then yields a series of namedtuples, one per token; we take the string representing each one and join them together with spaces. The token.type != tokenize.ENCODING check skips the encoding specifier that would otherwise show up at the beginning. It uses the named constant because the numeric values of token types differ between Python versions.
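As a variation, here is a small sketch that uses tokenize.generate_tokens on a text stream, which produces no encoding token, and also prints each token's type name. The helper name token_summary is hypothetical, and the spaced-out input is an assumption made so the example behaves the same on any recent Python version.

```python
import io
import tokenize

def token_summary(source):
    """Return (type name, text) pairs for the non-empty tokens in source."""
    readline = io.StringIO(source).readline
    return [
        (tokenize.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
        if tok.string  # drop NEWLINE/ENDMARKER, whose text is empty here
    ]

pairs = token_summary("0xb in ()")
for type_name, text in pairs:
    print(type_name, repr(text))
```

Note that the keyword in is reported with the generic NAME type; the tokenizer does not distinguish keywords from identifiers, which is left to the parser.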

Draconis
  • 7
    This is the best answer yet, the "dis" and "ast" answers obscure what is going on behind uncommon notations, this shows it clearly in normal python. – plugwash Jul 26 '18 at 18:03
54

You can use the AST module to get the abstract syntax tree of the expression:

>>> import ast
>>> m = ast.parse('0xbin()')
>>> ast.dump(m)
'Module(
    body=[Expr(
               value=Compare(left=Num(n=11),
                             ops=[In()],
                             comparators=[Tuple(elts=[],
                                                ctx=Load())
                                         ]
                            ))])'

See the abstract grammar for how to interpret the expression, but tl;dr: Num(n=11) is the 0xb part, and Tuple(elts=[], ...) hints towards an empty tuple rather than a function call.
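The node names in the dump depend on the Python version (newer releases report the literal as a Constant node rather than Num), so here is a hedged sketch that inspects the tree by node type instead of by its printed form. The spaced-out source string and the variable names are assumptions for illustration.

```python
import ast

# Parse as a single expression; tree.body is then the expression node itself.
tree = ast.parse("0xb in ()", mode="eval")
node = tree.body

is_comparison = isinstance(node, ast.Compare)
uses_in = isinstance(node.ops[0], ast.In)

# The right-hand side should be a tuple display with no elements.
right = node.comparators[0]
is_empty_tuple = isinstance(right, ast.Tuple) and right.elts == []

# literal_eval accepts an expression node, whatever the literal's node class.
left_value = ast.literal_eval(node.left)

# A function call would appear as an ast.Call node somewhere in the tree.
contains_call = any(isinstance(n, ast.Call) for n in ast.walk(tree))

print(is_comparison, uses_in, is_empty_tuple, left_value, contains_call)
```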

Mark Amery
Pål GD