54

Noticed a line in our codebase today that I thought surely would have failed the build with a syntax error, but the tests were passing, so apparently it was actually valid Python (in both 2.x and 3).

Whitespace is sometimes not required in the conditional expression:

>>> 1if True else 0
1

It doesn't work if the LHS is a variable:

>>> x = 1
>>> xif True else 0
  File "<stdin>", line 1
    xif True else 0
           ^
SyntaxError: invalid syntax

But it does seem to still work with other types of literals:

>>> {'hello'}if False else 'potato'
'potato'

What's going on here? Is it intentionally part of the grammar for some reason? Is this odd quirk a known/documented behaviour?

wim
  • 20
    Python names cannot start with digits, so that's one reason. The parser knows that `if` is a new token. – Martijn Pieters Jun 02 '14 at 15:23
  • 1
    @MartijnPieters That's true, but that doesn't really explain what exactly is happening. – Brandon Buck Jun 02 '14 at 15:28
  • 2
    @BrandonBuck: Poke found the relevant portion of the reference docs already, but that's exactly what is happening here. – Martijn Pieters Jun 02 '14 at 15:30
  • Note: If the code is harder to understand, it's also helpful to [tokenize](https://stackoverflow.com/a/51530110/5267751), [get the abstract syntax tree](https://stackoverflow.com/a/51525049/5267751), or [disassemble](https://stackoverflow.com/a/51521261/5267751) the code. – user202729 Nov 20 '18 at 05:21

3 Answers

66

Whitespace between tokens

Except at the beginning of a logical line or in string literals, the whitespace characters space, tab and formfeed can be used interchangeably to separate tokens. Whitespace is needed between two tokens only if their concatenation could otherwise be interpreted as a different token (e.g., `ab` is one token, but `a b` is two tokens).

So in this case, `1if` is not a valid token, so the whitespace is optional. The `1` is interpreted as an integer literal, of which the `if` is not a part, so `if` is tokenized separately and recognized as a keyword.

In `xif`, however, an identifier is recognized, so Python cannot see that you meant `x if` there.
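
The split described above can be observed with the standard-library `tokenize` module. This is only a minimal sketch: the helper name `token_strings` is made up for illustration, and warnings are silenced on the assumption that newer interpreters may complain about a number glued directly to a keyword.

```python
import io
import tokenize
import warnings


def token_strings(source):
    """Return (token type name, text) pairs for one line of source."""
    # Layout-only tokens are dropped so that just the "real" tokens remain.
    skip = {tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER}
    with warnings.catch_warnings():
        # Assumption: some interpreter versions warn about input like "1if".
        warnings.simplefilter("ignore")
        readline = io.StringIO(source + "\n").readline
        tokens = list(tokenize.generate_tokens(readline))
    return [(tokenize.tok_name[t.type], t.string)
            for t in tokens if t.type not in skip]


glued_literal = token_strings("1if True else 0")
glued_name = token_strings("xif True else 0")

print(glued_literal)  # begins with ('NUMBER', '1'), ('NAME', 'if')
print(glued_name)     # begins with ('NAME', 'xif')
```

Note that `tokenize` reports keywords such as `if` with the generic type `NAME`; it is the parser that later treats them as keywords.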

poke
  • 4
    By that reasoning, shouldn't `1 if 1else 0` parse? (it doesn't) – wim Jun 02 '14 at 15:34
  • 33
    @wim I was confused for a moment and thought it should parse as well, but `1e` is actually interpreted as the start of a literal in scientific notation; e.g. `1e3 == 1000` – l4mpi Jun 02 '14 at 15:37
  • 7
    Interesting! After reading your comment, I have noticed that `1 if 1jelse 0` _does_ parse. – wim Jun 02 '14 at 15:39
  • 4
    @wim: or `1 if 0b1else 0`. The fact that `1else` doesn't parse shows that the rule cited in this answer is not entirely consistent with the implementation, because `1else` could not otherwise be interpreted as a different token, and neither can `1e`. (`0x1else` doesn't parse either, but that's because the maximal munch rule makes it into `0x1e` `lse`, both of which are valid.) – rici Jun 02 '14 at 17:41
  • I'm pretty sure this is a bug. I'll make an issue on the tracker. – Veedrac Jun 02 '14 at 18:16
  • That bug has now been fixed, so it should work in every subsequent release. – poke Jun 11 '14 at 13:38
4

The Python lexer generates two tokens for the input `1if`: the integer `1` and the keyword `if`, since no token that begins with a digit can contain the string `if`. `xif`, on the other hand, is recognized as a valid identifier; there is no reason to believe that it is an identifier followed by a keyword, so it is passed to the parser as a single token.

chepner
3

With my limited knowledge of lexical processing and tokenizing, I'd say what you're seeing is that any piece that can be lexed as something distinct from the `if` (numbers, dictionaries, etc.) is split off as its own token. Most languages ignore spaces, and I imagine Python does the same (excluding, of course, indentation levels). Once the tokens are generated, the grammar itself doesn't care; it most likely looks for an [EXPRESSION] [IF] [EXPRESSION] [ELSE] [EXPRESSION] grouping, which, again with your examples, would work fine.
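
One way to check the idea that the grammar only sees tokens is to compare the syntax trees of the spaced and unspaced forms with the standard-library `ast` module. This is a rough sketch, not a statement about how the parser is implemented; the helper name `expression_tree` is hypothetical, and warnings are silenced on the assumption that newer interpreters may warn about the glued form.

```python
import ast
import warnings


def expression_tree(source):
    """Parse a single expression and return a textual dump of its tree."""
    with warnings.catch_warnings():
        # Assumption: some interpreter versions warn about input like "1if".
        warnings.simplefilter("ignore")
        return ast.dump(ast.parse(source, mode="eval"))


spaced = expression_tree("1 if True else 0")
glued = expression_tree("1if True else 0")

# By default the dump omits line and column positions, so two sources that
# differ only in whitespace produce identical text if they parse the same way.
print(glued == spaced)
print(spaced)
```

If the two dumps match, the missing space made no difference by the time the conditional expression was built.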

Brandon Buck