37

If you run x = 'y' 'z' in Python, you get x set to 'yz', which means that some kind of string concatenation is occurring when Python sees multiple strings next to each other.

But what kind of concatenation is this? Is it actually running 'y' + 'z', or is it running ''.join(['y', 'z']), or something else?

Martijn Pieters
Merlin -they-them-
  • 10
    I think this is part of the lexer/parser. When Python parses the file and sees adjacent strings, it treats it as a single string. – univerio Oct 17 '14 at 20:42
  • 1
    If there is a difference between the `'x' + 'y'` and `''.join(..)` statements, you could try out if you get different results. What happens, for example, if you throw in a variable? – Jongware Oct 17 '14 at 20:44
  • You have a misconception that, when Python does things, it has to do them via constructs of Python. – Jim Balter Oct 18 '14 at 21:21

2 Answers

58

The Python parser interprets that as one string. This is well documented in the Lexical Analysis documentation:

String literal concatenation

Multiple adjacent string literals (delimited by whitespace), possibly using different quoting conventions, are allowed, and their meaning is the same as their concatenation. Thus, "hello" 'world' is equivalent to "helloworld".

The compiled Python code sees just the one string object; you can see this by asking Python to produce an AST of such strings:

>>> import ast
>>> ast.dump(ast.parse("'hello' 'world'", mode='eval').body)
"Str(s='helloworld')"

In fact, it is the very act of building the AST that triggers the concatenation, as the parse tree is traversed; see the parsestrplus() function in the AST C source.
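As a rough check of the point above, the sketch below compares the AST for adjacent literals with the AST for an explicit `+`. The variable names are only illustrative, and the node class name depends on the Python version (older releases report `Str`, newer ones `Constant`), so the code prints it rather than assuming it.

```python
import ast

# Adjacent literals: the parser hands back a single string node.
tree = ast.parse("'hello' 'world'", mode="eval")
node = tree.body
print(type(node).__name__)  # node class name varies by Python version
joined_value = ast.literal_eval(node)
print(joined_value)  # helloworld

# An explicit + is a different construct: a binary operation on two nodes.
plus_tree = ast.parse("'hello' + 'world'", mode="eval")
print(type(plus_tree.body).__name__)  # BinOp
```

The unoptimized tree returned by `ast.parse` keeps `+` as a separate operation, whereas the adjacent literals never appear as two nodes at all.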

The feature is specifically aimed at reducing the need for backslashes; use it to break up a string across physical lines when still within a logical line:

print('Hello world!', 'This string spans just one '
      'logical line but is broken across multiple physical '
      'source lines.')

Multiple physical lines can implicitly be joined into one logical line by using parentheses, square brackets or curly braces.
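A minimal sketch of the same idea in an assignment, with variable names chosen only for illustration: the parenthesized form relies on implicit line joining, while the second form uses explicit backslash continuation, and both produce the same string.

```python
# Adjacent literals inside parentheses: no backslashes or + needed.
message = (
    "This string spans just one "
    "logical line but is broken across "
    "multiple physical source lines."
)

# The same value built with explicit backslash continuation.
message_backslash = "This string spans just one " \
    "logical line but is broken across " \
    "multiple physical source lines."

print(message == message_backslash)  # True
```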

This string concatenation feature was copied from C, but Guido van Rossum is on record regretting adding it to Python. That post kicked off a long and very interesting thread, with a lot of support for removing the feature altogether.

Martijn Pieters
  • +1 Doesn't this apply only to strings in the same line? – heltonbiker Oct 17 '14 at 20:45
  • 3
    @heltonbiker: it applies to strings on the same **logical** line. That can span multiple physical lines, if backslashes, parentheses or brackets are involved. – Martijn Pieters Oct 17 '14 at 20:48
  • 1
    Note that concatenation of adjacent strings in source is largely a C-family tradition. – Russell Borogove Oct 17 '14 at 21:07
  • 1
    @RussellBorogove: indeed, and since it doesn't offer the same advantages in Python as it does in C (for use in macros) it looks like its days in Python are numbered. – Martijn Pieters Oct 17 '14 at 21:17
  • I have been bitten by the problem mentioned by Guido too. Particularly in long lists where each element is on a separate line (`a = ['this', 'that' 'other']` for example). On the other hand, I do like being able to say `'foo %s' ' bar %d' % (a, b)` instead of `('foo %s' + ' bar %d') % (a, b)` (imagine all the strings are on different lines). – Alok-- Oct 17 '14 at 21:21
  • @Alok--: hence the interesting discussion on the Python-Ideas list. A new literal concatenation token (`...`) was proposed to handle the issue of precedence. There are a fair number of posts to read there, but it is worth it if this sort of thing interests you! – Martijn Pieters Oct 17 '14 at 21:22
  • Yeah, reading through the thread now (there goes my hour or more!). It's very interesting. Thanks. – Alok-- Oct 17 '14 at 21:25
  • 1
    Uhm. I personally don't think that literal concatenation will be removed; at least not in the next few years. If they wanted to remove it they should have done that when making Python 3.0. Right now, with a lot of people already fighting with backwards compatibility, introducing another break may make things worse. It may be removed in Python 4.0, whenever that is released... – Bakuriu Oct 18 '14 at 09:02
  • 1
    @Bakuriu: I'm actually surprised at how strongly Guido was in support of removing it in that thread, given that a [full-on PEP to remove it from Python 3](http://legacy.python.org/dev/peps/pep-3126/) had already been rejected 6 years earlier. – Martijn Pieters Oct 18 '14 at 10:26
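The two points raised in the comments above (the missing-comma pitfall and the precedence convenience with `%`) can be sketched as follows. The list contents and the values of `a` and `b` are made up for illustration.

```python
# Pitfall: the comma after 'that' was forgotten, so two elements silently merge.
items = [
    'this',
    'that'
    'other',
]
print(len(items))  # 2
print(items)       # ['this', 'thatother']

# Convenience: adjacent literals are joined before % is applied.
a, b = "x", 3
formatted = 'foo %s' ' bar %d' % (a, b)
print(formatted)  # foo x bar 3

# With an explicit +, % binds tighter and only sees ' bar %d'.
try:
    'foo %s' + ' bar %d' % (a, b)
    plus_failed = False
except TypeError:
    plus_failed = True
print(plus_failed)  # True
```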
8

The strings are being concatenated by the Python parser before anything is executed, so it's not really like 'y' + 'z' or ''.join(['y', 'z']), except that it has the same effect.
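One way to see that this happens before execution is to compile the source without running it, as in the sketch below. The `<demo>` filename and the variable names are placeholders. The second half shows that adjacency only works for literals: putting a name next to a literal is rejected at compile time.

```python
# Compile only; nothing is executed here.
code = compile("x = 'y' 'z'", "<demo>", "exec")
print("yz" in code.co_consts)  # True
print("y" in code.co_consts)   # False

# A variable next to a literal is not concatenation; it is a syntax error.
try:
    compile("a = 'y'\nx = a 'z'", "<demo>", "exec")
    variable_case_is_syntax_error = False
except SyntaxError:
    variable_case_is_syntax_error = True
print(variable_case_is_syntax_error)  # True
```

Note that CPython's optimizer may also fold a constant expression such as 'y' + 'z' into a single constant, but that is an implementation detail of the compiler rather than part of the language grammar.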

tdelaney