python, x = """"""; lexed as triple quotes or 3 pairs of quotes

Question

In python you can say this:

x = """""" # x = ''

Does the Python lexer see this as two triple quotes with nothing inside? I.e. along the lines of x = """ """ (with no space)?

This was my immediate thought. However, this is possible in python:

>>> "4" "5"
'45'
>>> # and
>>> "4""5"
'45'

So I can see that x = """""" might also be lexed along the lines of x = "" "" "" (with no spaces). I'm just wondering, is """""" lexed as two triple quotes or three pairs of normal quotes? Or something else entirely? Thanks!

EDIT: Obviously, it doesn't matter to a Python programmer. However, the Python interpreter must pick one of these, and I'm wondering which.

@IfLoop Fixed that, sorry, thought I was using the right terminology. — eatonphil, Jul 30 '14 at 13:44
I wonder if http://stackoverflow.com/questions/7696924/multiline-comments-in-python is what you're looking for. — Selva, Jul 30 '14 at 13:44
@Selva, nope I don't see how that's relevant. This has nothing to do with comments. — eatonphil, Jul 30 '14 at 13:45
There would be no difference between the two interpretations - three empty strings is still an empty string `"" * 3 == """""" == "" "" "" == ""`. For any non-empty example it is clear to see which is being used, although they're still equal: `"f""o""o" == """foo""" == "foo"`. — jonrsharpe, Jul 30 '14 at 13:46
@Wooble, maybe it doesn't. But I'm positive Python interprets this deterministically... and I'm just wondering how — eatonphil, Jul 30 '14 at 13:46
@jonrsharpe, it doesn't matter that there is no difference. The Python interpreter definitely picks one, and I'm wondering which it picks. — eatonphil, Jul 30 '14 at 13:47
The explanation of triple quotes is not far from the start of the tutorial: [Strings](https://docs.python.org/3/tutorial/introduction.html#strings). More details can be found in the [Lexical Analysis](https://docs.python.org/3/reference/lexical_analysis.html). — Matthias, Jul 30 '14 at 13:47
The compiler picks one. The bytecode generated for `""""""` and `"" "" ""` both contains `LOAD_CONST 1 ('')`; by the time it's interpreted they're identical. — Wooble, Jul 30 '14 at 13:48
@phileaton why doesn't it matter? It wouldn't make any difference if the *interpreter* did the same thing every time, picked one randomly or selected based on the phase of the moon. This is an *implementation detail* and makes no difference to the outcome. — jonrsharpe, Jul 30 '14 at 13:51
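Wooble's point about the compiled output can be checked directly. The following is a minimal sketch, assuming CPython 3. Bytecode listings change between versions, so the sketch compares the two compilations with each other instead of against a fixed listing:

```python
import dis

# Compile both spellings as expressions and compare the resulting code objects.
triple = compile('""""""', "<triple>", "eval")
pairs = compile('"" "" ""', "<pairs>", "eval")

# Same constants and same bytecode: the difference in spelling is gone
# before the interpreter runs anything.
print(triple.co_consts, pairs.co_consts)
print(triple.co_code == pairs.co_code)

# The exact listing varies between CPython versions.
dis.dis(triple)
```

On any given interpreter, both spellings should give identical `co_consts` and `co_code`, which is why looking at compiled code (or at object ids) cannot tell you how the source was tokenized.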

kojiro · Accepted Answer · 2014-07-30T13:56:30.550

You can tell by using the tokenizer:

>>> from StringIO import StringIO
>>> from tokenize import generate_tokens as gt
>>> from pprint import pprint as pp
>>> code = 'x=""""""'
>>> codeio = StringIO(code)
>>> tokens = list(gt(codeio.readline))
>>> pp(tokens)
[(1, 'x', (1, 0), (1, 1), 'x=""""""'),
 (51, '=', (1, 1), (1, 2), 'x=""""""'),
 (3, '""""""', (1, 2), (1, 8), 'x=""""""'),
 (0, '', (2, 0), (2, 0), '')]

The first token is 'x', the second is '=', and the third is '""""""'. There are not three '""' tokens.

P.S. for comparison:

>>> othercode='y="led" "zeppelin"'
>>> othercodeio = StringIO(othercode)
>>> othertokens = list(gt(othercodeio.readline))
>>> pp(othertokens)
[(1, 'y', (1, 0), (1, 1), 'y="led" "zeppelin"'),
 (51, '=', (1, 1), (1, 2), 'y="led" "zeppelin"'),
 (3, '"led"', (1, 2), (1, 7), 'y="led" "zeppelin"'),
 (3, '"zeppelin"', (1, 8), (1, 18), 'y="led" "zeppelin"'),
 (0, '', (2, 0), (2, 0), '')]
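The session above is Python 2: the `StringIO` module became `io.StringIO` in Python 3, and numeric token codes such as `51` for operators are version-specific. A rough Python 3 equivalent is sketched below; `show_tokens` is a hypothetical helper name, and `token.tok_name` is used to print token names instead of raw numbers:

```python
import io
import token
import tokenize


def show_tokens(code):
    """Return (token name, token text) pairs for a one-line source string."""
    readline = io.StringIO(code).readline
    return [
        (token.tok_name[tok.type], tok.string)
        for tok in tokenize.generate_tokens(readline)
    ]


for line in ['x=""""""', 'y="led" "zeppelin"', 'z="" "" ""']:
    print(line)
    for name, text in show_tokens(line):
        print("   ", name, repr(text))
```

Depending on the version, the output also includes `NEWLINE` and `ENDMARKER` entries. The relevant part is that `x=""""""` yields a single `STRING` token, while `z="" "" ""` yields three.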

score 1 · Answer 2 · answered Jul 30 '14 at 13:48

It's lexically a single string. Triple-quoted strings are the only form that can span multiple lines without backslash continuations (unlike some other languages, which allow all strings, or no strings, to span lines).

This particular syntax was probably chosen because it keeps syntax highlighting simple: a highlighter can just flag matched pairs of quotes. Although this can still erroneously highlight invalid Python (single-quoted strings that span lines), it's usually good enough for text editors.

When the tokenizer reads a quote, it checks whether the next two characters are the same kind of quote. If they are, the string is triple-quoted and ends only at the next run of three consecutive quotes of that type. Otherwise the string ends at the next matching quote, unless a newline comes first, in which case it produces an error.
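The rule described in this answer can be modelled with a small scanner. This is a hypothetical, simplified sketch (`scan_string` and `string_tokens` are made-up names; the model ignores prefixes such as `r`, `b` and `f`, and simply treats a backslash as escaping the next character), not CPython's actual tokenizer code:

```python
def scan_string(src, start):
    """Scan a string literal starting at src[start]; return the index just past it.

    Simplified model: no string prefixes, a backslash escapes the next character.
    """
    quote = src[start]
    if src.startswith(quote * 3, start):
        # Triple-quoted: only three consecutive closing quotes end it.
        delim = quote * 3
        i = start + 3
        while i < len(src):
            if src[i] == "\\":
                i += 2
            elif src.startswith(delim, i):
                return i + 3
            else:
                i += 1
        raise SyntaxError("unterminated triple-quoted string")
    # Single-quoted: ends at the next matching quote; a newline first is an error.
    i = start + 1
    while i < len(src):
        ch = src[i]
        if ch == "\\":
            i += 2
        elif ch == quote:
            return i + 1
        elif ch == "\n":
            break
        else:
            i += 1
    raise SyntaxError("EOL while scanning string literal")


def string_tokens(src):
    """Split a run of string literals into the source text of each literal."""
    tokens, i = [], 0
    while i < len(src):
        if src[i] in "\"'":
            end = scan_string(src, i)
            tokens.append(src[i:end])
            i = end
        else:
            i += 1  # skip anything between literals, e.g. spaces
    return tokens


print(string_tokens('""""""'))    # ['""""""']
print(string_tokens('"" "" ""'))  # ['""', '""', '""']
```

Under this model, `""""""` is one triple-quoted token because the opening check sees three quotes before anything else, and adjacent `""` pairs only appear when something (here, a space) breaks up the run of quotes.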

Anshul Goyal · score 0 · Answer 3 · 2014-07-30T14:00:45.020

It is identified as a pair of triple quotes only.

Check this:

>>> id("""""")
140579203310856

>>> id("")
140579203310856

>>> id("" "" "")
140579203310856

which basically means the pair of triple quotes is treated the same as a normal pair of quotes.

Also, if you call id on four double quotes, as follows:

>>> id("""")
...

it won't terminate, since the lexer is now treating the first three quotes as the start of a triple-quoted string (the fourth quote is just part of its contents) and is waiting for a closing `"""`.
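To see that the fourth quote really is string content rather than a separate token, you can close the triple-quoted string and inspect the value. A small sketch using `ast.literal_eval` and the built-in `compile` (the inputs are made-up examples):

```python
import ast

# """ opens the string, the fourth quote is ordinary content, and the
# string only ends at the next run of three quotes.
print(repr(ast.literal_eval('""""x"""')))  # '"x'

# Without a closing """, the string never terminates, so compiling fails.
try:
    compile('id("""")', "<demo>", "exec")
except SyntaxError as exc:
    print("SyntaxError:", exc.msg)
```

The exact error message differs between Python versions, but every version rejects the unterminated source.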

answered Jul 30 '14 at 13:53 by Anshul Goyal (edited Jul 30 '14 at 14:00)

"and is expecting another two quotes for the string to terminate validly".. Nope, it's expecting 3 consecutive quotes for the string to terminate validly. The 3 quotes to end the string have to all come at once, you can't pay by instalments. – user9876 Jul 30 '14 at 13:54
@user9876 Ok, we are both wrong here. I just tried. 3 quotes won't work either, since there was a 4th quote character initially which was entered unescaped. Trying to figure this out now. – Anshul Goyal Jul 30 '14 at 13:58
This doesn't really prove anything. Did you try `id("" "" "")`? – Mark Dickinson Jul 30 '14 at 13:59
@user9876 Turns out, it is 4 quotes. `id("""")""""` evaluates with error `SyntaxError: EOL while scanning string literal` – Anshul Goyal Jul 30 '14 at 14:01
So is `"" "" ""` also "identified as a pair of triple quotes"? :-) By the time you get to the level of a Python object (after lexing, parsing to concrete syntax tree, conversion to abstract syntax tree, compilation to bytecode, interpretation of that bytecode), the distinction between `"" "" ""`, `""` and `""""""` is long gone. You can't say anything about how the *tokenizer* treats `""""""` from looking at ids of generated Python objects! – Mark Dickinson Jul 30 '14 at 14:03
@MarkDickinson Got your point :) Initially when I read the question, I thought OP was asking whether they are the same or not, and answered accordingly. I think the question has been suitably edited. – Anshul Goyal Jul 30 '14 at 14:05

score -1 · Answer 4 · answered Jul 30 '14 at 13:57

If there is no text between the quotes, the result is the same. Try the following:

>>> """abc"""
'abc'
>>> "a""b""c"
'abc'
>>> "a""b""c" == """abc"""
True

answered Jul 30 '14 at 13:57 by user1438233

score -2 · Answer 5 · answered Jul 30 '14 at 13:51

You can use the `ast` module to see the syntax tree Python builds for this, e.g.:

>>> import ast
>>> source = '""""""'
>>> node = ast.parse(source, mode='eval')
>>> ast.dump(node)
"Expression(body=Str(s=''))"

As you can see, it's an empty string.
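As Mark Dickinson's comment on Answer 3 points out, later stages cannot tell the spellings apart, and that includes the AST: implicit concatenation has already happened by the time the tree is built. A quick check (the node name printed is typically `Str` on Python versions before 3.8 and `Constant` from 3.8 on):

```python
import ast

sources = ['""""""', '"" "" ""', '""']
dumps = [ast.dump(ast.parse(src, mode="eval")) for src in sources]

for src, dump in zip(sources, dumps):
    print(f"{src:10} -> {dump}")

# All three spellings produce the same tree, so the AST cannot reveal
# how the tokenizer split the original source.
print(len(set(dumps)) == 1)
```

So the `ast` output confirms the value is an empty string, but the tokenizer is the place to look for how `""""""` is split.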

answered Jul 30 '14 at 13:51 by mkriheli
