8

Why does using a semicolon make a difference to the result? And what should the correct result be?

# Not stored in a different location.
>>> id('123 4')== id('123 4')
True

# Also returns true
>>> x = '123 4'; y ='123 4'; id(x) == id(y)
True

But the same thing entered on separate lines returns False.

>>> x = '123 4'
>>> y = '123 4'
>>> id(x) == id(y)
False

The same thing inside a function returns True both ways.

>>> def test():
...     x = '123 4';y='123 4'; print (id(x)==id(y))
...     a = '123 4'
...     b='123 4'
...     print (id(a)==id(b))
...
>>> test()
True
True
Peter Mortensen
Ankit Sachdeva
  • 1
    Does the line `x = '123 4'; y ='123 4'; id(x) == id(y)` execute in a single step, and is that why it returned True? – Ankit Sachdeva Mar 05 '15 at 17:21
  • 2
    @61612 I don't think this is a duplicate of the mentioned question. The question is about why the rules of interning strings differ when you use a newline versus a semicolon as the statement separator. – luk32 Mar 05 '15 at 17:23
  • 1
    No, I don't think it is a duplicate either. I am concerned about the different result when using a semicolon in Python. – Ankit Sachdeva Mar 05 '15 at 17:23
  • 2
    Agreed. This is a separate question and the answer—which, IMO, will be interesting—relates to Python's string interning. – David Wolever Mar 05 '15 at 17:24
  • 1
    I suspect the real issue here is: what is the return value of multiple statements separated by semicolons in the REPL? Edit: Actually, nevermind... still investigating this... – ArtOfWarfare Mar 05 '15 at 17:24
  • 4
    This is all just an implementation detail. `id(something) == id(something_else)` is only guaranteed to work if the two variables are specifically assigned the same object. A program fragment expecting equality as shown here has a name: A Bug. – tdelaney Mar 05 '15 at 17:29
  • 1
    @tdelaney Of course not. http://guilload.com/python-string-interning/ : Native string interning. – luk32 Mar 05 '15 at 17:33
  • @luk32 - you showed an implementation detail (and it works differently when executed outside of the interpreter, as I show below). It's not the defined behavior. – tdelaney Mar 05 '15 at 17:35
  • @tdelaney Oh, my apologies, I misunderstood you. I thought you said the observed behaviour is a bug. My bad. However, your statement does not relate to the question very much IMO. No one does this here. Unless you meant it as general advice; then, of course, you are absolutely right. – luk32 Mar 05 '15 at 17:38
  • @luk32 That's an interesting article you linked to. I'm reading through it now. I suspect the answer to this question lies somewhere within it. – ArtOfWarfare Mar 05 '15 at 17:41
  • @luk32 - I think my statement is very much on point. Improper use of `a is b` is a common bug. OP shows several lines of code that should never be used in a python program and I pointed that out. – tdelaney Mar 05 '15 at 17:50
  • @tdelaney: Yeah, but now there's the curious question of why the strings are interned if they're on the same line but not if they're on separate lines. – ArtOfWarfare Mar 05 '15 at 17:54
  • @ArtOfWarfare - I think it's because those several semicolon-separated statements are parsed as a single unit. It's the parser that does the interning of the items it just parsed. When running scripts, you get the same id within a module but different ids across modules. – tdelaney Mar 05 '15 at 17:59
  • I'm sure somewhere in the grammar you have something like `STATEMENT : EXPRESSION | STATEMENT SEMICOLON EXPRESSION` – Joran Beasley Mar 05 '15 at 19:15

4 Answers

9
>>> x="123 4";y="123 4"

Python recognizes that both variables get the same value (since the whole line is compiled as a single unit) and so stores that value in one memory location (that is, id(x) == id(y)).

However

>>> x="123 4"
>>> y="123 4"

Python does not notice that both are the same value (since the interactive interpreter compiles and runs each line separately) and so stores each in its own memory location (that is, id(x) != id(y)).
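One way to see what "the same line" means is to compile the one-line form yourself and look at the constants stored in the resulting code object. This is only a sketch assuming CPython; the helper name `constants_of` is made up for illustration, and other implementations may store constants differently.

```python
def constants_of(source):
    """Compile source as one unit and return the code object's constants."""
    code = compile(source, "<demo>", "exec")
    return code.co_consts


# Both assignments are compiled together, so the compiler sees both literals.
one_line = constants_of("x = '123 4'; y = '123 4'")

# How many times does the literal appear among the constants?
print(one_line.count('123 4'))
```

On CPython the literal should appear only once in the constants, which is why both names end up bound to the same object.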

Peter Mortensen
Joran Beasley
  • 4
    Is this specified somewhere, or is this some kind of magical and undocumented behavior that someone went out of their way to add in the Python interpreter? – ArtOfWarfare Mar 05 '15 at 17:27
  • 1
    since python parses line by line it can tell ... in C/C++ it would be optimized out by the compiler ... if you did `y=x` then it would share the id ... it's just in general how python allocates memory ... all of python is open source ... and I'm sure you can find actual documentation somewhere (maybe only by looking at the source) – Joran Beasley Mar 05 '15 at 17:29
  • I think the string interning happens when the string is parsed. In the interpreter, that's line by line. The takeaway is not to rely on undefined behavior! – tdelaney Mar 05 '15 at 17:33
  • 6
    @ArtOfWarfare: It's a Python interpreter implementation-specific optimization see [_Python string interning_](http://stackoverflow.com/questions/15541404/python-string-interning). I don't know if it's documented anywhere, but the bottom line is you shouldn't count on string IDs as a way to determine if two strings are _identical_. – martineau Mar 05 '15 at 17:35
  • same thing I have added to my comments on this question. But couldn't find any significance of it. – Ankit Sachdeva Mar 05 '15 at 17:36
  • @ArtOfWarfare I doubt there is any guarantee of this and it may vary between python interpreters (but it works for mine) – Joran Beasley Mar 05 '15 at 17:39
  • I don't see how this answers the question at all. At best it just restates it. – arshajii Mar 05 '15 at 18:31
  • the question is `Why use of colon makes difference to result? ` this answers it imho ... but meh – Joran Beasley Mar 05 '15 at 18:33
  • @arshajii It is more than a restatement. If this was the OP, we'd be saying "You answered your own question", and "..is there a question?". If you want it more explicitly: Python is interpreted, so the one-liner can be optimised, two-liner is not. – OJFord Mar 05 '15 at 18:46
5

This is just an accident of how the interpreter is written. Doing the same thing in a script shows a different result. It looks to me like the string interning happens per compilation unit.

(added stuff2.py to show multiple modules)

stuff2.py:

z = '123 4'

stuff.py:

x = '123 4'; y = '123 4'; print(id(x) == id(y))
x = '123 4'
y = '123 4'
print(id(x) == id(y))
import stuff2
print(id(x) == id(stuff2.z))


$ python stuff.py
True
True
False
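The compilation-unit idea can be tried without separate files by calling `compile()` by hand. This is a rough sketch, assuming CPython; `run_unit` is a hypothetical helper, and the second result is deliberately not stated because it depends on the interpreter and version.

```python
def run_unit(source, namespace):
    """Compile and execute source as its own compilation unit."""
    exec(compile(source, "<demo>", "exec"), namespace)


ns = {}

# One unit: the compiler sees both literals at the same time.
run_unit("x = '123 4'; y = '123 4'", ns)
same_unit = ns["x"] is ns["y"]

# Two units: this mimics typing the lines one at a time at the >>> prompt.
run_unit("a = '123 4'", ns)
run_unit("b = '123 4'", ns)
separate_units = ns["a"] is ns["b"]

print(same_unit, separate_units)
```

The first value should be True on CPython, since equal constants inside one code object are merged. The second may be True or False, which is exactly why code should not depend on it.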
tdelaney
  • This seems bizarre to me - why would doing this within a script have a different behavior from doing it within the interpreter? This means that someone went out of their way to write an optimization which is only performed within the interpreter. – ArtOfWarfare Mar 05 '15 at 17:34
  • 4
    I don't see it that way. It seems like the string interning operates across a parsing action, and that differs between the interpreter, which runs on small fragments, and script execution, which runs on an entire file. Another interesting test would be what happens in multiple modules. I think I'll add that. – tdelaney Mar 05 '15 at 17:39
  • ahh better than my answer (I didn't know it was specific to running interactively) (+1 from me) – Joran Beasley Mar 05 '15 at 17:39
0

This is clearly related to string interning in Python. The mechanism is implementation-dependent, so Python interpreters such as CPython, IronPython, PyPy, etc. might each behave differently. It can also change between versions, and it could probably even change between runs.

To explain your specific case, one would need to analyse the source code of the given version of your interpreter. The best bet is that there is a slight difference between how it handles statements passed on one line (separated by semicolons) and how it handles statements executed one by one.

When running interactively, bear in mind that quite a lot can happen between lines of code, because you might want to inspect things. When you pass everything at once, the interpreter has much less to worry about regarding what might happen between the statements.
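If code really does need equal strings to be the same object, relying on what the interpreter happens to do is fragile; Python 3 offers `sys.intern` for asking explicitly. A small sketch, with the strings built at run time so the compiler cannot merge them as literals:

```python
import sys

# Build the strings at run time so they are not compile-time constants.
first = " ".join(["123", "4"])
second = " ".join(["123", "4"])

print(first == second)                          # compares the values
print(sys.intern(first) is sys.intern(second))  # compares interned objects
```

Both lines should print True. Whether `first is second` holds without the explicit interning is left to the implementation.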

Peter Mortensen
luk32
0

CPython caches small integers, so if you assign the same small integer to several variables, every name refers to the same object:

>>> a = 5
>>> b = 5
>>> id(5)
11372376
>>> id(a)
11372376
>>> id(b)
11372376
>>> a == b
True
>>> a is b
True

ID comparison works here, as you can see from the identical ID values. Now let's try the same with strings, which are also immutable.

>>> x = '123 4'
>>> y = '123 4'
>>> x == y
True
>>> x is y
False
>>> id(x)
21598832
>>> id(y)
21599408
>>> id('123 4')
21599312

You can see that the IDs differ here. `is` compares object identity (in CPython, the memory address), whereas `==` compares the values. For the small integers above, every name points to the same cached object, so `is` happens to return True. Strings are immutable too, but the interpreter is not required to reuse one object for every equal string, so here `x` and `y` are separate objects and `is` returns False.
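The difference between `==` and `is` is easiest to see with objects that are never shared automatically, such as lists. A short illustration (the variable names are only for this example):

```python
first = [1, 2, 3]
second = [1, 2, 3]   # a separate list with equal contents
alias = first        # another name for the very same list

print(first == second)  # True: the values are equal
print(first is second)  # False: two distinct objects
print(first is alias)   # True: one object, two names
```

For comparing values, `==` is the right tool; `is` only answers whether two names refer to the same object.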

Hope this helps :)

binu.py