I am new to Python and I have to say the way Python treats variable assignment and function arguments is very confusing. Here is something I don't understand. If I define two strings with explicitly the same content like 'abc', then they are actually the same object, as shown below.

x = 'abc'
y = 'abc'
x is y
True

This got me wondering how Python knows they are the same. By comparing the literals in the code? If there are a million different things happening between x = 'abc' and y = 'abc', does Python go back all the way and say there was an object 'abc' already, so I am not going to create a new 'abc'?

I wondered what would happen if I do the same but with a really long and complex string. This is what happened.

x = 'nao;uh gahasjhd;fjkhag;sjdgfuiwgfashksghdfaihghehwq3473fsd_@'
y = 'nao;uh gahasjhd;fjkhag;sjdgfuiwgfashksghdfaihghehwq3473fsd_@'
x is y
False
x == y
True

Now you can see why I am confused. So Python only checks "simple" strings but not long strings when creating new string objects? Then how complex/long is too complex/long?

Frank
    The details of this are implementation-specific. Python allows the implementation to share instances for any values it knows to be immutable, but never requires it, so any code that relies on either `x is y` or `x is not y` being true here is broken, or at best nonportable. If you want to know the internal implementation details of a specific implementation, like CPython 3.7, or CPython 2.7 or PyPy 2.1/3.5, or if you want to know generally what means a Python implementation _could_ use here, those are really all separate questions. – abarnert Jul 06 '18 at 17:58

1 Answer

There is a difference between x is y and x == y.
x is y checks whether x and y refer to the same object in memory.
x == y checks whether the values of x and y are equal.
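
A minimal sketch of that difference, using lists because two separately built lists are always distinct objects in any Python implementation. The variable names are only illustrative.

```python
# Two lists built separately: equal contents, but two distinct objects.
first = [1, 2, 3]
second = [1, 2, 3]
alias = first  # a second name for the very same object

same_value = first == second      # compares contents
same_object = first is second     # compares identity
alias_is_first = alias is first   # both names refer to one object

print(same_value, same_object, alias_is_first)  # True False True
```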

Now let's see why you got two different results.

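Whether two equal values end up as the same object is an implementation detail of CPython, not a rule of the language, and it does not depend simply on length. CPython keeps a cache of small integers (-5 through 256), and in many CPython versions it interns string literals that look like identifiers (only ASCII letters, digits and underscores), so equal values of those kinds reuse one object.
For other values, such as larger integers or strings containing spaces or punctuation, CPython does not look for an existing object with the same value and creates a new one. (Equal literals that are compiled together, for example in the same script file or function, may still be merged; typing each line separately at the interactive prompt avoids that.)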

When the value is a small int or an identifier-like string

In CPython, when two variables hold the same small integer or the same identifier-like string, both variables point to the same object, i.e. only one object is created in memory. Floats are not cached this way.
Try it yourself with this example, typing each line at the interactive prompt.

a = 10
b = 10
a is b
True     # output
a == b
True     # output

Here a is b checks whether a and b refer (i.e. point) to the same object in memory.
Since 10 lies inside the range of small integers that CPython caches (-5 through 256), the interpreter creates the object for 10 only once; a and b refer to that one object, so the output is True.

And a == b checks whether the value of a is equal to the value of b.
Since the value of a is 10 and the value of b is also 10, the output is True.
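
The integer behaviour can be checked in a script as well. This is a sketch that assumes CPython, where integers from -5 through 256 are cached; the numbers are built with int() at run time so that the compiler cannot merge equal literals. The identity results are an implementation detail and are printed without an expected value.

```python
import platform

# Build the integers at run time so equal literals are not merged at compile time.
small_a = int("10")
small_b = int("10")
big_a = int("10000")
big_b = int("10000")

is_cpython = platform.python_implementation() == "CPython"

# Equality is guaranteed by the language.
print(small_a == small_b, big_a == big_b)  # True True

# Identity depends on the implementation (CPython caches -5 through 256).
print("running on CPython:", is_cpython)
print("small ints share one object:", small_a is small_b)
print("big ints share one object:", big_a is big_b)
```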

You can try it with a string value as well, e.g.

s1 = 'abc'
s2 = 'abc'
s1 is s2
True     # output
s1 == s2
True     # output

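When the value is a large int or a string that is not identifier-like

For an integer outside the small-integer cache, or a string containing characters other than letters, digits and underscores, the CPython interpreter does not check whether an object with the same value already exists when each line is run separately. It creates a new object even if an equal one is already present.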

x = 'nao;uh gahasjhd;fjkhag;sjdgfuiwgfashksghdfaihghehwq3473fsd_@'
y = 'nao;uh gahasjhd;fjkhag;sjdgfuiwgfashksghdfaihghehwq3473fsd_@'
x is y
False
x == y
True

Here nao;uh gahasjhd;fjkhag;sjdgfuiwgfashksghdfaihghehwq3473fsd_@ contains spaces and punctuation such as ; and @, so CPython does not intern it. The interpreter does not check whether another object with the same value exists; it creates an object, and x refers to that object.
When the same text is then assigned to y, the interpreter creates a new object again (even though an equal object is already present).
Since x and y refer to different objects, x is y gives False.
Since x and y have the same value, x == y gives True.
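
If code really needs equal strings to be one shared object, the standard library offers sys.intern. The sketch below builds two equal strings at run time (the fragments are shortened from the example above purely for illustration) and then interns both. Only the equality and the post-intern identity are relied on here.

```python
import sys

# Build two equal strings at run time so they are not compile-time constants.
parts = ["nao;uh gahasjhd", ";fjkhag"]
x = "".join(parts)
y = "".join(parts)

print(x == y)  # True: same characters

# sys.intern returns the canonical copy kept in the interpreter's intern table,
# so interning two equal strings yields the same object.
x_interned = sys.intern(x)
y_interned = sys.intern(y)
print(x_interned is y_interned)  # True
```

Comparing with == remains the right choice for ordinary code; interning is an optimisation, not a substitute for equality.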

You can also try this example:

a = 10000
b = 10000
a is b
False    # output
a == b
True     # output
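
One reason results like these can differ between the interactive prompt and a script file is that CPython typically merges equal constants that are compiled together. The sketch below is an illustration, not documented behaviour: it compiles the same two assignments once as a single unit and once as two separate units, much as the prompt compiles each line on its own. The file names passed to compile are placeholders.

```python
source_a = "a = 10000"
source_b = "b = 10000"

# Compiled together, as in a script file: one code object holds both constants.
together = {}
exec(compile(source_a + "\n" + source_b, "<together>", "exec"), together)

# Compiled separately, as when each line is typed at the interactive prompt.
separate = {}
exec(compile(source_a, "<line 1>", "exec"), separate)
exec(compile(source_b, "<line 2>", "exec"), separate)

# Identity may differ between the two cases; equality never does.
print("together, same object:", together["a"] is together["b"])
print("separate, same object:", separate["a"] is separate["b"])
print("equal:", together["a"] == together["b"], separate["a"] == separate["b"])
```

On a typical CPython build the first identity check tends to be True and the second False, but neither result is guaranteed.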

Why Python does this

Python does it to save time and memory. Small integers are used so often that keeping one shared object for each avoids creating the same value over and over. Identifier-like strings are the ones most likely to be used as variable, attribute or dictionary-key names, and when such strings are interned, a lookup can often compare them by identity instead of character by character.
For arbitrary values, checking whether an equal object already exists would cost time on every creation and would rarely pay off, so CPython simply creates a new object.

swapnil
  • Thanks for taking the time to explain it. Based on the replies I got, I now understand it's a matter of string interning. But whether a string is interned or not is not just based on whether it has length 4 or less; it has something to do with length, type of string, and maybe something else. And as one of the answers pointed out, it's implementation-specific. – Frank Jul 06 '18 at 19:31
  • No problem. The important thing is that you have understood. – swapnil Jul 06 '18 at 19:34
  • The string being too long is not the reason... I still do not know why. Here is a counterexample: if you replace all the special characters with letters or digits (in this case, X), the two are the same object, even with long strings: `x = 'naoXuhXgahasjhdXfjkhagXsjdgfuiwgfashksghdfaihghehwq3473fsdXX'`, `y = 'naoXuhXgahasjhdXfjkhagXsjdgfuiwgfashksghdfaihghehwq3473fsdXX'`, `print(x is y)  # gives True` – famaral42 Aug 13 '20 at 14:59