11

It's been a couple of days since I started learning Python, and I stumbled across == and is. Coming from a Java background, I assumed == compares by object id and is compares by value. However, doing

 >>> a = (1,2)
 >>> b = (1,2)
 >>> a is b
 False
 >>> a == b
 True

It seems that is is equivalent to Java's == and Python's == is equivalent to Java's equals(). Is this the right way to think about the difference between is and ==? Or is there a caveat?

fo_x86
  • 5
    One added quirk is that implementations are free to create objects in ways you might not expect. For example, in pypy 1.9.0, if you enter `a = (1,2); b = (1,2); print a is b` all on one line, you actually get `True`-- they *could* have been distinct objects, but as it happens there's no reason not to reuse the same one twice. – DSM Jan 03 '13 at 18:21
  • +1 for pointing out the different implementations part. – fo_x86 Jan 03 '13 at 18:28

4 Answers

22
  • '==' checks for equality,
  • 'is' checks for identity

See also

Why does comparing strings in Python using either '==' or 'is' sometimes produce a different result?

Community
16

is checks that both operands are the same object. == calls __eq__() on the left operand, passing the right. Normally this method implements equality comparison, but it is possible to write a class that uses it for other purposes (but it never should).
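As a minimal sketch of that distinction, the hypothetical `Point` class below (not from the question, purely illustrative) defines `__eq__` so that two separate instances compare equal by value while remaining different objects:

```python
class Point:
    """Hypothetical value class used only for illustration."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Called for `self == other`; compare by value.
        if not isinstance(other, Point):
            return NotImplemented
        return (self.x, self.y) == (other.x, other.y)

    def __hash__(self):
        # Keep hashing consistent with __eq__.
        return hash((self.x, self.y))


p = Point(1, 2)
q = Point(1, 2)
same = p  # a second name for the object p refers to

print(p == q)     # True: __eq__ compares the coordinates
print(p is q)     # False: two separate objects
print(p is same)  # True: both names refer to one object
```

Without the `__eq__` definition, `p == q` would fall back to the default comparison, which is based on identity, and would be `False`.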

Note that is and == will give the same results for certain objects (string literals, integers between -5 and 256 inclusive) on some implementations such as CPython, but that does not mean that the operators should be considered substitutable in those situations.
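A small sketch of why the two operators should not be swapped for integers. The variable names are made up for illustration, and the values are built with `int()` at run time so the compiler cannot share constants. The `==` results are guaranteed by the language; the `is` results depend on the implementation, so no expected output is shown for them:

```python
# Build the integers at run time so no compile-time constant sharing applies.
small_a = int("100")
small_b = int("100")
big_a = int("100000")
big_b = int("100000")

# Equality is guaranteed in both cases.
print(small_a == small_b)  # True
print(big_a == big_b)      # True

# Identity is an implementation detail: do not rely on either result.
print(small_a is small_b)
print(big_a is big_b)
```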

Ignacio Vazquez-Abrams
2

To follow up on @CRUSADER's answer:

== checks the equality of the objects, using the __eq__ method.

is checks the actual memory location of the objects (in CPython, id() returns the object's address). If both names refer to the same memory location, the test is True.

As was mentioned above, CPython caches small integers (-5 through 256) for speed, so to see what's going on, use some other object or integers above 256. For instance:


In [8]: a = 1001
In [9]: b = a # this sets a pointer to a for the variable b
In [10]: a == b 
Out[10]: True # of course they are equal
In [11]: a is b 
Out[11]: True # and they point to the same memory location
In [12]: id(a)
Out[12]: 14125728
In [13]: id(b)
Out[13]: 14125728

In [14]: b = 1001 # this instantiates a new object in memory
In [15]: a == b
Out[15]: True
In [16]: a is b
Out[16]: False # now the memory locations are different
In [17]: id(a)
Out[17]: 14125728
In [18]: id(b)
Out[18]: 14125824

reptilicus
1

This is one of those situations where seemingly synonymous concepts might confuse newer programmers, as I was when I first wrote this answer. You were close with your assumption based on Java, but backwards. The difference between these operators boils down to object equivalency vs. object identity, but contrary to what you assumed, == compares by value and is compares by object id. From CPython's built-in documentation (obtained by typing help("is") at my interpreter's prompt, and also available in the online documentation):

Identity comparisons
====================

The operators "is" and "is not" test for object identity: "x is y" is true if and only if x and y are the same object. Object identity is determined using the "id()" function. "x is not y" yields the inverse truth value.

To break this down a bit for less experienced programmers (or really anyone that needs a refresher), a rough definition of each concept is given as follows:

  • object equivalency: two references are equivalent if they have the same effective value.

  • object identity: two references are identical if they refer to the same exact object, e.g. same memory location
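The two definitions above can be checked directly with `id()`. This is only a sketch with made-up variable names; it relies on the documented guarantee that two objects alive at the same time have different ids:

```python
a = [1, 2]
b = [1, 2]  # same value, separate list object
c = a       # another reference to the object a refers to

print(a == b)                  # True: equivalent values
print(a is b, id(a) == id(b))  # False False: distinct objects
print(a is c, id(a) == id(c))  # True True: one object, two names
```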

Object equivalency occurs in most of the situations you might expect, such as when you compare 2 == 2 or [0, None, "Hello world!"] == [0, None, "Hello world!"]. For built-in types, this is usually determined by the value of the object, but user-defined types can define their own behavior by defining the __eq__ method (though it is still advised to do so in a way that reflects the complete value of the object).

Object identity is something that can lead to equivalence, but is, on the whole, a separate matter entirely. It depends strictly on whether two references refer to the exact same object in memory, as determined by id().

Some useful notes about identical references: because they refer to the same entity in memory, they will ALWAYS (at least in CPython) have the same value and, unless __eq__ was defined unconventionally, will therefore be equivalent. This holds even if you change the object through one of the references with an in-place operation, such as list.append() or my_object[0] = 6, so care should be taken to test identity and make copies of objects that should be separate (this is one of the main purposes of is: detecting and dealing with aliases). For example:

>>> first_object = [1, 2, 3]
>>> aliased_object = first_object
>>> first_object is aliased_object
True
>>> aliased_object[0]= "this affects first_object"
>>> first_object
['this affects first_object', 2, 3]
>>> copied_object = first_object.copy()  # there are other ways to do this, such as slice notation or the copy module, but this is the simplest and most direct
>>> first_object is copied_object
False
>>> copied_object[2] = "this DOES NOT affect first_object"
>>> first_object
['this affects first_object', 2, 3]
>>> copied_object
['this affects first_object', 2, 'this DOES NOT affect first_object']

There are a lot of situations that can result in two references being aliased, but outside the assignment operator (which always creates a reference to the assigned object, as above), many of them depend on the exact implementation, e.g. not every implementation of Python will intern strings under the same circumstances or preemptively cache (I'm not sure what the proper term is in this case) the same range of integers. My installation of CPython, for instance, seems to have cached -8 on startup when this article seems to imply that's out of the normal range. Thus, even if is seems to work in your dev environment, it's better to be on the safe side, avoid inconsistent behavior altogether, and use ==. is should be reserved for situations where you actually want to compare identity.
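One situation where identity really is the thing to compare is a sentinel object. The sketch below is an illustration, not something from the question: `_MISSING` and `lookup` are hypothetical names. Because `object()` creates a unique object, `is` can tell "no default was passed" apart from "the default is None":

```python
_MISSING = object()  # hypothetical sentinel; no other object is identical to it


def lookup(mapping, key, default=_MISSING):
    """Return mapping[key]; raise KeyError unless a default was supplied."""
    if key in mapping:
        return mapping[key]
    if default is _MISSING:  # identity check: was a default passed at all?
        raise KeyError(key)
    return default


settings = {"debug": None}

print(lookup(settings, "debug") is None)                  # True: stored value is None
print(lookup(settings, "verbose", default=None) is None)  # True: explicit default used
```

The same reasoning is why None itself is conventionally tested with is rather than ==: there is only one None object, and == could be overridden by the other operand.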