This is one of those situations where seemingly synonymous concepts can confuse newer programmers, as I was when I first wrote this answer. You were close with your assumption based on Java, but backwards. The difference between these operators boils down to object equivalency vs. object identity, but contrary to what you assumed, the == operator compares by value and the is operator compares by object identity. From CPython's built-in documentation (obtained by typing help("is") at my interpreter's prompt, but also available online):
Identity comparisons
====================
The operators "is" and "is not" test for object identity: "x is y" is
true if and only if x and y are the same object. Object identity
is determined using the "id()" function. "x is not y" yields the
inverse truth value.
To break this down a bit for less experienced programmers (or really anyone who needs a refresher), a rough definition of each concept follows:
object equivalency: two references are equivalent if they have the same effective value.
object identity: two references are identical if they refer to the exact same object, e.g. the same memory location.
Object equivalency occurs in most of the situations that you might expect, such as when you compare 2 == 2 or [0, None, "Hello world!"] == [0, None, "Hello world!"]. For built-in types, this is usually determined by the value of the object, but user-defined types can define their own behavior by defining the __eq__ method (though it is still advisable to do so in a way that reflects the complete value of the object).
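As a rough illustration of user-defined equivalence, here is a minimal sketch. The Point class and its fields are hypothetical, invented only for this example; the point is that __eq__ compares the complete value while the is operator still sees two separate objects.

```python
class Point:
    """Hypothetical 2D point, used only to illustrate __eq__."""

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def __eq__(self, other):
        # Defer to the other operand when the types are unrelated.
        if not isinstance(other, Point):
            return NotImplemented
        # Compare the complete value of the object, field by field.
        return (self.x, self.y) == (other.x, other.y)


p = Point(1, 2)
q = Point(1, 2)

print(p == q)       # True: same effective value
print(p is q)       # False: two distinct objects
print(p == (1, 2))  # False: both sides decline, so Python falls back to identity
```

One thing to keep in mind: a class that defines __eq__ without also defining __hash__ becomes unhashable in Python 3, so instances of this sketch could not be used as dict keys or set members as written.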
Object identity is something that can lead to equivalence, but is, on the whole, a separate matter entirely. Object identity depends strictly on whether two references refer to the exact same object in memory, as determined by id(). Some useful notes about identical references: because they refer to the same entity in memory, they will ALWAYS (at least in CPython) have the same value and, unless __eq__ was defined unconventionally, will therefore be equivalent. This holds even if you change the object through one of the references with an in-place operation, such as list.append() or my_object[0] = 6, so care should be taken to test identity and make copies of objects that should be separate (this is one of the main purposes of the is operator: detecting and dealing with aliases). For example:
>>> first_object = [1, 2, 3]
>>> aliased_object = first_object
>>> first_object is aliased_object
True
>>> aliased_object[0] = "this affects first_object"
>>> first_object
['this affects first_object', 2, 3]
>>> copied_object = first_object.copy()  # other ways exist, such as slice notation or the copy module, but this is the simplest and most direct
>>> first_object is copied_object
False
>>> copied_object[2] = "this DOES NOT affect first_object"
>>> first_object
['this affects first_object', 2, 3]
>>> copied_object
['this affects first_object', 2, 'this DOES NOT affect first_object']
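The comment in the session above mentions slice notation and the copy module as alternatives to list.copy(). A small sketch of those follows, with variable names made up for this example. Note that all of them except copy.deepcopy() are shallow, so nested objects remain aliased.

```python
import copy

original = [[1, 2], [3, 4]]

by_method = original.copy()      # shallow copy via list.copy()
by_slice = original[:]           # shallow copy via slice notation
by_module = copy.copy(original)  # shallow copy via the copy module
deep = copy.deepcopy(original)   # recursively copies nested objects too

# Each copy is equal to the original but is a distinct outer list.
for candidate in (by_method, by_slice, by_module, deep):
    assert candidate == original and candidate is not original

# Shallow copies still alias the nested lists; the deep copy does not.
print(by_slice[0] is original[0])  # True
print(deep[0] is original[0])      # False
```

If the list only holds immutable items such as numbers or strings, a shallow copy is usually enough; the aliasing of nested objects only matters once those nested objects can be mutated.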
There are a lot of situations that can result in two references being aliased, but outside the assignment operator (which always creates a reference to the assigned object, as above), many of them depend on the exact implementation, e.g. not every implementation of Python will intern strings under the same circumstances or preemptively cache (I'm not sure what the proper term is in this case) the same range of integers. My installation of CPython, for instance, seems to have cached -8 on startup, even though this article seems to imply that's outside the normal range. Thus, even if the is operator seems to work in your dev environment, it's better to be on the safe side, avoid inconsistent behavior altogether, and use ==. The is operator should be reserved for situations where you actually want to compare identity.
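To make that advice concrete, here is a hedged sketch. The describe() helper is hypothetical, and the result of an identity test on two equal integers is intentionally left unstated because it depends on the implementation.

```python
# Build two equal integers at runtime so that no shared object is guaranteed.
a = int("1000")
b = int("1000")

print(a == b)  # True on every implementation: the values match
# `a is b` is deliberately not shown with an expected result: whether the two
# names share one object is an implementation detail you should not rely on.


def describe(value=None):
    """Hypothetical helper showing an idiomatic identity test."""
    # None is a singleton, so identity is the right comparison here.
    if value is None:
        return "no value given"
    return f"got {value!r}"


print(describe())   # no value given
print(describe(0))  # got 0
```

Testing against None with the is operator is the common case where identity is what you actually mean: it distinguishes "nothing was passed" from falsy values such as 0 or an empty list.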