2

I was wondering why functions in Python 3.7 behave in a rather strange way. It seems odd and contradictory to the whole notion of hashability. Let me clarify what I am seeing with a simple example. Knowing that tuples are hashable, consider the following:

a = (-1, 20, 8)
b = (-1, 20, 8)
def f(x):
    return min(x), max(x)

Now let us examine:

>>> print(a is b, a.__hash__() == b.__hash__())
False True
>>> print((-1, 20, 8) is (-1, 20, 8))
True

This is odd enough, but I guess "naming" hashable objects makes them something different (their id()'s change during variable definition). How about functions? Functions are hashable, right? Let's see:

>>> print(f(a) is f(b))
False
>>> print(id(f(a)) == id(f(b)), f(a).__hash__() == f(b).__hash__())
True True

Now this is the climax of my confusion. You may be surprised that even f(a) is f(a) is False. But how so? Don't you think this kind of behavior is incorrect and should be addressed and fixed by the Python community?

  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/211524/discussion-on-question-by-ashkan-ranjbar-why-dont-functions-preserve-identity). – Samuel Liew Apr 13 '20 at 04:42

2 Answers

0

You can't guarantee that two identical calls return the same object. Functions are also objects in Python, so they can maintain state. Even setting state aside, you shouldn't rely on is evaluating to True just because the contents of two objects are the same.

There are cases in which Python will optimize the code to reuse the same object as a singleton, but you shouldn't assume anything about this.

For example, CPython caches small integers (-5 through 256) as an implementation detail, so an integer in that range computed at run time is always the same object, while larger integers computed at run time are usually distinct objects. If you only care about value equality, use ==. is is designed for object identity checks.

c = 40
def f(x):
    return c + x

a = 1
f(a)
# 41

c += 1
f(a)
# 42

f(a) is f(a)
# True

c += 500
f(a) is f(a)
# False

f(a) is f(a) can result in the same object. For instance, CPython stores small integers (up to 256) as singletons, so the first test returns True. Once we are outside that range (c += 500), each call instantiates its own object to return, and f(a) is f(a) now returns False.
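The same point as a minimal sketch. It assumes CPython's small-integer cache, which is an implementation detail rather than a language guarantee, and `add_offset` is a hypothetical helper introduced only for illustration. The value comparisons hold everywhere; the identity results are printed rather than relied upon.

```python
def add_offset(x, offset):
    """Hypothetical helper: returns an integer computed at run time."""
    return x + offset

# Value comparison with == is always reliable.
small_equal = add_offset(1, 41) == add_offset(1, 41)
large_equal = add_offset(1, 541) == add_offset(1, 541)

# Identity comparison depends on the interpreter's caching,
# so these results are only printed, never depended on.
print(add_offset(1, 41) is add_offset(1, 41))
print(add_offset(1, 541) is add_offset(1, 541))
```

On CPython the first identity check usually prints True and the second False, but other implementations or versions may differ.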

beatsme
  • 91
  • 5
  • I might have somehow misled you @rmo, but look at the returned values. Those are just tuples, they have to be tuples, right? So why aren't they behaving like they should? – Ashkan Ranjbar Apr 13 '20 at 03:40
  • @AshkanRanjbar I've added some on how Python will create new instances every time you create some objects (and sometimes not). I'll get back to your question to see if I missed the point though, sorry. – beatsme Apr 13 '20 at 03:47
  • @AshkanRanjbar as far as I understood the point is the same with tuples, Python creates a new instance of a tuple every time you call for it, regardless of others existing with the same values. They hash the same yeah so you can use `==` safely but they're different objects still so `is` won't work as you imagined. – beatsme Apr 13 '20 at 03:52
  • Nah, not `==`, since `==` calls `__eq__` and compares the **value** of an object, while `is` looks at the `id()` of an object. – Ashkan Ranjbar Apr 13 '20 at 03:55
-1

The is keyword in Python checks whether both operands refer to the same object. Python provides the id() function to return an identifier for an object instance that is unique among the objects alive at the same time. So a is b does not compare whether the objects contain the same value; it just returns whether a and b are the same object.

For a tuple, __hash__() returns a value based on the content/value of the object.

>>> a = (-1, 20, 8)
>>> b = (-1, 20, 8)
>>> id(a)
2347044252768
>>> id(b)
2347044252336
>>> hash(a)
-3789721413161926883
>>> hash(b)
-3789721413161926883

Now the last question: f(a) is f(b) checks whether the results returned by f(a) and f(b) point to the same object in memory. Your function's return min(x), max(x) creates a new tuple containing the min and max of x on every call. Therefore, print(f(a) is f(b)) prints False.

f(a).__hash__() == f(b).__hash__() is True because this compares the hashes of the resulting values, not the hash of the function as you think. If you want the hash of the function, use f.__hash__() or hash(f), since a function in Python is just a callable object.
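To make the distinction concrete, here is a small sketch that reuses f, a and b from the question. The variable names on the left are illustrative only.

```python
def f(x):
    return min(x), max(x)

a = (-1, 20, 8)
b = (-1, 20, 8)

# Hashes of the *results*: both calls return a tuple equal to (-1, 20),
# and equal tuples hash the same.
result_hashes_match = hash(f(a)) == hash(f(b))

# Hash of the *function object* itself, which is unrelated to what it returns.
function_hash = hash(f)

# The function keeps its identity: the name f refers to one object.
same_function = f is f

print(result_hashes_match, same_function)
```

So the function does preserve its identity; it is the tuples it returns that are new objects on each call.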

The only interesting part is print(id(f(a)) == id(f(b))) shows True. This is probably due to CPython expression bytecode optimizer.

If you do it separately, it returns False.

>>> c = f(a)
>>> d = f(b)
>>> print(id(f(a)) == id(f(b)))
True
>>> print(id(c) == id(d))
False
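One way to read this result is in terms of object lifetime: the Python documentation only guarantees that id() is unique among objects that exist at the same time. The sketch below, reusing f, a and b from the question, contrasts results that are kept alive with temporaries that may be freed right away. The second outcome is deliberately not asserted, since it depends on the interpreter's memory reuse.

```python
def f(x):
    return min(x), max(x)

a = (-1, 20, 8)
b = (-1, 20, 8)

# Both results are kept alive by the names c and d,
# so they are distinct objects with distinct ids.
c = f(a)
d = f(b)
ids_differ_while_alive = id(c) != id(d)

# Here each temporary tuple can be freed as soon as id() returns,
# so the second tuple may be allocated at the same address.
# The result is implementation-dependent.
ids_of_temporaries_equal = id(f(a)) == id(f(b))

print(ids_differ_while_alive, ids_of_temporaries_equal)
```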

I'm not sure if it is a bug that should be fixed, but it is an odd inconsistency. BTW, I'm using Python 3.7.2 on Windows 64-bit. The behavior might differ on a different Python version or implementation.

If you replace integer values with strings, the behavior also changes due to Python's string interning optimization.
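A hedged illustration of the string case, using the standard library's sys.intern. The strings are built at run time so that they are not merged as compile-time constants; whether s1 is s2 holds without explicit interning is an implementation detail and is only printed here.

```python
import sys

# Build two equal strings at run time.
s1 = "".join(["hash", " ", "able!"])
s2 = "".join(["hash", " ", "able!"])

# Value equality is always reliable.
values_equal = s1 == s2

# Explicitly interned copies of equal strings are the same object.
interned_same = sys.intern(s1) is sys.intern(s2)

# Without explicit interning, identity is implementation-dependent.
print(s1 is s2)
```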

Therefore, the lesson here is the same as the general guideline in other languages: avoid comparing object references/pointers if possible, as you might be depending on implementation details about how the objects are referenced, optimized, and possibly how the GC works.

Here's an interesting related article: Python Optimization: How it Can Make You a Better Programmer

sonofusion82
  • 200
  • 3
  • 1
    *This is probably due to CPython expression bytecode optimizer.* It's not. Bad guess. And that's not a bug, just CPython reusing heap space (id is based on memory location). – wim Apr 13 '20 at 04:21
  • Ok, but it is still kinda odd. So, the CPython expression parser evaluates the left side of ==, discards the object, then evaluates the expression on the right side. I'd imagine the most common comparison method is `object.__eq__(self, other)`, where both objects should be available at the same time and should never have the same memory location. Is that why it is somehow optimizing the expression evaluation? – sonofusion82 Apr 13 '20 at 05:49