62

I recently tried the following commands in Python:

>>> {lambda x: 1: 'a'}
{<function __main__.<lambda>>: 'a'}

>>> def p(x): return 1
>>> {p: 'a'}
{<function __main__.p>: 'a'}

The success of both dict creations indicates that both lambda and regular functions are hashable. (Something like {[]: 'a'} fails with TypeError: unhashable type: 'list').

The hash is apparently not necessarily the ID of the function:

>>> m = lambda x: 1
>>> id(m)
140643045241584
>>> hash(m)
8790190327599
>>> m.__hash__()
8790190327599

The last command shows that the __hash__ method is explicitly defined for lambdas, i.e., this is not some automagical thing Python computes based on the type.

What is the motivation behind making functions hashable? For a bonus, what is the hash of a function?

Mad Physicist
  • 107,652
  • 25
  • 181
  • 264
  • 7
    I really feel like this is the sort of question that you shouldn't consider before you have a *good* answer to "why wouldn't you?" –  Jul 22 '16 at 11:24
  • 3
    @Hurkyl. Because it requires the addition of extra maintenance burden for one thing. Someone had to design and write the `__hash__` function, so they clearly saw benefit in it. I would like to know what made them not just leave it alone. – Mad Physicist Jul 22 '16 at 13:21
  • Although given the answers, it seems that disabling the hash function would require more work than just inheriting it from object. So in fact one of the considerations may have been reduction of maintenance debt. – Mad Physicist Jul 22 '16 at 16:16
  • 2
    @MadPhysicist, the code is trivial either way. In `C`, an object's type is represented by a `struct PyTypeObject`, and that struct's `tp_hash` member defines what happens for `__hash__()`. If that member is initialized to 0, it inherits `__hash__()`. If you want the type to refuse to hash, you just initialize it to `PyObject_HashNotImplemented` instead. (And if the type wants to implement a custom hash, you plug in the address of the type's custom hash function.) So maintenance burden really has nothing to do with it. – Tim Peters Jul 22 '16 at 17:41

3 Answers3

61

It's nothing special. As you can see if you examine the unbound __hash__ method of the function type:

>>> def f(): pass
...
>>> type(f).__hash__
<slot wrapper '__hash__' of 'object' objects>

the of 'object' objects part means it just inherits the default identity-based __hash__ from object. Function == and hash work by identity. The difference between id and hash is normal for any type that inherits object.__hash__:

>>> x = object()
>>> id(x)
40145072L
>>> hash(x)
2509067

You might think __hash__ is only supposed to be defined for immutable objects, and you'd be almost right, but that's missing a key detail. __hash__ should only be defined for objects where everything involved in == comparisons is immutable. For objects whose == is based on identity, it's completely standard to base hash on identity as well, since even if the objects are mutable, they can't possibly be mutable in a way that would change their identity. Files, modules, and other mutable objects with identity-based == all behave this way.

user2357112
  • 260,549
  • 28
  • 431
  • 505
  • So by default, all instances in Python are hashable unless it is turned off somewhere in the class hierarchy? That is interesting news indeed, although intuitively it makes sense based on how few types you *cant't* use to key a `dict`. – Mad Physicist Jul 22 '16 at 13:28
  • `==` does not work on identity. `is` is the identity operator. – Delioth Jul 22 '16 at 14:48
  • 4
    @Delioth: `==` works by identity for any type that doesn't specify some other behavior. – user2357112 Jul 22 '16 at 14:58
  • 1
    @MadPhysicist The proper way to tell python that a class is not hashable is by setting `__hash__` to `None`: `class X: __hash__ = None;{X(): 1} -> TypeError: unhashable type: 'X'` – Bakuriu Jul 22 '16 at 18:34
  • @Bakuriu. Normally I would agree with you. Have you tried doing this for a function? See the comments below http://stackoverflow.com/a/38518991/2988730. – Mad Physicist Jul 22 '16 at 18:44
  • @MadPhysicist I don't understand what you are referring to. My comment is about **defining a class**. That's different than setting an instance attribute on any object. For example: `class X:pass; x=X(); x.__hash__=None; {x:1}` is fine. There's *nothing* special about functions, except that you **cannot** do `type(function).__hash__ = None` because it's a built-in type and only C code can modify that method *on the type*. – Bakuriu Jul 22 '16 at 18:47
  • @Bakuriu I had not gotten to the other comment when I responded to this one. My mistake. I agree with everything you said, but we are starting to go out of scope of the original question. The confusion came from the fact that the function class's hash is read-only in Python. – Mad Physicist Jul 22 '16 at 18:55
26

It can be useful, e.g., to create sets of function objects, or to index a dict by functions. Immutable objects normally support __hash__. In any case, there's no internal difference between a function defined by a def or by a lambda - that's purely syntactic.

The algorithm used depends on the version of Python. It looks like you're using a recent version of Python on a 64-bit box. In that case, the hash of a function object is the right rotation of its id() by 4 bits, with the result viewed as a signed 64-bit integer. The right shift is done because object addresses (id() results) are typically aligned so that their last 3 or 4 bits are always 0, and that's a mildly annoying property for a hash function.

In your specific example,

>>> i = 140643045241584 # your id() result
>>> (i >> 4) | ((i << 60) & 0xffffffffffffffff) # rotate right 4 bits
8790190327599  # == your hash() result
Tim Peters
  • 67,464
  • 13
  • 126
  • 132
  • 2
    @JulienBernu, see @user2357112's response - it's immutable so far as `==` is concerned, which is all `__hash__()` cares about. – Tim Peters Jul 22 '16 at 06:08
4

A function is hashable because it is a normal, builtin, non mutable object.

From the Python Manual:

An object is hashable if it has a hash value which never changes during its lifetime (it needs a __hash__() method), and can be compared to other objects (it needs an __eq__() or __cmp__() method). Hashable objects which compare equal must have the same hash value.

Hashability makes an object usable as a dictionary key and a set member, because these data structures use the hash value internally.

All of Python’s immutable built-in objects are hashable, while no mutable containers (such as lists or dictionaries) are. Objects which are instances of user-defined classes are hashable by default; they all compare unequal (except with themselves), and their hash value is derived from their id().

Community
  • 1
  • 1
Julien
  • 13,986
  • 5
  • 29
  • 53
  • 16
    They're actually a lot more mutable than you might expect. You can add attributes to them, reassign their `__name__`, `__module__`, and `__doc__`, and even reassign their `__code__` if you're feeling crazy. – user2357112 Jul 22 '16 at 05:47
  • 5
    You can also reassign their `__hash__` function :) – Mad Physicist Jul 22 '16 at 13:27
  • 4
    @MadPhysicist: Though reassigning `f.__hash__` won't actually do anything to `hash(f)`. – user2357112 Jul 22 '16 at 16:20
  • @user2357112 Interesting. I just confirmed. Why is that? – Mad Physicist Jul 22 '16 at 16:22
  • @MadPhysicist: http://stackoverflow.com/questions/38133096/assigning-instead-of-defining-a-getitem-magic-method-breaks-indexing – user2357112 Jul 22 '16 at 16:26
  • @user2357112. I see. And `type(p).__hash__ = lambda s: 1` conveniently fails with `TypeError: can't set attributes of built-in/extension type 'function'`. The only way is to override the class somehow. – Mad Physicist Jul 22 '16 at 16:34
  • 1
    @MadPhysicist You just can't. The Python interpreter *does* support **true** readonly attributes, but these facilities can only be accessed via the C API. So from pure python code it's impossible to change those values (thank god!). – Bakuriu Jul 22 '16 at 18:38