One way to make a numpy array hashable is setting it to read-only. This has worked for me in the past. But when I use such a numpy array in a tuple, the whole tuple is no longer hashable, which I do not understand. Here is the sample code I put together to illustrate the problem:

import numpy as np

npArray = np.ones((1,1))
npArray.flags.writeable = False
print(npArray.flags.writeable)

keySet = (0, npArray)
print(keySet[1].flags.writeable)

myDict = {keySet : 1}

First I create a simple numpy array and set it to read-only. Then I add it to a tuple and check if it is still read-only (which it is).

When I try to use the tuple as a key in a dictionary, I get the error TypeError: unhashable type: 'numpy.ndarray'.

Here is the output of my sample code:

False
False
Traceback (most recent call last):
  File "test.py", line 10, in <module>
    myDict = {keySet : 1}
TypeError: unhashable type: 'numpy.ndarray'

What can I do to make my tuple hashable and why does Python show this behavior in the first place?

Demento
  • Where did you get the idea that setting the `writeable` flag to `False` would make the array hashable? That doesn't work, even before you bring the tuple into the picture. – user2357112 Oct 26 '17 at 19:12
  • A numpy array does not have the required hashing method (something like `__hash__`). – hpaulj Oct 26 '17 at 19:16
  • The method is described here: https://stackoverflow.com/questions/16589791/most-efficient-property-to-hash-for-numpy-array – Demento Oct 26 '17 at 19:22

2 Answers


You claim that

One way to make a numpy array hashable is setting it to read-only

but that's not actually true. Setting an array to read-only just makes it read-only. It doesn't make the array hashable, for multiple reasons.
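
A quick check of this, as a minimal sketch that reuses the array from the question (the names `hashable` and `message` are only illustrative):

```python
import numpy as np

arr = np.ones((1, 1))
arr.flags.writeable = False  # read-only, but still an ordinary ndarray

try:
    hash(arr)
    hashable = True
except TypeError as exc:
    # ndarray does not support hashing, read-only or not
    hashable = False
    message = str(exc)

print(hashable)
print(message)
```

The tuple fails for the same reason: hashing a tuple hashes each of its elements, so one unhashable element is enough.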

The first reason is that an array with the writeable flag set to False is still mutable. For one thing, you can always set writeable=True again and resume writing to it, or do more exotic things like reassign its shape even while writeable is False. For another, even without touching the array itself, you could mutate its data through another view that has writeable=True.

>>> import numpy
>>> x = numpy.arange(5)
>>> y = x[:]
>>> x.flags.writeable = False
>>> x
array([0, 1, 2, 3, 4])
>>> y[0] = 5
>>> x
array([5, 1, 2, 3, 4])

The second reason is that for hashability to be meaningful, objects must first be equatable: == must return a boolean, and must be an equivalence relation. NumPy arrays don't do that; == on two arrays compares element by element and returns another array. The purpose of hash values is to quickly locate equal objects, but when your objects don't even have a built-in notion of equality, there's not much point to providing hashes.
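
To make the equality point concrete, here is a small sketch (the arrays `a` and `b` and the flag `ambiguous` are made up for illustration):

```python
import numpy as np

a = np.array([1, 2, 3])
b = np.array([1, 2, 4])

result = a == b  # element-wise comparison: an array of booleans
print(type(result))
print(result)

try:
    # A dict needs a single True/False answer when comparing keys
    bool(result)
    ambiguous = False
except ValueError:
    # Arrays with more than one element refuse to collapse to one bool
    ambiguous = True

print(ambiguous)
```

A dictionary lookup has to decide whether two keys are "the same", and an array of per-element answers cannot serve as that decision.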


You're not going to get hashable tuples with arrays inside. You're not even going to get hashable arrays. The closest you can get is to put some other representation of the array's data in the tuple.
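
One possible "other representation", sketched under the assumption that the array has a plain numeric dtype: store the shape, dtype string and raw bytes. The helpers `array_key` and `array_from_key` are hypothetical names, not NumPy functions.

```python
import numpy as np

def array_key(arr):
    """Hypothetical helper: an immutable, hashable stand-in for arr."""
    # Shape and dtype are included so arrays with identical bytes
    # but different layouts do not collide.
    return (arr.shape, arr.dtype.str, arr.tobytes())

def array_from_key(key):
    """Hypothetical helper: rebuild a (read-only) array from array_key output."""
    shape, dtype_str, raw = key
    return np.frombuffer(raw, dtype=dtype_str).reshape(shape)

np_array = np.ones((1, 1))
key_set = (0, array_key(np_array))
my_dict = {key_set: 1}

# A separately built but identical array finds the same entry
lookup = my_dict[(0, array_key(np.ones((1, 1))))]
restored = array_from_key(key_set[1])
print(lookup, restored)
```

Because the key is built from a copy of the bytes, later changes to `np_array` do not affect entries already stored in the dictionary.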

user2357112
  • +1, thanks for the clarification on the internals. I'll have to go with converting it to a string first, if the idea of hashing the array is off the table. – Demento Oct 26 '17 at 19:27
  • Can one create some sort of wrapper around numpy arrays that makes them hashable and equatable? So that one doesn't have to keep going back and forth between byte/string/etc. representations of arrays? – user76284 Jun 07 '20 at 00:51

The fastest way to hash a numpy array is likely to hash its raw bytes via tostring (a deprecated alias; the current name of the method is tobytes).

In [11]: %timeit hash(y.tostring())

Rather than using a tuple, you could define a class:

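class KeySet(object):
    # No __eq__ is defined, so two KeySet objects compare equal
    # only if they are the same object.
    def __init__(self, i, arr):
        self.i = i
        self.arr = arr
    def __hash__(self):
        # tobytes() is the current name for the deprecated tostring()
        return hash((self.i, hash(self.arr.tobytes())))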

Now you can use it in a dict:

In [21]: ks = KeySet(0, npArray)

In [22]: myDict = {ks: 1}

In [23]: myDict[ks]
Out[23]: 1
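
If lookups should also succeed with a separately built but equal key, the class needs `__eq__` as well as `__hash__`. The following is only a sketch of one way to do that; `HashableArray` is a hypothetical name, and it assumes that "equal" means same shape, same dtype and same bytes.

```python
import numpy as np

class HashableArray:
    """Hypothetical wrapper giving an array value-based hash and equality."""

    def __init__(self, arr):
        # Private read-only copy, so later writes to arr cannot change the key
        self._arr = np.array(arr, copy=True)
        self._arr.flags.writeable = False
        self._key = (self._arr.shape, self._arr.dtype.str, self._arr.tobytes())

    def __hash__(self):
        return hash(self._key)

    def __eq__(self, other):
        if not isinstance(other, HashableArray):
            return NotImplemented
        return self._key == other._key

    @property
    def array(self):
        # The wrapped read-only copy, for use in computations
        return self._arr

my_dict = {(0, HashableArray(np.ones((1, 1)))): 1}
value = my_dict[(0, HashableArray(np.ones((1, 1))))]
print(value)
```

Note that this notion of equality is stricter than an element-wise comparison: an integer array and a float array holding the same values count as different keys.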
Andy Hayden
  • This feels pythonic, thx for the fast answer! And I can also convert it back with a function wrapping numpy.fromstring(), which I will require later on. – Demento Oct 26 '17 at 19:24