Python - What is the fastest way to make a numpy.ndarray hashable?

Asked Dec 04 '16 at 00:20

Active Dec 04 '16 at 00:28

Viewed 975 times

Right now I'm using x.tostring() but I'm looking for something faster.

asked Dec 04 '16 at 00:20

J. Smith

1

If there are no duplicates in the set of your arrays, you can use `id(x)`. – hilberts_drinking_problem Dec 04 '16 at 00:30
@YakymPirozhenko Unfortunately there are duplicates. – J. Smith Dec 04 '16 at 00:34
If you have large arrays and there are not too many of them, you may be better off with `np.searchsorted` instead of hashing, despite the log(N) overhead. – hilberts_drinking_problem Dec 04 '16 at 00:42
I assume your array is numeric. Perhaps you can calculate the sum of all elements and use it as a hash. If two sums are different, then the arrays are different, too. If the sums are equal, you can compare `.tostring()`s. – DYZ Dec 04 '16 at 00:55
`tostring` is just a bytestring copy of the data buffer. I.e. the whole array without the shape, strides attributes. So fetching it is fast, but as a hash it might be way too big. But it may be the only way to distinguish two arrays that differ by only one byte. It won't distinguish views. – hpaulj Dec 04 '16 at 02:14
See https://stackoverflow.com/questions/16589791/most-efficient-property-to-hash-for-numpy-array. – user76284 Dec 02 '19 at 22:42

0 Answers0