Right now I'm using `x.tostring()`, but I'm looking for something faster.

J. Smith
  • If there are no duplicates in the set of your arrays, you can use `id(x)`. – hilberts_drinking_problem Dec 04 '16 at 00:30
  • @YakymPirozhenko Unfortunately there are duplicates. – J. Smith Dec 04 '16 at 00:34
  • If you have large arrays and there are not too many of them, you may be better off with `np.searchsorted` instead of hashing, despite the log(N) overhead. – hilberts_drinking_problem Dec 04 '16 at 00:42
  • I assume your array is numeric. Perhaps you can calculate the sum of all elements and use it as a hash. If two sums are different, then the arrays are different, too. If the sums are equal, you can compare `.tostring()`s. – DYZ Dec 04 '16 at 00:55
  • `tostring` is just a bytestring copy of the data buffer. I.e. the whole array without the shape, strides attributes. So fetching it is fast, but as a hash it might be way too big. But it may be the only way to distinguish two arrays that differ by only one byte. It won't distinguish views. – hpaulj Dec 04 '16 at 02:14
  • See https://stackoverflow.com/questions/16589791/most-efficient-property-to-hash-for-numpy-array. – user76284 Dec 02 '19 at 22:42
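
A minimal sketch of the sum-then-compare idea from DYZ's comment, assuming numeric arrays that contain no NaNs and are not mutated while they serve as keys. `SumKey` is a hypothetical helper name, not part of NumPy, and the code calls `tobytes()`, the current name for what `tostring()` did. The sum acts as a cheap hash, and the full byte comparison (plus shape and dtype, which the raw bytes alone do not carry) runs only when two hashes match.

```python
import numpy as np


class SumKey:
    """Hypothetical dict key: hashes an array by its sum, compares by content."""

    def __init__(self, arr):
        # Use a C-contiguous layout so equal contents are summed in the same order.
        self.arr = np.ascontiguousarray(arr)
        # Cheap hash: arrays with different sums are certainly different.
        # Assumption: no NaNs, since NaN sums would not hash consistently.
        self._hash = hash(float(self.arr.sum()))

    def __hash__(self):
        return self._hash

    def __eq__(self, other):
        if not isinstance(other, SumKey):
            return NotImplemented
        # Equal sums can still belong to different arrays, so fall back to a
        # full comparison; the bytes alone do not encode shape or dtype.
        return (
            self.arr.shape == other.arr.shape
            and self.arr.dtype == other.arr.dtype
            and self.arr.tobytes() == other.arr.tobytes()
        )


# Usage: duplicate arrays map to the same dictionary entry.
seen = {}
for x in (np.array([1, 2, 3]), np.array([1, 2, 3]), np.array([3, 2, 1])):
    seen.setdefault(SumKey(x), []).append(x)
```

Whether this beats hashing the full byte string depends on array size and on how often sums collide, so it is worth timing on your own data.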

0 Answers