-1

I'm trying to find the Go equivalent of Python's built-in hash function:

hash("test")

I've found this post, which describes a very similar function in the sense that it returns an integer. However, it uses FNV, which appears to be a different hashing method from the one Python uses.

What I'm trying to do is pass a string to a hash function and get exactly the same integer back in both languages for the same string.

Jonathan Hall
jm21356
  • 3
    I don't know the answer, but I'm curious about why you would need this – roganjosh Oct 08 '19 at 20:43
  • 3
    How it works is an internal implementation detail, not to mention that objects can have a `__hash__()` method, which means it could return anything. – JimB Oct 08 '19 at 20:45
  • 1
    Looks like an [XY Problem](http://xyproblem.info/). What is your actual goal? – Jonathan Hall Oct 08 '19 at 21:34
  • Thanks for the replies. I'm trying to determine which shard a string should be stored on, so it's just to evenly distribute data across a series of MySQL databases. I'm going for `shard = md5("1.2.3.4") % 4096`. It's from this section of a [Pinterest blog](https://medium.com/pinterest-engineering/sharding-pinterest-how-we-scaled-our-mysql-fleet-3f341e96ca6f) (please see the bottom of the post). – jm21356 Oct 08 '19 at 21:43
  • 1
    In that case, the Python hash function is _the wrong solution_, since it will change for each invocation. You need a stable hashing function. md5 is fine, but there are no doubt faster alternatives. CRC16 or CRC32 is probably fine, unless you're hashing input that may be user-manipulated (in which case MD5 isn't safe, either) – Jonathan Hall Oct 09 '19 at 07:51
  • Ok great, many thanks for your help guys, I'll go for the crc32 method. – jm21356 Oct 10 '19 at 20:08

1 Answer

4

By default, the __hash__() values of str, bytes and datetime objects are “salted” with an unpredictable random value. Although they remain constant within an individual Python process, they are not predictable between repeated invocations of Python.

So you will get different numbers from one invocation of the Python script to the next, and I don't think what you want is even possible with the built-in hash.

Source: https://docs.python.org/3.5/reference/datamodel.html#object.__hash__

Code-Apprentice
  • I _think_ you can stop the randomisation, no? – roganjosh Oct 08 '19 at 20:45
  • @AdamSmith which would be relevant in a general application but internally it would seem strange to me that you couldn't either a) fix the seed or b) at least grasp the seed and transmit it if you're _sure_ that such a situation wouldn't arise – roganjosh Oct 08 '19 at 20:47
  • 1
    @roganjosh You can set the `PYTHONHASHSEED` environment variable to get reproducible hash values, but I haven't yet found a way to disable salting altogether. – Code-Apprentice Oct 08 '19 at 20:47
  • @AdamSmith I luckily stumbled on that as I clicked links and scanned the docs. – Code-Apprentice Oct 08 '19 at 20:50
  • [this](https://stackoverflow.com/a/27522708/4799172) shows how to disable it: "... but you can set it to a fixed positive integer value, with 0 disabling the feature altogether." – roganjosh Oct 08 '19 at 21:09
  • @roganjosh So in general, what the OP asks isn't possible because even Python itself won't replicate the same hash for a given string between invocations of the same script. – Code-Apprentice Oct 08 '19 at 21:19