Is there any way that I can hash a random string into an 8-digit number without implementing any algorithms myself?

- hash("your string") % 100000000 – Theran Apr 15 '13 at 06:17
- 8 digits seems too small, and may result in hash collisions if you have a large number of records. http://stackoverflow.com/questions/1303021/shortest-hash-in-python-to-name-cache-files – DhruvPathak Apr 15 '13 at 06:19
- Use hashlib, since hash has another purpose! – architectonic Jan 31 '17 at 16:32
- Any finite number of digits will result in collisions for sufficiently large numbers of hashed items; that's why you shouldn't treat them as unique keys. It tends to turn into the birthday problem. – Alex North-Keys May 17 '17 at 22:27
- I've chosen "CityHash" to hash strings to 19-digit integers (64-bit integers), hoping this will lead to fewer potential collisions than Raymond's suggestion below. https://en.wikipedia.org/wiki/List_of_hash_functions – tryptofame Jul 21 '17 at 13:05
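To put the birthday-problem comment into numbers: with N = 10**8 possible 8-digit values and n strings hashed uniformly, the standard approximation for the chance of at least one collision is p ≈ 1 − exp(−n(n−1)/(2N)). A minimal sketch, assuming the hash spreads its output uniformly (the function name is hypothetical):

```python
import math


def collision_probability(n: int, space: int = 10**8) -> float:
    """Approximate chance that n uniformly hashed items share at least one value.

    Birthday-problem approximation: 1 - exp(-n(n-1) / (2 * space)).
    Assumes the hash output is spread uniformly over `space` values.
    """
    return 1.0 - math.exp(-n * (n - 1) / (2 * space))


for n in (1_000, 10_000, 100_000):
    print(f"{n:>7,} strings -> {collision_probability(n):.1%} chance of a collision")
```

By this estimate, an 8-digit space reaches a 50% chance of a collision at roughly 11,800 strings, which is why the result should not be treated as a unique key.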
4 Answers
Yes, you can use the built-in `hashlib` module or the built-in `hash` function. Then, chop off the last eight digits using modulo operations or string slicing operations on the integer form of the hash:
>>> s = 'she sells sea shells by the sea shore'
>>> # Use hashlib
>>> import hashlib
>>> int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16) % (10 ** 8)
58097614L
>>> # Use hash()
>>> abs(hash(s)) % (10 ** 8)
82148974
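The answer mentions string slicing as an alternative to modulo. A small sketch (helper names are hypothetical) showing that both routes give the same number, using `hashlib` so the result is stable across runs:

```python
import hashlib


def last8_by_modulo(s: str) -> int:
    # Integer form of the SHA-1 digest, reduced to its last 8 decimal digits.
    n = int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16)
    return n % 10**8


def last8_by_slicing(s: str) -> int:
    # Same idea via string slicing: keep the last 8 characters of the decimal form.
    n = int(hashlib.sha1(s.encode("utf-8")).hexdigest(), 16)
    return int(str(n)[-8:])


s = 'she sells sea shells by the sea shore'
print(last8_by_modulo(s), last8_by_slicing(s))
```

Modulo is the simpler choice; slicing first converts a 160-bit number to a decimal string.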

- Public service announcement... this technique doesn't actually result in a unique hash value for the string; it computes a hash and then munges it into a non-guaranteed-unique value. – twneale Sep 18 '15 at 15:03
- Public service announcement... except for the special case of perfect hashes over a limited set of input values, hash functions aren't supposed to generate guaranteed unique values. – Raymond Hettinger Sep 19 '15 at 15:39
- Probably true, but virtually all of their practical utility derives from their good-enough tendency to produce unique values. The probability of a 'hash' collision using this trick is probably 10 or 11 orders of magnitude higher than with MD5. – twneale Sep 20 '15 at 23:04
- Did you read the OP's question? He (or she) wanted (or needed) 8 decimal places. Also, the way hash tables work is to hash into a small search space (the sparse table). You seem not to know what hash functions are commonly used for and not to care about the actual question that was asked. – Raymond Hettinger Sep 21 '15 at 03:19
- I read the question. I'm simply observing that over the same input space as SHA-1, your answer is astronomically more likely to produce a collision than not. At least some degree of uniqueness is implicitly required by the question, but your answer is a hash function in the same spirit as one that simply returns 12345678 for every input. I was able to experimentally generate a collision with as few as 1000 inputs using this method. To preserve the same collision probability as SHA-1, you would have to map un-truncated SHA-1s to 8-digit integers. I think that's worthy of a PSA. – twneale Sep 21 '15 at 15:58
- Careful, hash(s) is not guaranteed to give the same results across platforms and runs. – Mr. Napik Feb 16 '16 at 21:33
- Right, but what I think Doug meant is that even a negative number mod something (positive) will always produce a non-negative number, so it seems you can drop the abs(). Also, I think the relative operator precedence of exponentiation means we don't even need the second parens. Thanks for the answer, though! >>> hash(s) % 10**8 produces 57227199 – JJC Feb 07 '17 at 10:41
- An important caveat is that, unlike with Python 2.x, hash() of a string returns a different value on each Python 3.x interpreter invocation (it is consistent within a single process). So, if the OP is depending on the hash being the same for a given string across script runs, the latter will not work in Python 3.x. This just bit me. I will add an answer to reflect these two comments (not yet sure about the etiquette of editing). – JJC Feb 07 '17 at 11:37
- Should use `1e8` instead of `10**8`; you're performing an extra computation when there is absolutely no need. Also, nice answer, it's exactly what I was looking for. – silgon Nov 02 '18 at 16:24
- @silgon Python's peephole optimizer does constant folding, so the computation is only done once. That is easy to verify. Run ``dis(compile('10 ** 8', '', 'eval'))`` and look for the fragment ``LOAD_CONST 0 (100000000)``. Alternatively, run ``def f(): return 10**8`` and observe that ``f.__code__.co_consts`` returns ``(None, 100000000)``. Note that ``1e8`` isn't a valid substitute because that is a *float* rather than an *int*. – Raymond Hettinger Nov 02 '18 at 22:05
- Wow... I just checked what you said, and you're right; it's really interesting. I thought that the power operator `**` would always perform an operation, but as you said, it's `LOAD_CONST`. Thanks for the interesting reply. Also, you're right, the scientific notation `1e8` gives a float. – silgon Nov 03 '18 at 09:24
- Some of the comments regarding 'unique value' are confusing. I am trying to do the same thing, and tested in Python 3.7.4 and 3.5.3 on two different machines. For the same input string, the results are the same. Is it true that the same input string always results in the same output for `hashlib.sha1`? – user1783732 Aug 10 '19 at 00:04
- If your extracted 8 digits start with a 0, you'll end up with a number that has fewer than 8 digits. – ingo Apr 02 '20 at 08:55
- I think it is worth mentioning that if you want a stable hash, you should use the `hashlib` solution. – kaptan Sep 24 '21 at 21:20
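On the leading-zero point: if you need exactly eight characters (for example, as an ID string), one option is to keep the modulo step and zero-pad when formatting. A minimal sketch, assuming a string result is acceptable (the function name is hypothetical):

```python
import hashlib


def hash_to_8_digit_str(s: str) -> str:
    # hashlib (unlike the built-in hash()) gives the same value on every run.
    n = int(hashlib.sha256(s.encode("utf-8")).hexdigest(), 16) % 10**8
    # Zero-pad so a value such as 4217 comes out as "00004217": always 8 characters.
    return f"{n:08d}"


print(hash_to_8_digit_str("your string"))
```

Keeping `10**8` (an int) matters here: `n % 1e8` would produce a float, and the `d` format code would then raise a `ValueError`.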
Raymond's answer is great for Python 2 (though you need neither the abs() nor the parentheses around 10 ** 8). However, for Python 3, there are important caveats. First, you'll need to make sure you are passing an encoded string (bytes), since hashlib does not accept str objects. These days, in most circumstances, it's probably also better to shy away from SHA-1 and use something like SHA-256 instead. So, the hashlib approach would be:
>>> import hashlib
>>> s = 'your string'
>>> int(hashlib.sha256(s.encode('utf-8')).hexdigest(), 16) % 10**8
80262417
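Instead of round-tripping through `hexdigest()`, the raw digest bytes can be turned into an integer directly with `int.from_bytes`; read big-endian, this gives the same number as `int(hexdigest, 16)`. A sketch (the function name is hypothetical):

```python
import hashlib


def sha256_8_digits(s: str) -> int:
    digest = hashlib.sha256(s.encode("utf-8")).digest()  # 32 raw bytes
    # Reading the bytes big-endian yields the same integer as int(hexdigest(), 16).
    return int.from_bytes(digest, "big") % 10**8


print(sha256_8_digits('your string'))
```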
If you want to use the hash() function instead, the important caveat is that, unlike in Python 2.x, in Python 3.x the result of hash() on a string is only consistent within a process, not across Python invocations (hash randomization is on by default since Python 3.3 unless PYTHONHASHSEED is set). See here:
$ python -V
Python 2.7.5
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python -c 'print(hash("foo"))'
-4177197833195190597
$ python3 -V
Python 3.4.2
$ python3 -c 'print(hash("foo"))'
5790391865899772265
$ python3 -c 'print(hash("foo"))'
-8152690834165248934
This means the hash()-based solution suggested, which can be shortened to just:
hash(s) % 10**8
will only return the same value within a given script run:
#Python 2:
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
$ python2 -c 's="your string"; print(hash(s) % 10**8)'
52304543
#Python 3:
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
12954124
$ python3 -c 's="your string"; print(hash(s) % 10**8)'
32065451
So, depending on whether this matters in your application (it did in mine), you'll probably want to stick to the hashlib-based approach.
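If you are stuck with hash(), Python 3 lets you pin the seed through the PYTHONHASHSEED environment variable, which must be set before the interpreter starts. The sketch below (function name hypothetical) launches two separate interpreters with the same fixed seed and compares the results. Note that a fixed seed switches off the hash-randomization protection, and string hashes can still differ between Python versions, so hashlib remains the safer choice.

```python
import os
import subprocess
import sys


def hash_in_fresh_interpreter(s: str, seed: str = "0") -> int:
    # Run hash(s) % 10**8 in a brand-new Python process with a fixed hash seed.
    env = {**os.environ, "PYTHONHASHSEED": seed}
    code = f"print(hash({s!r}) % 10**8)"
    result = subprocess.run(
        [sys.executable, "-c", code],
        env=env, capture_output=True, text=True, check=True,
    )
    return int(result.stdout.strip())


first = hash_in_fresh_interpreter("your string")
second = hash_in_fresh_interpreter("your string")
print(first, second, first == second)
```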

- It should be noted that this answer has a very important caveat: since Python 3.3, to protect against tar-pitting (hash-flooding denial-of-service) attacks, Python uses a random hash seed upon startup. – Wolph Jan 06 '18 at 13:20
- If digits are not your main requirement, you could also use `hashlib.sha256("hello world".encode('utf-8')).hexdigest()[:8]`, which will still have collisions. – lony Dec 17 '18 at 16:41
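Eight hex characters encode 32 bits, so if such a prefix is later turned back into a number, it can be as large as 4,294,967,295 (ten decimal digits) rather than a fixed 8-digit value. A quick check:

```python
import hashlib

prefix = hashlib.sha256("hello world".encode("utf-8")).hexdigest()[:8]
value = int(prefix, 16)  # anywhere from 0 to 0xffffffff == 4294967295
print(prefix, value, len(str(value)))
```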
To complement JJC's answer: in Python 3.5.3, hashlib gives the same result across interpreter runs if you use it this way:
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -c '
import hashlib
hash_object = hashlib.sha256(b"Caroline")
hex_dig = hash_object.hexdigest()
print(hex_dig)
'
739061d73d65dcdeb755aa28da4fea16a02b9c99b4c2735f2ebfa016f3e7fded
$ python3 -V
Python 3.5.3
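The `b"Caroline"` literal works here because the text is plain ASCII. For arbitrary `str` values, which may contain non-ASCII characters, encode explicitly with a fixed codec. A sketch (the helper name is hypothetical):

```python
import hashlib


def sha256_hex(text: str) -> str:
    # hashlib only accepts bytes in Python 3; a fixed codec keeps the output stable.
    return hashlib.sha256(text.encode("utf-8")).hexdigest()


# For plain ASCII text, encoding and a bytes literal give the same digest.
print(sha256_hex("Caroline") == hashlib.sha256(b"Caroline").hexdigest())
# Non-ASCII text needs the explicit encode; b"Carolinë" would be a SyntaxError.
print(sha256_hex("Carolinë"))
```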

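Another quick way of hashing a string to an 8-hexadecimal-digit digest is to use shake_128(...).hexdigest(4) (the SHAKE constructors have been in hashlib since Python 3.6, and their hexdigest() requires a length argument):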
import hashlib
h=hashlib.shake_128(b"my ascii string").hexdigest(4)
#34c0150b
Note the 4 instead of 8: the argument is the digest length in bytes, and each byte is rendered as two hexadecimal digits.
Of course, be aware of hash collisions.
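If you want decimal digits rather than hex from the SHAKE approach, one option is to request a few more bytes than strictly needed and reduce modulo 10**8. This is a sketch, and the choice of 8 bytes is an assumption: with only 4 bytes (2**32 values), the 10**8 buckets would not be filled evenly, leaving some 8-digit results roughly 2% more likely than others, while 8 bytes makes that bias negligible. The function name is hypothetical.

```python
import hashlib


def shake_8_digits(data: bytes, nbytes: int = 8) -> int:
    # shake_128 is an extendable-output function: digest(n) returns exactly n bytes.
    raw = hashlib.shake_128(data).digest(nbytes)
    # Reduce to 8 decimal digits; using 8 bytes keeps the modulo bias tiny.
    return int.from_bytes(raw, "big") % 10**8


print(hashlib.shake_128(b"my ascii string").hexdigest(4))  # 8 hex characters
print(shake_8_digits(b"my ascii string"))
```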
