-4

I'm looking for a simple hash function that yields no collisions.

I have an alphanumeric sequence (say 16 letter long) and I want each one of them to map to unique hashed value. Ideally, the hashed value is the same length as the original sequence (16 letter long).

Is there a simple python hash function that achieves this?

John Kugelman
  • 349,597
  • 67
  • 533
  • 578
leopoodle
  • 2,110
  • 7
  • 24
  • 36
  • You already named it: it's called a "perfect hash function", first described in a paper around 1978, I think. How did you not find that in your literature search? – Prune Apr 03 '17 at 22:32
  • 1
    Could it be a simple cipher? i.e. Subtract 5 from the ascii value for each character (and loop to make sure all characters are valid) – nbryans Apr 03 '17 at 22:32
  • alphanumeric sequences can be stored as strings... which are immutable and therefore hashable. Just use the `str(sequence)`: a `dict = { str(sequence): sequence, ...}` or just `set(str(sequence), ...)` – TemporalWolf Apr 03 '17 at 22:37
  • 2
    Sounds like the identity hash would work just fine. If you want to hash 16-character sequences to 16-character sequences with no collisions, `lambda x: x` satisfies those requirements. – user2357112 Apr 03 '17 at 22:43
  • note: It must return a different value. Cannot always return X if the input to the hash function is X – leopoodle Apr 03 '17 at 22:58
  • @SteveHe what about `s[::-1]`? `s_new = "".join([chr(ord(char) + 1) for char in s])`? That is a different value. If that doesn't work, what consitutes "different enough"? – TemporalWolf Apr 03 '17 at 23:01
  • 1
    I'm voting to close this question because the requirements aren't defined and the questioner pulled out new requirements after otherwise-valid answers were given. – user2357112 Apr 03 '17 at 23:09
  • 1
    "Cannot always return X if the input to the hash function is X." Why not? You need to explain why. If this is a security measure, it's incredibly dubious. Why do you need a perfect hash, versus say SHA256? – John Kugelman Apr 03 '17 at 23:25

3 Answers3

3

Use something like

u"".join([unichr(ord(spl[0])*100 + ord(spl[1])-30) for spl in [instr[i: i + 2] for i in range(0, 16, 2)]])

which is a really crummy shift:

1234567890123456 becomes ጸᐂᓌᖖᙖጸᐂᓌ

aAzZ09jdmekfADEF becomes ☇⿤ዛ⦮⫛⨔ᦊᬜ

IamBADatREQUIREM becomes ᳇⪸ᦊ☺ ΊᲸᬣ

zzaa009990123456 becomes 〄☧ዒᙟᙖጸᐂᓌ

which is reversed via:

"".join([chr(ord(num) / 100) + chr(ord(num) % 100 + 30) for num in unistring])

u"〄☧ዒᙟᙖጸᐂᓌ" becomes zzaa009990123456

and thus the circle of life is complete.

TemporalWolf
  • 7,727
  • 1
  • 30
  • 50
0

If the hash function can return any immutable object, return the strings directly. No need to map them into anything.

def hash(s):
    return s

If it needs to return ints, then it's simple. Python integers can be arbitrarily large, so all you have to do is convert the string an int with the same bytes.

def hash(s):
    return int.from_bytes(string.encode(), byteorder='big')
John Kugelman
  • 349,597
  • 67
  • 533
  • 578
  • the first one is not acceptable. we cannot always return the original string. it is one of my requirements. let's say for security reason we want to map to some other value – leopoodle Apr 03 '17 at 22:57
0

hash functions return a string of fixed length relative to the hash function not the input string, what you want is a cipher

https://crypto.stackexchange.com/questions/8765/is-there-a-hash-function-which-has-no-collisions

ASCII value of a character in Python

from random import shuffle

def create_cipher():

    key = []
    # ASCII value of numerals
    for i in range(48,58):
        key.append(i)
    # ASCII value of lowercase
    for i in range(65,91):
        key.append(i)
    # ASCII value of uppercase
    for i in range(97,123):
        key.append(i)
    print key
    cipher = list(key)
    shuffle(cipher)
    print cipher
    cipher_dict = {}
    for i in range(len(key)):
        cipher_dict[key[i]] = cipher[i]
    return cipher_dict

def cipher(string,code_dict):

    coded_list = []
    list_string = list(string)
    list_string = [ord(i) for i in list_string]
    for i in list_string:
        coded_list.append(code_dict[i])
    coded_list = [chr(i) for i in coded_list]
    return  ''.join(coded_list)

msg = 'HelloWorld2017'
encode_dict = create_cipher()
decode_dict = {y:x for x,y in encode_dict.iteritems()}
encoded_cipher = cipher(msg,encode_dict)
decoded_cipher = cipher(encoded_cipher,decode_dict)

print msg
print encoded_cipher
print decoded_cipher


>>>
HelloWorld2017
pryyLILoyx9fgm
HelloWorld2017

of course you'd want to validate your inputs... for alphanumeric you can use:

isalnum()
Community
  • 1
  • 1
litepresence
  • 3,109
  • 1
  • 27
  • 35