0

I have this function hash() that encrypts a given string to an integer.

letters = 'weiojpknasdjhsuert'

def hash(string_input):
  h = 3

  for i in range(0, len(string_input)):
    h = h * 43 + letters.index(string_input[i])

  return h

So if I do print(hash('post')), my output is: 11231443

How can I find what my input needs to be to get an output like 1509979332193868 if the input can only be a string from letters? There is a formula inside the loop body but I couldn't figure out how to reverse it.

Onur-Andros Ozbek
  • 2,998
  • 2
  • 29
  • 78

1 Answers1

2

It seem like since 43 is larger than your alphabet, you can just reverse the math. I don't know how to prove there are no hash collisions, so this may have edge cases. For example:

letters = 'weiojpknasdjhsuert'

def hash(string_input):
    h = 3

    for i in range(0, len(string_input)):
        h = h * 43 + letters.index(string_input[i])

    return h

n = hash('wepostjapansand')
print(n)
# 9533132150649729383107184

def rev(n):
    s = ''
    while n:
        l = n % 43  # going in reverse this is the index of the next letter
        n -= l      # now reverse the math...subtract that index
        n //= 43    # and divide by 43 to get the next n
        if n:
            s = letters[l] + s
    return s

print(rev(n))
# wepostjapansand

With a more reasonable alphabet, like lowercase ascii and a space, this still seem to be okay:

from string import ascii_lowercase 

letters = ascii_lowercase + ' '

def hash(string_input):
    h = 3

    for i in range(0, len(string_input)):
        h = h * 43 + letters.index(string_input[i])

    return h

n = hash('this is some really long text to test how well this works')
print(n)
# 4415562436659212948343839746913950248717359454765313646276852138403823934037244869651587521298

def rev(n):
    s = ''
    # with more compact logic
    while n > 3:
        s = letters[n % 43] + s
        n = (n - (n % 43)) // 43
    return s

print(rev(n))
# this is some really long text to test how well this works

The basic idea is that after all the math, the last number is:

prev * 43 + letter_index

This means you can recover the final letter index by taking the prev modulus 43. Then subtract that and divide by 43 (which is just the reverse of the math) and do it again until your number is zero.

Mark
  • 90,562
  • 7
  • 108
  • 148
  • I didn't notice that `alphabet` was so small, I assumed it would include all uppercase and lowercase letters, so at least 52 characters, which is larger than 43. – Barmar Oct 02 '21 at 06:55
  • Your first `def rev(n)` actually works. Would you mind explaining your logic? – Onur-Andros Ozbek Oct 02 '21 at 06:59
  • Yeah, @Barmar, I think that's right. With all the upper and lower case letters, this seems to work with a bigger multiple like 59 – Mark Oct 02 '21 at 07:04
  • @oo92, I added some explanation at the end...hope that's helpful. – Mark Oct 02 '21 at 07:04
  • I'm curious: Why doesn't the initial value `h = 3` appear in the reverse? – Barmar Oct 02 '21 at 07:06
  • @Barmar it gets eaten by the last integer division since 3 is smaller than 43 (which is also why that `if n` test is there otherwise it would print an extra letter). I suppose `while n > 3:` would also be an option that avoids this. – Mark Oct 02 '21 at 07:08