3

I have to turn this formula into java code:

It would be much easier if I could use libraries such as Math.BigInteger but unfortunately I should do this without it. Some similar question on stackoverflow suggested to write an own bignum library but I want to do it without it.

Right now, my progress is at this step:

int h(String s) {
  long value = 1;
  int mod = ht.length;

  for (int i=0; i < s.length()-1; i++) {
     h += s.charAt(i) * Math.pow(256,i);
  }
  return (int) h % mod;
}

I know that the values of the power gets pretty quick out of the integer range so I thought about writing an own method which calculates the power and modulo of the value. My mathematical knowledge just isn't good enough to know when to use the modulo and how to easy things up.

Thanks in advance!

Community
  • 1
  • 1
Leo
  • 111
  • 1
  • 1
  • 8
  • hashvalues of a String – Leo Jan 02 '17 at 14:11
  • 2
    `Math.BigInteger` isn't an *external* library. It's part of the JDK, like `String` and `Math.pow`. – T.J. Crowder Jan 02 '17 at 14:12
  • There's multiple ways to hash a string, does this one have a name? Where did you get the pic from? – weston Jan 02 '17 at 14:12
  • @weston It's part of a project we have to do at university. – Leo Jan 02 '17 at 14:14
  • So, no name? Universities are usually good at attributing credit, like naming the inventor – weston Jan 02 '17 at 14:15
  • @T.J.Crowder yes I know but I am not allowed to import any additional libraries and I don't know how to create a new BigInteger without having the library imported – Leo Jan 02 '17 at 14:17
  • I doubt this implementation would be best done with BigInteger anyway – weston Jan 02 '17 at 14:19
  • @weston I think it has no name, it's just part of the task where we have to create a small messenger service and now I have to get the hashvalue of a string by using that formula. Nevertheless discussing about its origin won't help me right now – Leo Jan 02 '17 at 14:19
  • I asked its name for two reasons. Mainly, if you know its name, you might find an implementation **and that would help you right now**. Second, if someone else is looking for this later by name, they can find it. – weston Jan 02 '17 at 14:32
  • 3
    @Leo: Again: `BigInteger` **isn't** an additional library. It's part of the JDK. If you have `Math.pow`, you have `java.math.BigInteger`. It's fine if you don't want to use it for some reason, but again, it's not additional, it's not external, etc. It's part of the basic Java set of classes. – T.J. Crowder Jan 02 '17 at 14:33
  • What is `ht.length`? that is your `n`, so where does that come from? – weston Jan 02 '17 at 14:52
  • `ht.length` is the length of an array (our task says so) – Leo Jan 02 '17 at 15:05
  • What array is that? You've only told us about a string – weston Jan 02 '17 at 15:15
  • 1
    Possible duplicate of [How to calculate modulus of large numbers?](http://stackoverflow.com/questions/2177781/how-to-calculate-modulus-of-large-numbers) – phuclv Jan 02 '17 at 15:24
  • [Calculating pow(a,b) mod n](http://stackoverflow.com/q/8496182/995714), [Calculating (a^b)%MOD](http://stackoverflow.com/q/11272437/995714) – phuclv Jan 02 '17 at 15:25
  • @LưuVĩnhPhúc no, you don't need any powers here (not explicitly - that's just a convenient way to write it in math), so that's a bad dupe – harold Jan 02 '17 at 15:26

4 Answers4

5

If you go from the back, you don't have to take any powers at all. Simply multiplying by 256 at each step will have the same effect (values from the back "accumulate" more multiplications, raising them to the required powers). Eg (not tested)

int h(String s) {
  int res = 0;
  int n = ht.length;

  for (int i = s.length() - 1; i >= 0; i--) {
     // using a long here to prevent premature wrapping
     long t = res * 256L + s.charAt(i);
     res = (int)(t % n);
  }
  return (res + 1) % n;
}

Also note that ht.length should not be a power of two (so you can't skip the modulo reduction in the loop, which you could if ht.length was a power of two), because if it's a power of two then the hash depends on (at most) the first 4 characters, which is obviously bad.

harold
  • 61,398
  • 6
  • 86
  • 164
3

I've chosen a large prime for your n by default, ask your tutor, but there's no way it's a good idea to use any non-prime. If that's the number of buckets in a hash table, make sure that number is a prime. Also you mustn't -1 in your for loop's exit condition, you're missing the last character.

private static int MAX_PRIME = 2147483647; //largest positive 32 signed int prime (also happens to be the largest positive 32 signed int)

public static int hash(String s) {
    return hash(s, MAX_PRIME);
}

public static int hash(String s, int primeN) {
    long h = 1;
    long m = 1;

    for (int i = 0; i < s.length(); i++) {
        h += s.charAt(i) * m;
        h %= primeN;
        m *= 256;
        m %= primeN;
    }

    return (int) h;
}

If you want to test the correctness, then you can compare the generated hashes to a BigInteger implementation:

public static int hashBigInt(String s) {
    return hashBigInt(s, MAX_PRIME);
}

public static int hashBigInt(String s, int primeN) {
    final BigInteger bi256 = BigInteger.valueOf(256);
    BigInteger h = BigInteger.ONE;
    BigInteger m = BigInteger.ONE;

    for (int i = 0; i < s.length(); i++) {
        h = h.add(BigInteger.valueOf(s.charAt(i)).multiply(m));
        m = m.multiply(bi256);
    }

    return h.mod(BigInteger.valueOf(primeN))
            .intValue();
}
weston
  • 54,145
  • 21
  • 145
  • 203
  • That "array" seems to be the hashtable (`ht`). If the length of `ht` is a prime number it could be a fine choice for n. I think if you choose `MAX_PRIME` = 2^32 you could simply let integer overflow do the modulo operation. – Calculator Jan 02 '17 at 15:18
  • @Calculator if `n` is not a prime, e.g. 2^32, characters beyond the fourth will have no effect on the final hash – weston Jan 02 '17 at 15:30
  • ok I think there's still a problem with the calculation of `m` in the first function though, since now the `*= 256` step might wrap before the modulo is applied (if `MAX_PRIME` > 2^24) – harold Jan 02 '17 at 15:30
  • Thanks @harold, notice I upped them to `long`s for this reason – weston Jan 02 '17 at 15:31
  • Oh sorry, I missed the `long` - no problem then – harold Jan 02 '17 at 15:31
  • @Calculator "If the length of `ht` is a prime number it could be a fine choice for `n`." I agree with this. – weston Jan 02 '17 at 15:38
  • I missed that the trick using the overflow won't work with 256. It would require a prime number like 31, which java is using for hashing, to work. – Calculator Jan 02 '17 at 15:50
  • @Calculator yes, that's right java has something like `31^i` instead of `256^i`, meaning their `n` does not need to be a prime – weston Jan 02 '17 at 15:52
2

You should basically move the modulo deeper into the equation to keep the values low at every step. For that you can basically make use of the module rules:

  • (a + b) % n = (a % n + b % n) % n
  • (a * b) % n = (a % n * b % n) % n

First move it into the loop:

h = (h + s.charAt(i) * Math.pow(256, i)) % mod;

Then move it into the pow as well:

h = (h + s.charAt(i) * Math.pow(256 % mod, i)) % mod;

Finally I would stop using pow and so some custom powering where you mod after each step like
((((256 % mod) * 256 % mod) * 256 % mod) ... )

luk2302
  • 55,258
  • 23
  • 97
  • 137
  • Thank you for your answer, this also came across my mind. I'm just not sure when to mod because the formula wants it after summing up all powers. Do I have to mod the `s.charAt(i)` as well? Sometimes in Maths I think I learned that `(a*b) mod n = ((a mod n) * (b mod n)) mod n` – Leo Jan 02 '17 at 14:25
  • 1
    @Leo I *just* included 2 sentences regarding the rules – luk2302 Jan 02 '17 at 14:25
1

I think you are looking at Fast modular exponentiation:

Let's consider this simple formula to explain how it works:

x = A^B % C as x = 5^117 % 19

1. Decomposing B into power of two

117 = (2^0 + 2^2 + 2^4 + 2^5 + 2^6) 117 = (1 + 4 + 16 + 32 + 64 )

2. Compute mod C for power of two <= B

5^2 mod 19 = (5^1 * 5^1) mod 19 = (5^1 mod 19 * 5^1 mod 19) mod 19
5^2 mod 19 = (5 * 5) mod 19 = 25 mod 19
5^2 mod 19 = 6

5^4 mod 19 = (5^2 * 5^2) mod 19 = (5^2 mod 19 * 5^2 mod 19) mod 19
5^4 mod 19 = (6 * 6) mod 19 = 36 mod 19
5^4 mod 19 = 17

5^8 mod 19 = (5^4 * 5^4) mod 19 = (5^4 mod 19 * 5^4 mod 19) mod 19
5^8 mod 19 = (17 * 17) mod 19 = 289 mod 19
5^8 mod 19 = 4

5^16 mod 19 = (5^8 * 5^8) mod 19 = (5^8 mod 19 * 5^8 mod 19) mod 19
5^16 mod 19 = (4 * 4) mod 19 = 16 mod 19
5^16 mod 19 = 16

5^32 mod 19 = (5^16 * 5^16) mod 19 = (5^16 mod 19 * 5^16 mod 19) mod 19
5^32 mod 19 = (16 * 16) mod 19 = 256 mod 19
5^32 mod 19 = 9

5^64 mod 19 = (5^32 * 5^32) mod 19 = (5^32 mod 19 * 5^32 mod 19) mod 19
5^64 mod 19 = (9 * 9) mod 19 = 81 mod 19
5^64 mod 19 = 5

3. Compute X

5^117 mod 19 = ( 5^1 * 5^4 * 5^16 * 5^32 * 5^64) mod 19
5^117 mod 19 = ( 5^1 mod 19 * 5^4 mod 19 * 5^16 mod 19 * 5^32 mod 19 * 5^64 mod 19) mod 19
5^117 mod 19 = ( 5 * 17 * 16 * 9 * 5 ) mod 19
5^117 mod 19 = 61200 mod 19 = 1
5^117 mod 19 = 1

Why does this solves your problem

You might have a A or a B which is way above the Integer limit.

Instead of summing ALL the value, and then finaly apply the modulus you can sum up to interger limit and then apply the above formula, then start again summing, and reapply formula, and so on, because 6 % 4 == (3 % 4) + (3 % 4) % 4

Anthony Raymond
  • 7,434
  • 6
  • 42
  • 59