C++ hashCode - working with large numbers (+ mod operation)

Question

I'm working on a project that involves programming the Java hashCode() hashing function,
the formula.

The input strings (i.e. "A Adding trigger unit C") were randomly generated and stored in a .txt file. Also if the result hash is not within a range (0 < hash < N), it needs to be adjusted => hash % N).

I'm having trouble with the hashCode() algorithm, since the result for just one char of the string is too large (i.e. 1.408 * 10^30) to be stored in a regular variable. I've tried using some libraries that allow you to store very big numbers, but none of them allow you to do the mod operation, if the hash exceeds the N parameter.

Please, what solution would you recommend for me so I could store these very long numbers + use the mod operation on them.

Thanks

Have you tried http://en.cppreference.com/w/cpp/string/basic_string/hash. the result can be stored in size_t. — gchen, May 04 '18 at 19:22
I haven't tried using it until you mentioned it, but unfortunately it doesn't help my problem. size_t neither doesn't allow you to store numbers as big as the ones I'm working with If you have any other idea, hmu. Thanks — Revokenz, May 04 '18 at 19:43
Yes I do. But based on the algorithm on the wiki page for hashCode() the numbers are too big. For example the value just for the first character for string that I mentioned would be: 65 * 31^(20-1) = 1.408593044298 * 10^30 which is a value that can't be stored even in the size_t as you suggested. — Revokenz, May 04 '18 at 19:48
I see. But you cant use the formula directly, you need to %N before it's getting too large. — gchen, May 04 '18 at 19:53
Yes, that would probably work but then it would be violating the project terms. The %N can only be used for the final hash if it exceeds the N value. Maybe there's no other way around it and it'd be the one and only solution to this problem. So what you are saying is to use the %N on the result of the pow function? — Revokenz, May 04 '18 at 19:56
...It doesn't matter because the final result will be the same. (A+B)%N == (A%N+B%N)%N — gchen, May 04 '18 at 20:02

gchen · Answer 1 · 2018-05-04T21:13:24.440

You can't store numbers that large in C++, but it's certainly possible as long as N isn't a gigantic number. The trick is to perform % N as you loop through the string, so your number will never get overflowed.

You can use this method -- https://math.stackexchange.com/a/453108 to find (A^B)%N, i.e. (31^200)%N.

//This is the algorithm mentioned in the above link
const size_t N = 1e9 + 7;
size_t bigPower(size_t x, size_t p) {
    size_t ans = 1;

    while(p > 0){
        if(p % 2 == 1)
            ans = (ans*x)%N;
        x = (x*x)%N;
        p /= 2;
    }

return ans;
}

Then, follow the formula should not overflow the numbers.

size_t hashCode(const string& s) {
    size_t result = 0;
    for(size_t i = 0; i< s.size(); i++) {
        result += (s[i] * bigPower(31, s.size()-1-i))%N;
        result %= N;
    }

    return result;
}

UPDATE I found Java hashCode can actually overflow and give back negative hash code --HashCode giving negative values. Thus, if you want your hashCode to behave like Java, you may as well as allow overflows.

//This might produce the exact same hashCode as Java.
int hashCode(const string& s) {
    int result = 0;
    for(int i = 0; i< s.size(); i++) {
        int m = 1;
        for(int j=0; j<s.size()-1-i; j++) {
            m *= 31;
        }
        result += (s[i] * m);
    }

    return result;
}

I think I finally figured it out. For some reason it wasn't working with the regular double but I managed to fix it. Thanks for the suggestion, really do appreciate you putting your time into this — Revokenz, May 04 '18 at 21:42

C++ hashCode - working with large numbers (+ mod operation)

1 Answers1