
So this is the default algorithm that generates the hash code for Strings:

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
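
For reference, that formula boils down to the following loop (a rough sketch of what String.hashCode() computes, not the exact JDK source, which also caches the result):

// Polynomial hash: s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]
static int polynomialHash(String s) {
    int h = 0;
    for (int i = 0; i < s.length(); i++) {
        h = 31 * h + s.charAt(i); // Horner's method: multiply by 31, add next char
    }
    return h;
}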

However, I want to use something different and much simpler, like adding up the ASCII values of each character.

How do I make it use the algorithm I created instead of the default one when I use the put() method on a Hashtable?

As of now I don't know what to do other than implement a hash table from scratch.

Nezrik
Why exactly do you want to change the hash function? Your algorithm sounds like it will create lots of collisions, and this hurts performance for any hash-based structure. – Chaosfire Dec 03 '22 at 17:36

2 Answers


Create a new class with a String field in it. For example:

import java.util.Objects;

public class MyString {
    private final String value;

    public MyString(String value) {
        this.value = value;
    }

    public String getValue() {
        return value;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        MyString myString = (MyString) o;
        return Objects.equals(value, myString.value);
    }

    @Override
    public int hashCode() {
        // use your own implementation: here, the sum of the string's code points
        // (equal to the ASCII-value sum for ASCII-only strings)
        return value.codePoints().sum();
    }
}

Override both equals() and hashCode(), marking them with the @Override annotation. Note: here hashCode() simply sums the string's code points, which matches the ASCII-value sum only for ASCII-only strings.

After that, you will be able to use objects of the new class as keys in the desired data structure. Here you can find a detailed explanation of these methods and the contract between equals() and hashCode().
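
For illustration, here is a minimal sketch of how the wrapper could be used as a key (assuming java.util.Hashtable, since the question mentions put(); a HashMap would behave the same way):

import java.util.Hashtable;

public class Demo {
    public static void main(String[] args) {
        Hashtable<MyString, Integer> table = new Hashtable<>();

        // put() uses MyString.hashCode() to choose a bucket
        // and MyString.equals() to distinguish keys within it
        table.put(new MyString("stop"), 1);
        table.put(new MyString("spot"), 2); // anagram: same hash, different key

        System.out.println(table.get(new MyString("stop"))); // 1
        System.out.println(table.get(new MyString("spot"))); // 2
    }
}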

Albina

However, I want to use something different and much simpler, like adding up the ASCII values of each character.

This is an extremely bad idea if you care at all about hash table efficiency. What you're thinking of as an overly-complicated hashing function is actually designed to give a uniform distribution of hash values throughout the entire 32-bit (or whatever) range. That gives the best possibility of uniformly distributing the hash keys (after you mod by the hash table size) in your buckets.

Your simple method of adding up the ASCII values of the individual characters has multiple flaws. First, you're limited in the range of values you can reasonably expect to generate. The highest value you can create is 255*n, where n is the length of the key. If your key is 10 characters in length, then you can't possibly generate more than 2,550 unique hash values. But there are 255^10 possible 10-character strings. Your collision rate will be very high.

The second problem is that anagrams generate the same hash value. "stop," "spot," and "tops" all generate the same hash value and will hash to the same bucket. Again, this will greatly affect your collision rate.
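
To make this concrete, here is a quick check comparing the sum-of-characters hash described in the question (a hypothetical asciiSum helper, not anything in the JDK) against the default String.hashCode():

public class CollisionDemo {
    // Hypothetical hash: just add up the character values
    static int asciiSum(String s) {
        int sum = 0;
        for (int i = 0; i < s.length(); i++) {
            sum += s.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        // Anagrams collide: all three sum to 454
        System.out.println(asciiSum("stop")); // 454
        System.out.println(asciiSum("spot")); // 454
        System.out.println(asciiSum("tops")); // 454

        // The default polynomial hash keeps them distinct
        System.out.println("stop".hashCode()); // 3540994
        System.out.println("spot".hashCode()); // 3537154
        System.out.println("tops".hashCode()); // 3566014
    }
}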

It's unclear to me why you want to replace the hashing function. If you're thinking it will result in better performance, you should think again. Sure, it will make generating the hash value faster, but it will result in very skewed key distribution, and correspondingly terrible hash table performance.

Jim Mischel