3

I'm building an hash function which should map any String (max length 100 characters) to a single [A-Z] character (I'm using it for sharding purposes).

I came up with this simple Java function, is there any way to make it faster?

public static final char stringToChar(final String s) {
    long counter = 0;
    for (char c : s.toCharArray()) {
        counter += c;
    }
    return (char)('A'+(counter%26));
}
faican
  • 45
  • 5
  • 3
    Why not just take the first character of the string? What's the benefit of calculating over all characters? Do you expect a very uneven distribution when using only the first character? – Robby Cornelissen Aug 06 '20 at 07:23
  • @RobbyCornelissen as you correctly guessed, knowing the general format of the string I'm expecting an uneven distribution – faican Aug 06 '20 at 07:43
  • If you want to make it faster, then don't use toCharArray, but retrieve the characters using the charAt method – Erwin Bolwidt Aug 06 '20 at 09:48
  • @RobbyCornelissen My question was regarding a performance optimization, not the output distribution per se – faican Aug 06 '20 at 10:02

1 Answers1

6

A quick trick to have an even distribution of the "shards" is using an hash function.

I suggest this method that uses the default java String.hashCode() function

public static char getShardLabel(String string) {
    int hash = string.hashCode();
    // using Math.flootMod instead of operator % beacause '%' can produce negavive outputs
    int hashMod = Math.floorMod(hash, 26);
    return (char)('A'+(hashMod));
}

As pointed out here this method is considered "even enough".

Based on a quick test it looks faster than the solution you suggested.
On 80kk strings of various lengths:

  • getShardLabel took 65 milliseconds
  • stringToChar took 571 milliseconds
Pado
  • 1,497
  • 16
  • 28