Whether a String
has high variability at the beginning of the string vs at the end of the string doesn't matter.
To test this, the below code simulates the hash-table logic of Java 8's HashMap
class. The methods tableSizeFor
and hash
were copied from the JDK source code.
The code will create 60 character strings that differ only by the first or last 7 characters. It will then build a hash-table with appropriate capacity and count the number of hash bucket collisions.
As can be seen in the output, the collision counts are the same (within statistical margins), regardless of leading or trailing variability of the strings being hashed.
Output
Count: 1000 Collisions: 384 By collision size: {1=240, 2=72}
Count: 1000 Collisions: 278 By collision size: {1=191, 2=30, 3=3, 4=3, 6=1}
Count: 100000 Collisions: 13876 By collision size: {1=12706, 2=579, 3=4}
Count: 100000 Collisions: 15742 By collision size: {1=12644, 2=1378, 3=110, 4=3}
Count: 10000000 Collisions: 2705759 By collision size: {1=1703714, 2=381705, 3=65050, 4=9417, 5=1038, 6=101, 7=3}
Count: 10000000 Collisions: 2626728 By collision size: {1=1698957, 2=365663, 3=56156, 4=6278, 5=535, 6=27, 7=4}
Test Code
public class Test {
public static void main(String[] args) throws Exception {
//
test(1000, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_%07d");
test(1000, "%07d_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
test(100000, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_%07d");
test(100000, "%07d_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
test(10000000, "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz_%07d");
test(10000000, "%07d_ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz");
}
private static void test(int count, String format) {
// Allocate hash-table
final int initialCapacity = count * 4 / 3 + 1;
final int tableSize = tableSizeFor(initialCapacity);
int[] tab = new int[tableSize];
// Build strings, calculate hash bucket, and increment bucket counter
for (int i = 0; i < count; i++) {
String key = String.format(format, i);
int hash = hash(key);
int bucket = (tableSize - 1) & hash;
tab[bucket]++;
}
// Collect collision counts, i.e. counts > 1
// E.g. a bucket count of 3 means 1 original value plus 2 collisions
int total = 0;
Map<Integer, AtomicInteger> collisions = new TreeMap<>();
for (int i = 0; i < tableSize; i++)
if (tab[i] > 1) {
total += tab[i] - 1;
collisions.computeIfAbsent(tab[i] - 1, c -> new AtomicInteger()).incrementAndGet();
}
// Print result
System.out.printf("Count: %-8d Collisions: %-7d By collision size: %s%n", count, total, collisions);
}
static final int MAXIMUM_CAPACITY = 1 << 30;
static final int tableSizeFor(int cap) {
int n = cap - 1;
n |= n >>> 1;
n |= n >>> 2;
n |= n >>> 4;
n |= n >>> 8;
n |= n >>> 16;
return (n < 0) ? 1 : (n >= MAXIMUM_CAPACITY) ? MAXIMUM_CAPACITY : n + 1;
}
static final int hash(Object key) {
int h;
return (key == null) ? 0 : (h = key.hashCode()) ^ (h >>> 16);
}
}