9

I wonder, how to efficiently compute the hashCode for a BitSet-like implementation of Set<Integer>.

The BitSet#hashCode is obviously fast to compute, rather stupid(*) and incompatible with Set#hashCode().

A fast compatible implementation could go like

int hashCode() {
    int result = 0;
    for (int i=0; i<bits.length; ++i) {
        long word = bits[i];
        result += 64 * i * Long.bitCount(word) + weightedBitCount(word);
    }
    return result;
}

if there was an efficient implementation of

int weightedBitCount(long word) { // naive implementation
    int result = 0;
    for (int i=0; i<64; ++i) {
        if ((word & (1L << i)) != 0) {
            result += i;
        }
    }
    return result;
}

In case most bits are unset, the naive implementation could be improved by testing word==0 or using Long.highestOneBit or alike, but these tricks don't help much and are detrimental in other cases.

Is there a clever trick to significantly speed it up in general?

I guess, some batch computation over multiple words could be more efficient. Computing Set#size at the same time would be a nice bonus.

A note concerning premature optimization: Yes, I know. I'm mostly curious (and it can be useful for things like Project Euler).


(*) There are many bits which gets completely ignored (they get shifted out in the multiplication).

maaartinus
  • 44,714
  • 32
  • 161
  • 320
  • Your code ignores the least significant bit, even if it is set. Are you sure you want ` result += i`, not ` result += i+1`? – Sergey Kalinichenko Feb 26 '18 at 17:49
  • @dasblinkenlight Yes, I am. Bit 0 has weight 0. Adding `Integer.valueOf(0)` to a `Set` doesn't change its `hashCode()`, Adding `Integer.valueOf(64)` increases it by 64, which is covered by the first term. – maaartinus Feb 26 '18 at 20:23

3 Answers3

6

Notwithstanding hardware support (e.g., the x86 popcnt instruction), counting the bits in O(1) time is a rather well-known algorithm, generally communicated as SWAR bitcount.

However, your algorithm has a custom kernel that adds a different value based on which bit is set:

result += loop_counter_value;

Lacking a pithy algorithm for bit counting with custom kernels, a tried and true methodology is to utilize precalculated results. In this context, a lookup table. Clearly, a lookup table of all combinations of 64 bits (264 combinations!) is unwieldy, but you can split the difference by precalculating each byte of the n-byte variable. For 8 bytes, this is 256*8, or 2KiB of memory. Consider:

int weightedBitCount(long word) {
    int result = 0;
    int[][] lookups = {
        {0, 0, 1, 1, 2, 2, 3, 3, 3, 3, 4, 4, 5, 5, 6, 6, 4, 4, 5, 5, 6, 6, 7, 7, 7, 7, 8, 8, 9, 9, 10, 10, 5, 5, 6, 6, 7, 7, 8, 8, 8, 8, 9, 9, 10, 10, 11, 11, 9, 9, 10, 10, 11, 11, 12, 12, 12, 12, 13, 13, 14, 14, 15, 15, 6, 6, 7, 7, 8, 8, 9, 9, 9, 9, 10, 10, 11, 11, 12, 12, 10, 10, 11, 11, 12, 12, 13, 13, 13, 13, 14, 14, 15, 15, 16, 16, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14, 15, 15, 16, 16, 17, 17, 15, 15, 16, 16, 17, 17, 18, 18, 18, 18, 19, 19, 20, 20, 21, 21, 7, 7, 8, 8, 9, 9, 10, 10, 10, 10, 11, 11, 12, 12, 13, 13, 11, 11, 12, 12, 13, 13, 14, 14, 14, 14, 15, 15, 16, 16, 17, 17, 12, 12, 13, 13, 14, 14, 15, 15, 15, 15, 16, 16, 17, 17, 18, 18, 16, 16, 17, 17, 18, 18, 19, 19, 19, 19, 20, 20, 21, 21, 22, 22, 13, 13, 14, 14, 15, 15, 16, 16, 16, 16, 17, 17, 18, 18, 19, 19, 17, 17, 18, 18, 19, 19, 20, 20, 20, 20, 21, 21, 22, 22, 23, 23, 18, 18, 19, 19, 20, 20, 21, 21, 21, 21, 22, 22, 23, 23, 24, 24, 22, 22, 23, 23, 24, 24, 25, 25, 25, 25, 26, 26, 27, 27, 28, 28},
        {0, 8, 9, 17, 10, 18, 19, 27, 11, 19, 20, 28, 21, 29, 30, 38, 12, 20, 21, 29, 22, 30, 31, 39, 23, 31, 32, 40, 33, 41, 42, 50, 13, 21, 22, 30, 23, 31, 32, 40, 24, 32, 33, 41, 34, 42, 43, 51, 25, 33, 34, 42, 35, 43, 44, 52, 36, 44, 45, 53, 46, 54, 55, 63, 14, 22, 23, 31, 24, 32, 33, 41, 25, 33, 34, 42, 35, 43, 44, 52, 26, 34, 35, 43, 36, 44, 45, 53, 37, 45, 46, 54, 47, 55, 56, 64, 27, 35, 36, 44, 37, 45, 46, 54, 38, 46, 47, 55, 48, 56, 57, 65, 39, 47, 48, 56, 49, 57, 58, 66, 50, 58, 59, 67, 60, 68, 69, 77, 15, 23, 24, 32, 25, 33, 34, 42, 26, 34, 35, 43, 36, 44, 45, 53, 27, 35, 36, 44, 37, 45, 46, 54, 38, 46, 47, 55, 48, 56, 57, 65, 28, 36, 37, 45, 38, 46, 47, 55, 39, 47, 48, 56, 49, 57, 58, 66, 40, 48, 49, 57, 50, 58, 59, 67, 51, 59, 60, 68, 61, 69, 70, 78, 29, 37, 38, 46, 39, 47, 48, 56, 40, 48, 49, 57, 50, 58, 59, 67, 41, 49, 50, 58, 51, 59, 60, 68, 52, 60, 61, 69, 62, 70, 71, 79, 42, 50, 51, 59, 52, 60, 61, 69, 53, 61, 62, 70, 63, 71, 72, 80, 54, 62, 63, 71, 64, 72, 73, 81, 65, 73, 74, 82, 75, 83, 84, 92},
        {0, 16, 17, 33, 18, 34, 35, 51, 19, 35, 36, 52, 37, 53, 54, 70, 20, 36, 37, 53, 38, 54, 55, 71, 39, 55, 56, 72, 57, 73, 74, 90, 21, 37, 38, 54, 39, 55, 56, 72, 40, 56, 57, 73, 58, 74, 75, 91, 41, 57, 58, 74, 59, 75, 76, 92, 60, 76, 77, 93, 78, 94, 95, 111, 22, 38, 39, 55, 40, 56, 57, 73, 41, 57, 58, 74, 59, 75, 76, 92, 42, 58, 59, 75, 60, 76, 77, 93, 61, 77, 78, 94, 79, 95, 96, 112, 43, 59, 60, 76, 61, 77, 78, 94, 62, 78, 79, 95, 80, 96, 97, 113, 63, 79, 80, 96, 81, 97, 98, 114, 82, 98, 99, 115, 100, 116, 117, 133, 23, 39, 40, 56, 41, 57, 58, 74, 42, 58, 59, 75, 60, 76, 77, 93, 43, 59, 60, 76, 61, 77, 78, 94, 62, 78, 79, 95, 80, 96, 97, 113, 44, 60, 61, 77, 62, 78, 79, 95, 63, 79, 80, 96, 81, 97, 98, 114, 64, 80, 81, 97, 82, 98, 99, 115, 83, 99, 100, 116, 101, 117, 118, 134, 45, 61, 62, 78, 63, 79, 80, 96, 64, 80, 81, 97, 82, 98, 99, 115, 65, 81, 82, 98, 83, 99, 100, 116, 84, 100, 101, 117, 102, 118, 119, 135, 66, 82, 83, 99, 84, 100, 101, 117, 85, 101, 102, 118, 103, 119, 120, 136, 86, 102, 103, 119, 104, 120, 121, 137, 105, 121, 122, 138, 123, 139, 140, 156},
        {0, 24, 25, 49, 26, 50, 51, 75, 27, 51, 52, 76, 53, 77, 78, 102, 28, 52, 53, 77, 54, 78, 79, 103, 55, 79, 80, 104, 81, 105, 106, 130, 29, 53, 54, 78, 55, 79, 80, 104, 56, 80, 81, 105, 82, 106, 107, 131, 57, 81, 82, 106, 83, 107, 108, 132, 84, 108, 109, 133, 110, 134, 135, 159, 30, 54, 55, 79, 56, 80, 81, 105, 57, 81, 82, 106, 83, 107, 108, 132, 58, 82, 83, 107, 84, 108, 109, 133, 85, 109, 110, 134, 111, 135, 136, 160, 59, 83, 84, 108, 85, 109, 110, 134, 86, 110, 111, 135, 112, 136, 137, 161, 87, 111, 112, 136, 113, 137, 138, 162, 114, 138, 139, 163, 140, 164, 165, 189, 31, 55, 56, 80, 57, 81, 82, 106, 58, 82, 83, 107, 84, 108, 109, 133, 59, 83, 84, 108, 85, 109, 110, 134, 86, 110, 111, 135, 112, 136, 137, 161, 60, 84, 85, 109, 86, 110, 111, 135, 87, 111, 112, 136, 113, 137, 138, 162, 88, 112, 113, 137, 114, 138, 139, 163, 115, 139, 140, 164, 141, 165, 166, 190, 61, 85, 86, 110, 87, 111, 112, 136, 88, 112, 113, 137, 114, 138, 139, 163, 89, 113, 114, 138, 115, 139, 140, 164, 116, 140, 141, 165, 142, 166, 167, 191, 90, 114, 115, 139, 116, 140, 141, 165, 117, 141, 142, 166, 143, 167, 168, 192, 118, 142, 143, 167, 144, 168, 169, 193, 145, 169, 170, 194, 171, 195, 196, 220},
        {0, 32, 33, 65, 34, 66, 67, 99, 35, 67, 68, 100, 69, 101, 102, 134, 36, 68, 69, 101, 70, 102, 103, 135, 71, 103, 104, 136, 105, 137, 138, 170, 37, 69, 70, 102, 71, 103, 104, 136, 72, 104, 105, 137, 106, 138, 139, 171, 73, 105, 106, 138, 107, 139, 140, 172, 108, 140, 141, 173, 142, 174, 175, 207, 38, 70, 71, 103, 72, 104, 105, 137, 73, 105, 106, 138, 107, 139, 140, 172, 74, 106, 107, 139, 108, 140, 141, 173, 109, 141, 142, 174, 143, 175, 176, 208, 75, 107, 108, 140, 109, 141, 142, 174, 110, 142, 143, 175, 144, 176, 177, 209, 111, 143, 144, 176, 145, 177, 178, 210, 146, 178, 179, 211, 180, 212, 213, 245, 39, 71, 72, 104, 73, 105, 106, 138, 74, 106, 107, 139, 108, 140, 141, 173, 75, 107, 108, 140, 109, 141, 142, 174, 110, 142, 143, 175, 144, 176, 177, 209, 76, 108, 109, 141, 110, 142, 143, 175, 111, 143, 144, 176, 145, 177, 178, 210, 112, 144, 145, 177, 146, 178, 179, 211, 147, 179, 180, 212, 181, 213, 214, 246, 77, 109, 110, 142, 111, 143, 144, 176, 112, 144, 145, 177, 146, 178, 179, 211, 113, 145, 146, 178, 147, 179, 180, 212, 148, 180, 181, 213, 182, 214, 215, 247, 114, 146, 147, 179, 148, 180, 181, 213, 149, 181, 182, 214, 183, 215, 216, 248, 150, 182, 183, 215, 184, 216, 217, 249, 185, 217, 218, 250, 219, 251, 252, 284},
        {0, 40, 41, 81, 42, 82, 83, 123, 43, 83, 84, 124, 85, 125, 126, 166, 44, 84, 85, 125, 86, 126, 127, 167, 87, 127, 128, 168, 129, 169, 170, 210, 45, 85, 86, 126, 87, 127, 128, 168, 88, 128, 129, 169, 130, 170, 171, 211, 89, 129, 130, 170, 131, 171, 172, 212, 132, 172, 173, 213, 174, 214, 215, 255, 46, 86, 87, 127, 88, 128, 129, 169, 89, 129, 130, 170, 131, 171, 172, 212, 90, 130, 131, 171, 132, 172, 173, 213, 133, 173, 174, 214, 175, 215, 216, 256, 91, 131, 132, 172, 133, 173, 174, 214, 134, 174, 175, 215, 176, 216, 217, 257, 135, 175, 176, 216, 177, 217, 218, 258, 178, 218, 219, 259, 220, 260, 261, 301, 47, 87, 88, 128, 89, 129, 130, 170, 90, 130, 131, 171, 132, 172, 173, 213, 91, 131, 132, 172, 133, 173, 174, 214, 134, 174, 175, 215, 176, 216, 217, 257, 92, 132, 133, 173, 134, 174, 175, 215, 135, 175, 176, 216, 177, 217, 218, 258, 136, 176, 177, 217, 178, 218, 219, 259, 179, 219, 220, 260, 221, 261, 262, 302, 93, 133, 134, 174, 135, 175, 176, 216, 136, 176, 177, 217, 178, 218, 219, 259, 137, 177, 178, 218, 179, 219, 220, 260, 180, 220, 221, 261, 222, 262, 263, 303, 138, 178, 179, 219, 180, 220, 221, 261, 181, 221, 222, 262, 223, 263, 264, 304, 182, 222, 223, 263, 224, 264, 265, 305, 225, 265, 266, 306, 267, 307, 308, 348},
        {0, 48, 49, 97, 50, 98, 99, 147, 51, 99, 100, 148, 101, 149, 150, 198, 52, 100, 101, 149, 102, 150, 151, 199, 103, 151, 152, 200, 153, 201, 202, 250, 53, 101, 102, 150, 103, 151, 152, 200, 104, 152, 153, 201, 154, 202, 203, 251, 105, 153, 154, 202, 155, 203, 204, 252, 156, 204, 205, 253, 206, 254, 255, 303, 54, 102, 103, 151, 104, 152, 153, 201, 105, 153, 154, 202, 155, 203, 204, 252, 106, 154, 155, 203, 156, 204, 205, 253, 157, 205, 206, 254, 207, 255, 256, 304, 107, 155, 156, 204, 157, 205, 206, 254, 158, 206, 207, 255, 208, 256, 257, 305, 159, 207, 208, 256, 209, 257, 258, 306, 210, 258, 259, 307, 260, 308, 309, 357, 55, 103, 104, 152, 105, 153, 154, 202, 106, 154, 155, 203, 156, 204, 205, 253, 107, 155, 156, 204, 157, 205, 206, 254, 158, 206, 207, 255, 208, 256, 257, 305, 108, 156, 157, 205, 158, 206, 207, 255, 159, 207, 208, 256, 209, 257, 258, 306, 160, 208, 209, 257, 210, 258, 259, 307, 211, 259, 260, 308, 261, 309, 310, 358, 109, 157, 158, 206, 159, 207, 208, 256, 160, 208, 209, 257, 210, 258, 259, 307, 161, 209, 210, 258, 211, 259, 260, 308, 212, 260, 261, 309, 262, 310, 311, 359, 162, 210, 211, 259, 212, 260, 261, 309, 213, 261, 262, 310, 263, 311, 312, 360, 214, 262, 263, 311, 264, 312, 313, 361, 265, 313, 314, 362, 315, 363, 364, 412},
        {0, 56, 57, 113, 58, 114, 115, 171, 59, 115, 116, 172, 117, 173, 174, 230, 60, 116, 117, 173, 118, 174, 175, 231, 119, 175, 176, 232, 177, 233, 234, 290, 61, 117, 118, 174, 119, 175, 176, 232, 120, 176, 177, 233, 178, 234, 235, 291, 121, 177, 178, 234, 179, 235, 236, 292, 180, 236, 237, 293, 238, 294, 295, 351, 62, 118, 119, 175, 120, 176, 177, 233, 121, 177, 178, 234, 179, 235, 236, 292, 122, 178, 179, 235, 180, 236, 237, 293, 181, 237, 238, 294, 239, 295, 296, 352, 123, 179, 180, 236, 181, 237, 238, 294, 182, 238, 239, 295, 240, 296, 297, 353, 183, 239, 240, 296, 241, 297, 298, 354, 242, 298, 299, 355, 300, 356, 357, 413, 63, 119, 120, 176, 121, 177, 178, 234, 122, 178, 179, 235, 180, 236, 237, 293, 123, 179, 180, 236, 181, 237, 238, 294, 182, 238, 239, 295, 240, 296, 297, 353, 124, 180, 181, 237, 182, 238, 239, 295, 183, 239, 240, 296, 241, 297, 298, 354, 184, 240, 241, 297, 242, 298, 299, 355, 243, 299, 300, 356, 301, 357, 358, 414, 125, 181, 182, 238, 183, 239, 240, 296, 184, 240, 241, 297, 242, 298, 299, 355, 185, 241, 242, 298, 243, 299, 300, 356, 244, 300, 301, 357, 302, 358, 359, 415, 186, 242, 243, 299, 244, 300, 301, 357, 245, 301, 302, 358, 303, 359, 360, 416, 246, 302, 303, 359, 304, 360, 361, 417, 305, 361, 362, 418, 363, 419, 420, 476}
    };
    for (int bite = 0; bite < 8; bite++)
        result += lookups[bite][ (int)(word >> (bite * 8)) & 0xff ];
    return result;
}

You might clean that up a bit, move the initialization out of the function, and so on, but this crucially removes the branch from your loop, and reduces your best case of 128 instructions (for all zeros) to 56 instructions (rough numbers). The worst case is a bit more pronounced at 192 instructions to 56. Further, a smart compiler might unroll the loop entirely, reducing to 40 instructions.

hunteke
  • 3,648
  • 1
  • 7
  • 17
  • Note that `popcnt` is available as `Long#bitCount`. – maaartinus Feb 25 '18 at 06:03
  • 1
    Make it a one-dimensional array to improve cpu cache performance. – Mark Jeronimus Feb 27 '18 at 09:14
  • @MarkJeronimus you'll have to educate me on how a _static_ one-dimensional array would be laid out in memory differently than an equivalent _static_ two-dimensional one. I would naively think they'd occupy the exact same space have the exact same cache-line performance? Note that I'm specifically _not_ talking about dynamically allocated arrays. – hunteke Feb 27 '18 at 13:22
  • 2
    A two dimensional array is internally coded as an array of arrays. Each of the 9 arrays can have a different location in the heap. And heck, I just noticed you instantiate 9 arrays for *every* call of the method. Better make it a class constant. – Mark Jeronimus Feb 28 '18 at 12:20
  • @MarkJeronimus :-) Yes, every instantiation. (But coupled with the text "_You might clean that up a bit, move the initialization out of the function..._".) 9 arrays ... as opposed to compiler sugar, eh? Things have clearly changed since I was bit banging these details years ago, or perhaps Java does this low-level detail in way I don't expect. Thanks for the info. – hunteke Feb 28 '18 at 13:21
  • Without targeting a specific JRE (e.g. Oracle 1.8.0_152), operating system and CPU model *and* knowing for sure how the JRE compiles the byte code to assembler, it is pretty pointless trying to do hardware optimizations in Java source code. On my system, using a one-dimensional lookup table is *slower* than a two-dimensional table, probably because dereferencing an extra pointer in a 2D table is cheaper than the extra calcualation needed to find the index in a 1D table. Any sane CPU will make sure that both varieties are cached, so that is not an argument against using a 2D table. – jarnbjo Mar 02 '18 at 11:02
2

I think is also important to have less hash collisions together with the hashing performance. Faster hashing calculation can make your program generally slower, because of big amount of hash misses.

It mind be a better idea to use some generic hash function like MurMur3A from Google Guava, instead of inventing your own.

There are many Benchmark about hashing, for example:

I think you can do some micro-benchmarking using a Google Caliper and check which hash function is better for you case.

BTW. Ask your self why do you need a custom BitSet ?

Victor Gubin
  • 2,782
  • 10
  • 24
0

This is what I did:

int weightedBitCount(long word) {
       return (Long.bitCount(word & 0xFFFF_FFFF_0000_0000L) << 5)
            + (Long.bitCount(word & 0xFFFF_0000_FFFF_0000L) << 4)
            + (Long.bitCount(word & 0xFF00_FF00_FF00_FF00L) << 3)
            + (Long.bitCount(word & 0xF0F0_F0F0_F0F0_F0F0L) << 2)
            + (Long.bitCount(word & 0xCCCC_CCCC_CCCC_CCCCL) << 1)
            + (Long.bitCount(word & 0xAAAA_AAAA_AAAA_AAAAL) << 0);
}

It's pretty simple: With a single bit set, e.g., the bit 10, word looks like 0x0000_0000_0000_0400L and only the masks 0xFF00_FF00_FF00_FF00L and 0xCCCC_CCCC_CCCC_CCCCL produce a bit count of 1, so we get

(0 << 5) + (0 << 4) + (1 << 3) + (0 << 2) + (1 << 1) + (0 << 5) = 10

It needs some 6*4 instructions (maybe 6 cycles on modern Intel) per 64 bits, so it's not really slow, but it's still too slow when compared with the bulk bitset operations which need a single instruction (per 64 bits).

So I'm playing with some batch computation over multiple words.

maaartinus
  • 44,714
  • 32
  • 161
  • 320