I have to use an x86 32-bit MurmurHash to determine the partition to which I send messages in Kafka. Another application uses the NodeJS murmurhash.v3() method to read the messages from the expected partition.

I tried two methods:

  1. First, I took the Java class from https://svn.apache.org/repos/asf/mahout/trunk/math/src/main/java/org/apache/mahout/math/MurmurHash3.java
  2. I also translated the JS code of NodeJS murmurhash.v3() into Java (the "N to A" column in the table below)

Here is the code I use to get values from the Apache Java method:

int ret = MurmurHash3.MurmurHashV3(key, new Long(KAFKA_PARTITION_SEED).intValue());

Note: for now, KAFKA_PARTITION_SEED = 100, but that is just a test value. It will be a Long value in the future.

Here is my translation of the NodeJS code into Java:

static int MurmurHashV3(String key, int seed) {
    int remainder;
    int bytes;
    int h1;
    int h1b;
    int c1;
    int c2;
    int k1;
    int i;

    remainder = key.length() & 3; // key.length % 4
    bytes = key.length() - remainder;
    h1 = seed;
    c1 = 0xcc9e2d51;
    c2 = 0x1b873593;
    i = 0;

    while (i < bytes) {
        k1 = ((key.charAt(i) & 0xff)) | ((key.charAt(++i) & 0xff) << 8)
                | ((key.charAt(++i) & 0xff) << 16)
                | ((key.charAt(++i) & 0xff) << 24);
        ++i;

        k1 = ((((k1 & 0xffff) * c1) + ((((k1 >>> 16) * c1) & 0xffff) << 16))) & 0xffffffff;
        k1 = (k1 << 15) | (k1 >>> 17);
        k1 = ((((k1 & 0xffff) * c2) + ((((k1 >>> 16) * c2) & 0xffff) << 16))) & 0xffffffff;

        h1 ^= k1;
        h1 = (h1 << 13) | (h1 >>> 19);
        h1b = ((((h1 & 0xffff) * 5) + ((((h1 >>> 16) * 5) & 0xffff) << 16))) & 0xffffffff;
        h1 = (((h1b & 0xffff) + 0x6b64) + ((((h1b >>> 16) + 0xe654) & 0xffff) << 16));
    }

    k1 = 0;

    switch (remainder) {
    case 3:
        k1 ^= (key.charAt(i + 2) & 0xff) << 16;
    case 2:
        k1 ^= (key.charAt(i + 1) & 0xff) << 8;
    case 1:
        k1 ^= (key.charAt(i) & 0xff);

        k1 = (((k1 & 0xffff) * c1) + ((((k1 >>> 16) * c1) & 0xffff) << 16)) & 0xffffffff;
        k1 = (k1 << 15) | (k1 >>> 17);
        k1 = (((k1 & 0xffff) * c2) + ((((k1 >>> 16) * c2) & 0xffff) << 16)) & 0xffffffff;
        h1 ^= k1;
    }

    h1 ^= key.length();

    h1 ^= h1 >>> 16;
    h1 = (((h1 & 0xffff) * 0x85ebca6b) + ((((h1 >>> 16) * 0x85ebca6b) & 0xffff) << 16)) & 0xffffffff;
    h1 ^= h1 >>> 13;
    h1 = ((((h1 & 0xffff) * 0xc2b2ae35) + ((((h1 >>> 16) * 0xc2b2ae35) & 0xffff) << 16))) & 0xffffffff;
    h1 ^= h1 >>> 16;

    return h1 >>> 0;
}

In both cases I get the same results when trying to get the partition value. The partition value (P in the table below) is the modulo 8 (%8) of the murmurhash method returned value.

Here is an example of the results I get:

KEY          | NodeJS     | P | Apache      | P  | N to A      | P  | SAME
0009B5192951 | 1285784451 | 3 |  1285784451 |  3 |  1285784451 |  3 | TRUE
0009B5192953 | 2252321193 | 1 | -2042646103 | -7 | -2042646103 | -7 | FALSE
0009B5192979 |  973658619 | 3 |   973658619 |  3 |   973658619 |  3 | TRUE
0009B5192985 | 1359432313 | 1 |  1359432313 |  1 |  1359432313 |  1 | TRUE
0009B5192987 | 3551230334 | 6 |  -743736962 | -2 |  -743736962 | -2 | FALSE
0009B5192995 |  199863683 | 3 |   199863683 |  3 |   199863683 |  3 | TRUE
0009B5193001 | 1660947343 | 7 |  1660947343 |  7 |  1660947343 |  7 | TRUE
0009B5193007 | 1980598253 | 5 |  1980598253 |  5 |  1980598253 |  5 | TRUE
0009B5203789 | 1358113422 | 6 |  1358113422 |  6 |  1358113422 |  6 | TRUE
0009B5203791 | 1339226023 | 7 |  1339226023 |  7 |  1339226023 |  7 | TRUE

As you can see, in some cases both Java versions (the Apache class and my translation) return a negative value where NodeJS returns a positive one, which I guess is not expected.

Can anyone tell me what I am doing wrong?

  • I forgot to specify how I use the original Java class: `int ret = MurmurHash3.murmurhash3x8632(key.getBytes(), 0, key.length(), new Long(KAFKA_PARTITION_SEED).intValue());` Do you think the error could come from the getBytes() method? – Cédric VIDREQUIN Apr 14 '14 at 11:24

3 Answers

0

I was facing the same problem with MurmurHash2 for a while, but it turns out that the Apache implementations are bugged because of the way Java handles signedness. I would recommend using this instead.
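To illustrate the signedness point: Java's `int` is always signed, while the JavaScript code ends with `>>> 0`, which yields an unsigned 32-bit number. The bits are identical; only the interpretation differs. Here is a minimal sketch of reinterpreting the Java result as unsigned before taking the modulo. The class name `UnsignedHash` and its method names are made up for this example and are not part of the Apache class.

```java
final class UnsignedHash {
    private UnsignedHash() {}

    /** Reinterprets a signed 32-bit hash as the unsigned value JavaScript reports. */
    public static long toUnsigned(int hash) {
        return hash & 0xffffffffL; // long mask; equivalent to Integer.toUnsignedLong(hash)
    }

    /** Partition index computed on the unsigned value, so it is never negative. */
    public static int partition(int hash, int partitionCount) {
        return Integer.remainderUnsigned(hash, partitionCount);
    }

    public static void main(String[] args) {
        int signed = -2042646103; // Java result for key 0009B5192953 in the question's table
        System.out.println(toUnsigned(signed));   // prints 2252321193
        System.out.println(partition(signed, 8)); // prints 1
    }
}
```

Note that in Java `& 0xffffffff` on an `int` and `>>> 0` are both no-ops, and `%` keeps the sign of the dividend, which is where the -7 and -2 in the question's table come from.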

0

Kafka Streams uses MurmurHash3. Its implementation is in the GitHub repo, and you might want to use it:

https://github.com/apache/kafka/blob/99b9b3e84f4e98c3f07714e1de6a139a004cbc5b/streams/src/main/java/org/apache/kafka/streams/state/internals/Murmur3.java

Ran Lupovich
0

MurmurHash

I wrote a simple utility that produces only positive 32-bit MurmurHash3 values.

It was tested on a limited data set, and the results match those of lastguest\Murmur.

Maybe it fits your requirement or you can hack it as you wish.
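For readers who just want the idea, here is a rough sketch of what such a utility could look like in Java. This is not the code from the linked project: the class name `PositiveMurmur3` is invented for this example, and hashing the UTF-8 bytes of the key is an assumption. That matches the JavaScript `charCodeAt(i) & 0xff` approach only for ASCII keys such as the ones in the question.

```java
import java.nio.charset.StandardCharsets;

final class PositiveMurmur3 {
    private PositiveMurmur3() {}

    /** MurmurHash3 x86_32 over the UTF-8 bytes of key, returned as a value in [0, 2^32). */
    public static long hash(String key, int seed) {
        byte[] data = key.getBytes(StandardCharsets.UTF_8);
        final int c1 = 0xcc9e2d51;
        final int c2 = 0x1b873593;
        int h1 = seed;
        int blocks = data.length & ~3; // length rounded down to a multiple of 4

        // Body: mix each little-endian 4-byte block into the state.
        for (int i = 0; i < blocks; i += 4) {
            int k = (data[i] & 0xff)
                    | (data[i + 1] & 0xff) << 8
                    | (data[i + 2] & 0xff) << 16
                    | (data[i + 3] & 0xff) << 24;
            k *= c1; // int multiplication wraps modulo 2^32, as the algorithm requires
            k = Integer.rotateLeft(k, 15);
            k *= c2;
            h1 ^= k;
            h1 = Integer.rotateLeft(h1, 13);
            h1 = h1 * 5 + 0xe6546b64;
        }

        // Tail: the remaining 1 to 3 bytes, if any.
        int k1 = 0;
        switch (data.length & 3) {
            case 3: k1 ^= (data[blocks + 2] & 0xff) << 16; // fall through
            case 2: k1 ^= (data[blocks + 1] & 0xff) << 8;  // fall through
            case 1:
                k1 ^= (data[blocks] & 0xff);
                k1 *= c1;
                k1 = Integer.rotateLeft(k1, 15);
                k1 *= c2;
                h1 ^= k1;
                break;
            default:
                break;
        }

        // Finalization mix.
        h1 ^= data.length;
        h1 ^= h1 >>> 16;
        h1 *= 0x85ebca6b;
        h1 ^= h1 >>> 13;
        h1 *= 0xc2b2ae35;
        h1 ^= h1 >>> 16;

        return h1 & 0xffffffffL; // long mask turns the signed int into a non-negative long
    }
}
```

Because the return type is `long`, something like `PositiveMurmur3.hash(key, seed) % 8` cannot go negative.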

Ninja