2

I've wrote an implementation of Murmur3 hash and have defined the 128-bit keys as hash128_t

typedef struct
{
    uint64_t p1;
    uint64_t p2;
} hash128_t;

I'm trying to write my own hashmap using these keys, but I'm not exactly sure how to do arithmetic with a struct nor a number this large. I need help specifically with modulo operations, but help on how to treat these two blocks of memory as one integer would be greatly appreciated.

My modulo formula is

Dividend - ((Dividend/Divisor) * Divisor)
Brett
  • 159
  • 1
  • 8
  • About different representation of 128 bit number you can see http://stackoverflow.com/questions/1188939/representing-128-bit-numbers-in-c. – Mihai8 Jun 19 '15 at 02:16
  • If you are using gcc __uint128_t is available for 64 bit architecture – Filip Bulovic Jun 19 '15 at 02:26
  • @user1929959: most of that answer is related to C++ more than C, though [GMP](http://gmplib.org/) is primarily C with a C++ interface, as I understand it. – Jonathan Leffler Jun 19 '15 at 02:26
  • On the one hand, there are plenty of prewritten multiprecision libraries you could use. But on the other hand, writing your own is a great exercise. I can't remember all the details from the last time I did it, but in most respects it can be just like the arithmetic you learned in school. – Steve Summit Jun 19 '15 at 02:38
  • @FilipBulovic This also needs to run on 32-bit platforms. This is mainly for a learning experience. There are plenty of C libraries that have hashmaps, but I wanted to learn to do it. – Brett Jun 19 '15 at 02:40
  • @SteveSummit Does that mean I should go through the number digit-by-digit? That sounds extremely inefficient, especially as modulo consists of division, multiplication and subtraction. – Brett Jun 19 '15 at 02:50
  • 1
    Digit-by-digit, yes, although not necessarily digits in base 10. Another popular choice is 16. (Me, I used base 2147483648.) It's not necessarily unnecessarily inefficient. When you do long division, you get the quotient and remainder at the same time; you don't have to multiply and subtract. See also the links at the right, like http://stackoverflow.com/questions/6952424/algorithm . Or if you care about efficiency, that definitely tilts the balance towards using someone else's, presumably optimized library. – Steve Summit Jun 19 '15 at 03:56

1 Answers1

1

The easiest thing you can do is to simply discard half of the hash and do modulo (simply using %) on the other half.

The next simplest thing to do is to use an existing bignum library.

If you want to use the whole hash, though, you need to do a long division here. There are a number of ways to actually do this; here's one (completely untested) method:

uint64_t big_modulo(hash128_t hash, uint64_t divisor) {
    uint64_t accum = hash.p1 % divisor;
    uint64_t shift_buf = hash.p2;

    for (int i = 0; i < 64; i++) {
      accum = accum << 1 | (shift_buf >> 63);
      shift_buf = shift_buf << 1;
      if (accum & (~1LU << 63)) accum = accum % divisor;
    }

    return accum % divisor;
}

Note that this will break if divisor is larger than 2^63 - 1. Fixing this is left as an exercise to the reader, as are various optimizations that are possible if you can constrain the divisor to a 32-bit space.

bdonlan
  • 224,562
  • 31
  • 268
  • 324
  • Thank you for the amazing start! This is actually what I was looking for, I was completely lost as to which direction to go. – Brett Jun 19 '15 at 03:57