3

I'm building a custom hash where I sum all the letters in string according to formula:

string[0] * 65536 + string[1] * 32768 + string[2] * 16384 + ...

And I've came to a problem whether I should have these numbers defined as constants in int array like this:

const int MULTIPLICATION[] = {
    65536,
    32768,
    16384,
    8192,
    4096,
    2048,
    1024,
    512,
    256,
    128,
    64,
    32,
    16,
    8,
    4,
    2,
    1
}

Or, maybe I should just generate these numbers while counting hash itself (while probably losing some speed due to them not being generated already)? I'll need to count this hash millions of times and the main thing I want compiler to understand is that instead of normal MUL operation

MOV EBX, 8
MUL EBX

it would do

SHL EAX, 3

Does compiler understand that if I'm multiplying by power of 2 to shift bits instead of usual multiplication?

Another question, I'm pretty sure it does shift bits when you write in c++ number *= 2; But just to clarify, does it?


Thanks, I've found out how to view dissasembly in debugger. Yes, compiler does understand to shift bits if you use it like

number *= 65536

However, it does normal multiplication if you do

number1 = 65536
number *= number1;
Vanilla Face
  • 908
  • 1
  • 7
  • 17
  • 1
    You can find answers to questions such as these by familiarizing yourself with a debugger. Use it to break on code you wonder about, and then inspect the disassembly at that point. :) – Magnus Hoff Dec 18 '12 at 13:54
  • why don't you ask your compiler?! – Kerrek SB Dec 18 '12 at 13:54
  • 1
    There is no fixed answer to "will the compiler replace multiplication with shift"; this is dependent on your compiler, architecture, and possibly compiler flags you use. – mah Dec 18 '12 at 13:54
  • 4
    Note also, there's a lot of text in your question that has absolutely nothing to do with your question. You're mentioning things like hashes (and spending a fair amount of time doing it) which do not enable anyone to help you more efficiently... if you're not already aware, please understand that doing this makes it less likely you can receive a good answer. – mah Dec 18 '12 at 13:56
  • 2
    Multiplication by two might shift bits or not, that is compiler-dependent. But you can use the shift operator yourself to generate the multiples of two, like this: `int i=1; i<<1;` Also look into the other bit operations of C++, probably they can help you as well. – Zane Dec 18 '12 at 13:57
  • Work backwards through the string adding and multiplying by 2. – Peter Wood Dec 18 '12 at 14:07

6 Answers6

5

Try it!

What compiler are you using? You can tell most compilers to leave the intermediate files in place after compilation, or to just compile (and not assemble), so you can actually look at the assembly code it generated.

You can see on this other question of mine that this is just what I did.

For example, in gcc the -S flag means "compile only". And -masm=intel generates more readable assembly, IMO.


Edit

That all said, I think the following is the algorithm you're looking for (untested):

// Rotate right by n bits
#define ROR(a, n)   ((a >> n) | (a << (sizeof(a)*8-n)))


int custom_hash(const char* str, int len) {
    int hash = 0;
    int mult = 0x10000;  // 65536, but more obvious

    for (int i=0; i<len; i++) {
        hash += str[i] * mult;
        mult = ROR(mult, 1);    
    }

    return mult;
}

First of all, you didn't specify what happens when you have more than 16 characters (what is the multiplier?) So in this implementation, I used a bitwise rotate. x86 has bitwise rotate instructions (ror and rol for rotate right and left, respectively). However, C provides no way of expressing a rotate operation. So I define the ROR macro which does the rotate for you. (Understanding how it works is left as an exercise for the reader!)

In my loop, I start the multiplier at 0x10000 (65536) like you did. Each iteration of the loop, I rotate the multiplier right by one bit. This essentially divides it by two, until you get to 1, after which it becomes 0x80000000.

Community
  • 1
  • 1
Jonathon Reinhart
  • 132,704
  • 33
  • 254
  • 328
  • 2
    @VanillaFace On VC++, the option is `/Fa`, followed by an optional filename (without a space). In Visual Studios, in the properties of the source file, under C/C++→Output Files, there is an entry for Assembler Output and for ASM List Location. – James Kanze Dec 18 '12 at 14:07
3

The answer depends on your compiler, hardware architecture, and possibly other things.

It is not even obvious a priori that replacing such multiplication with a shift is the optimal thing to do. I think that generally one should let instruction-level optimizations to the compiler.

That said, let's see what my compiler does :)

int i, j;

int main() {
  j = i * 8;
}

This, compiled using gcc 4.7.2 with -O3, results in

_main:
LFB0:
        movq    _i@GOTPCREL(%rip), %rax
        movl    (%rax), %edx
        movq    _j@GOTPCREL(%rip), %rax
        sall    $3, %edx                  ;<<<<<<<<<< THE SHIFT INSTRUCTION
        movl    %edx, (%rax)
        ret

So, in my environment, the answer is clearly "yes".

As to your other question, don't precompute MULTIPLICATION. To get the coefficients in

string[0] * 65536 + string[1] * 32768 + string[2] * 16384 + ...

just start with coeff = 65536 and shift it one bit to the right every iteration:

coeff >>= 1;
NPE
  • 486,780
  • 108
  • 951
  • 1,012
  • Assuming you use the same compiler, there are a lot of C++ compilers, *yours* happens to do this, another may not. – Hunter McMillen Dec 18 '12 at 13:56
  • @HunterMcMillen: Of course. Please see the first sentence of my (updated) answer. – NPE Dec 18 '12 at 13:57
  • If you'd made i `char`, that would have revealed the possibly surprising `movsbl` (I mean the results might be surprising; it's not surprising that the compiler does that.) – rici Dec 18 '12 at 14:01
  • I would suggest replacing _So, here, the answer is clearly "yes"_ to instead read _In this case the answer is "yes" however there are multiple factors that can affect your results._ – mah Dec 18 '12 at 14:02
  • +1 Good answer - I didn't have time when I first answered to go into as much depth. I just added the code to do it in the loop, and then realized you'd done something similar :-) – Jonathon Reinhart Dec 19 '12 at 01:11
2

Why don't you just use the shift operator built into C++?

(string[0] << 16) + (string[1] << 15) + (string[2] << 14) + ...
Andreas Brinck
  • 51,293
  • 14
  • 84
  • 114
2

You can use template metaprogramming, which ensures that the power of 2 is calculated at compile time irrespective of the compiler:

template<unsigned int SHIFT>
struct PowerOf2
{
  static const size_t value = 1 << SHIFT;
};

For ease use macro as below:

#define CONSTRUCT(I) (string[I] * PowerOf2<16 - I>::value)

Now using,

CONSTRUCT(0)

is equivalent to:

string[0] * 65536
iammilind
  • 68,093
  • 33
  • 169
  • 336
1

You can accumulate it by continually multiplying by 2.

int doubleRunningTotalAndAdd(int runningTotal, unsigned char c)
{
    runningTotal *= 2;
    runningTotal += c;
    return runningTotal;
}

string s = "hello";

int total = accumulate(s.rbegin(), s.rend(), 0, doubleRunningTotalAndAdd);
Peter Wood
  • 23,859
  • 5
  • 60
  • 99
0

There's no rule; the compiler will generate code which will give the correct results. All of the compilers I know will use a combination of shifts and adds and subtracts when that is the fastest solution. I've worked on systems where integer multiplication was faster than a shift; I've also worked on a system where the compiler generated better code for h * 127 than for (h << 7) - h, this despite the fact that the machine didn't have hardware multiply.

If you want the numbers as the initializer of a const array, of course, the obvious answer is to generate them with some other program, and insert the generated text.

James Kanze
  • 150,581
  • 18
  • 184
  • 329