-3

I'm learning C right now and I had originally built this hash function for a spell checker program I'm building in a CS50 edx course.

int hashing(char *word)      
{

    unsigned int hash = 0;
    for (int i = 0, n = strlen(word); i < n; i++)
        hash += word[i];
    return hash % HTABLE_SIZE;
}

Then I stumbled upon this hash function on reddit that uses bit-shift operators.

int hashing(char *word)
{
    unsigned int hash = 0;
    for (int i = 0, n = strlen(word); i < n; i++)
        hash = (hash << 2) ^ word[i];
    return hash % HTABLE_SIZE;
} 

With this hash function the speed of my program went from 0.13 seconds to 0.06 seconds. Can someone please explain to me why this hash function is so much faster?

Jonathan Leffler
  • 730,956
  • 141
  • 904
  • 1,278
  • How do you compile your program? Fully optimized? –  Nov 29 '17 at 07:07
  • https://stackoverflow.com/a/6357747/6935629 – msc Nov 29 '17 at 07:08
  • 1
    Which compiler are you using? Also, use the online [Compiler Explorer](https://godbolt.org) to see what assembly code is generated from your source. – Greg Hewgill Nov 29 '17 at 07:08
  • 2
    Considering that this is a hash function, it could easily be because of the resulting hash distribution. Neither of these are very good hashes, but the weaknesses of the first seem more likely to come up in a spell checker - for example, reordering the letters doesn't affect the hash in the first one. – user2357112 Nov 29 '17 at 07:11
  • 2
    Of course, trying to extrapolate from these tiny code snippets to the behavior of an entire unknown program compiled with unknown settings on an unknown compiler and architecture and run on unknown input is futile. This question isn't answerable. – user2357112 Nov 29 '17 at 07:12
  • There are many, many different hash functions you can use tailored to what you are hashing. Many (if not most) will use some type of shift or negation to create the hash. (just search for different hash functions). If you are using a recent compiler with optimizations enabled in your compile string, then your compiler should be handling the optimizations to the point there should be little difference between whether you multiply by a power of two of left shift (the compiler will make those type of optimizations). Concentrate on reducing collisions, let the compiler optimize the math. – David C. Rankin Nov 29 '17 at 07:16
  • You need at least to provide a [mcve] – Jabberwocky Nov 29 '17 at 08:13
  • 2
    They don't. Both your functions are horrific if you are concerned with efficiency. Replace the `strlen` function with a simple `a[i]` to know if it is a zero byte. So your assumption from the start, that one is more efficient than the other, is just not valid. As @DavidC.Rankin said, first learn about hash functions and there different advantages and drawbacks. Then, if (and only if) there is a performance problem that is measurable *and* relevant, try to improve the performance. – Jens Gustedt Nov 29 '17 at 09:05
  • 1
    I'm with @user2357112. Most likely, the hash function is not significantly faster (if it is faster at all) but __the resulting hash value is different__, so the hash table might be used better (however, you didn't include any information of what you do with the computed hash) – grek40 Nov 29 '17 at 09:29

1 Answers1

1

I don't think shift + xor is faster than addition.

However, the resulting hash table is probably much faster, because the hash values are much better distributed.

Thomas Mueller
  • 48,905
  • 14
  • 116
  • 132