2

Trying to make a hash table using smdb algorithm (since I hear it is wise to not attempt to write my own.) I'm sure I am doing it completely wrong. Did I mention I am new to C?

My hashFunction() % size is first returning a number like 35 the first time it is called, then on the 2nd call, 3rd call, 4th call..., it returns 65 ad infinitum. I am just using those numbers as arbitrary examples. After trying to figure it out with the debugger, I noticed the hashFunction is returning different longs, but they all end with the same last 2 numbers...like so...

4460735 4526335 4591935

So I guess this is why when I hash % size, I end up with the same output each time. Which goes against the idea of evenly distributed keys, right?

Go easy on me please. I know how savage people on SO can be.

#include <stdio.h>
#include <stdlib.h>

typedef struct node
{
    char* str;
    struct node* next;
} 
node;

void insertItem(char* number, node** list);
unsigned long hashFunction(char* str);

int main(void)
{

    int size = 100;
    int index = 0;

    node* buckets[size];

    for (int i = 0; i < size; i++)
    {
        char c = i + 'A';
        index = hashFunction(&c) % size;
        insertItem(&c, &buckets[index]);
    }

}

void insertItem(char* str, node** list)
{
    node* newItem = malloc(sizeof(node));
    newItem->str = str;
    newItem->next = *list;
    *list = newItem;
}

unsigned long hashFunction(char* str)
{
    //sdbm hash function adapted (incorrectly?) from here: http://www.cse.yorku.ca/~oz/hash.html
    unsigned long hash = 0;
    int c;

    while ((c = *str++))
        hash = c + (hash << 6) + (hash << 16) - hash;

    return hash;
}
JohnWick
  • 4,929
  • 9
  • 37
  • 74

2 Answers2

2

The problem is that you make your test on characters not on strings.

If you feed your algorithm with real strings, then you get something more significant. For example, change your code with the following:

char mystring[] = "Any string will do !";

for (int i = 0; i < size; i++)
{
    mystring[0] = i; // simple hack to change the string a bit, well ... a byte ;)
    index = hashFunction(mystring) % size;
    insertItem(mystring, &buckets[index]);
}

If you print index you will get a more proper index.

Edit:

The real problem is that your hash function is designed to get a C string as a parameter (a char* pointing to a buffer which must be null-terminated, ie ending with '\0'). As you give the address of a single char, the first dereference is ok, but the use of the next address (after ++) pointing to something that is not a real allocated object, is undefined behavior.

Credits: See moooeeeep answer, and comments.

Community
  • 1
  • 1
neuro
  • 14,948
  • 3
  • 36
  • 59
  • Hi, thanks for your response. It was actually only ending up in 2 buckets, and 1 of those buckets only had 1 list item, the rest were all in the 2nd bucket. I'll try again with longer strings and see how it goes, like you suggested. – JohnWick Apr 11 '17 at 17:37
  • when I print indexes with your code, I get one 65, then many 35 and then not so many 7. The problem is the type of your data which is char not char* – neuro Apr 11 '17 at 17:40
  • When I replaced char c = i + 'A' with what you suggested, (and remove the &'s on my function arguments that use c), I get this error: "adding 'int' to a string does not append to the string [-Werror,-Wstring-plus-int]" – JohnWick Apr 11 '17 at 17:41
  • 1
    I used -std=gnu99 to have simple string concatenation ... Use sprintf – neuro Apr 11 '17 at 17:42
  • ok thanks, I'll try sprintf. If it works I'll mark this as the answer. – JohnWick Apr 11 '17 at 17:49
  • @David: I updated the code with a simple test string. – neuro Apr 11 '17 at 17:50
  • Ok well, it seems to work much better with an actual string like you mentioned. Bad test case by me! Appreciate you helping me out here. – JohnWick Apr 11 '17 at 17:57
  • @david: You're welcome. do not hesitate to upvote the answer, I will reach 10k ;) – neuro Apr 11 '17 at 18:01
2

The hash function expects a pointer to a null terminated string as input argument. You pass a pointer to a single character instead. The function then iterates invalid memory until it reaches a random null byte.

char c = i + 'A';
index = hashFunction(&c) % size;

You need to pass a pointer to a string instead. For example:

char arr[] = "Hello World";
index = hashFunction(arr) % size;

Also consider to set your size to a prime number for added randomness. For further reading:

Community
  • 1
  • 1
moooeeeep
  • 31,622
  • 22
  • 98
  • 187
  • Very helpful :) I like the way you explained it and provided some additional help beyond the scope of my question, I'd mark it as the answer but neuro pointed out my oversight first :P – JohnWick Apr 11 '17 at 18:07
  • @moooeeeep: You're right about the type but I'm not sure about the iteration through random memory. My C is sort of rusted ... – neuro Apr 11 '17 at 18:21
  • @neuro When the pointer is incremented and dereferenced in the loop condition [the behavior is undefined](https://en.wikipedia.org/wiki/Undefined_behavior). Dereferencing a pointer that doesn't point to an object is not allowed, whatever value the memory there currently has. It was just a coincidence that no segmentation fault was triggered. – moooeeeep Apr 12 '17 at 07:22
  • @moep: yes, indeed. I've overlooked that part. Thanks for the comment. I will edit my answer to reflect that. – neuro Apr 12 '17 at 07:29