2

I'm writing some C++, trying to implement a hash table using arrays.

The hashing function is taken from this Stack Overflow post.

Since the hash table is dynamically resized in runtime, the following simple strategy is implemented:

  • Insert an element by % table's total size.
  • When # of elements inserted (= current size) >= table's total size,
  • create a new table with double the size, and re-insert all the elements.

The table's buckets each hold a value, and a boolean indicating whether said element should be inside the table.

Note that the table is being made only for non-negative integer values.

I can't get past the table construction: Memory usage gradually increases, as well as CPU's.

This should mean that there's a problem related with the condition checking on when to grow the table's size.

Table's constructor and destructor:

// I'm defining these here because they depend on the integer's bit size.
#define INVERSE 0x119de1f3
#define HASHER  0x45d9f3b


HashTable::HashTable(size_t initial_size, uint32_t *initial_data)
{
    total_size = 4U * initial_size;
    current_size = initial_size;

    data = new table[total_size];

    build(initial_data);
}

HashTable::~HashTable()
{
    delete[] data;

    total_size = 0U;
    current_size = 0U;
}

To build, we hash and put (insert) each initial element into the table:

void HashTable::build(uint32_t *initial_data)
{
    for (size_t i = 0U; i < current_size; i += 1U)
    {
        insert(initial_data[i]);
    }
}

To insert, we also use a hash function:

void HashTable::insert(uint32_t num)
{
    uint32_t index = hash(num);

    try
    {
        data[index].num = num;
        data[index].exists = true;

        current_size += 1U;
    }
    catch (out_of_range& error)
    {
        ;   // TODO stuff after writing main.
    }
}
uint32_t HashTable::hash(uint32_t num)
{
    if (current_size >= total_size / 2U) {
        doubleTable();

        for (size_t i = 0U; i < current_size; i += 1U)
        {
            insert(data[i].num);
        }
    }

    num = ((num >> 16) ^ num) * HASHER;
    num = ((num >> 16) ^ num) * HASHER;
    num =  (num >> 16) ^ num;

    return (num % total_size);
}

Lastly, here is a dummy main function:

int main(void)
{
    uint32_t initial_data[3] = {1, 3, 5};

    HashTable h(3U, initial_data);
    return 0;
}

This is how the table is structured:

struct table
{
    uint32_t num;
    bool exists = false;
};

And the relevant definitions:

class HashTable
{
    private:
        table *data;

        size_t total_size;
        size_t current_size;

        uint32_t hash(uint32_t num);
        uint32_t unhash(uint32_t num);

        void build(uint32_t *initial_data);

        void doubleTable(void);

    public:
        HashTable(size_t initial_size, uint32_t *initial_data);
        ~HashTable();

        void insert(uint32_t num);

After compiling with -ggdb3 and running valgrind, I get a pestering uninitialized value of size 8 error.

Here are some valgrind logs:

--1632-- Valgrind options:
--1632--    --leak-check=full
--1632--    --show-leak-kinds=all
--1632--    --track-origins=yes
--1632--    --verbose
--1632--    --log-file=valgrind-out.txt
==1632== Use of uninitialised value of size 8
==1632==    at 0x1093E8: HashTable::insert(unsigned int) (in /home/xlxs4/Git/CCDT/src/structures/hashtable/a.out)
==1632==    by 0x1095DC: HashTable::hash(unsigned int) (in /home/xlxs4/Git/CCDT/src/structures/hashtable/a.out)
==1632==    by 0x1093BD: HashTable::insert(unsigned int) (in /home/xlxs4/Git/CCDT/src/structures/hashtable/a.out)
==1632==    by 0x1096B0: HashTable::build(unsigned int*) (in /home/xlxs4/Git/CCDT/src/structures/hashtable/a.out)
==1632==    by 0x10933C: HashTable::HashTable(unsigned long, unsigned int*) (in /home/xlxs4/Git/CCDT/src/structures/hashtable/a.out)
==1632==    by 0x10920A: main (main.cpp:15)
==1632==  Uninitialised value was created by a heap allocation
==1632==    at 0x483950F: operator new[](unsigned long) (vg_replace_malloc.c:423)
==1632==    by 0x1092F7: HashTable::HashTable(unsigned long, unsigned int*) (in /home/xlxs4/Git/CCDT/src/structures/hashtable/a.out)
==1632==    by 0x10920A: main (main.cpp:15)

I've already gone through this post on eliminating undefined values and this SO post about uninitialized value of size 8.

What am I missing?

Let me know if you want any further classifications, or want me to further explain my thought process.

Thanks in advance :)

yacc
  • 2,915
  • 4
  • 19
  • 33
xlxs4
  • 53
  • 2
  • 6
  • Does `build` allocate? – Fureeish Jun 04 '19 at 15:12
  • 6
    Beware that your type doesn't seem to respect the [rule of 3/5/0](https://en.cppreference.com/w/cpp/language/rule_of_three) and is likely to have undefined behavior when you try to use it in a more complicated context. – François Andrieux Jun 04 '19 at 15:13
  • 1
    You need to familiarize yourself with the [rule of three](https://stackoverflow.com/questions/4172722/what-is-the-rule-of-three?noredirect=1&lq=1)/[five/zero](https://stackoverflow.com/questions/44997955/rule-of-zero-confusion) – Max Langhof Jun 04 '19 at 15:14
  • Sorry, I left that out. Some times, it does, indirectly, thorugh the calling of `hash`. I'll add the code in the post. It essentially goes through each element in the array given as input and hashes it, `insert`ing it in the table. – xlxs4 Jun 04 '19 at 15:15
  • 2
    `table::num` is left uninitialized, so when you make the table bigger, the copy of these values is from an uninitialized value. –  Jun 04 '19 at 15:16
  • @FrançoisAndrieux, I was unaware of this concept! Thanks for letting me know. – xlxs4 Jun 04 '19 at 15:23
  • The lack of rule of 0/3/5 in OP's example code is unrelated to their question, if this is indeed a dupe, it's of a different answer. –  Jun 04 '19 at 19:03
  • This shouldn't be duplicated as @Frank said above. Nonetheless, I figured it out. I re-wrote the code according to the rule of three principles and I utilized initialization lists. What caused the problem I described in the post, is that `i < current_size` in `build` will never terminate, as `current_size` gets incremented each time `insert` is being called. I don't have sufficient reputation to answer my own question, as it seems. Thanks to all who helped! – xlxs4 Jun 04 '19 at 22:52

1 Answers1

1

xlxs4:

This shouldn't be duplicated as @Frank said above. Nonetheless, I figured it out. I re-wrote the code according to the rule of three principles and I utilized initialization lists. What caused the problem I described in the post, is that i < current_size in build will never terminate, as current_size gets incremented each time insert is being called. I don't have sufficient reputation to answer my own question, as it seems. Thanks to all who helped!

yacc
  • 2,915
  • 4
  • 19
  • 33