Insertion with implementing 'Linear Hashing'

Question

i have to implement the data structure 'Linear Hashing' for my algorithms course, but i ran into some problems. I have implemented the add-function straight forward. I have to use the following hashfunction:

h(x) = x mod 2^d

My testvalues are:

24200, 16448, 16432, 4044, 24546, 10556, 29009, 25270, 32579, 13047

When i add this values to my data structure, i get the following:

LinearHashing [ TableSize=4 BucketSize=2 d=2 nextToSplit=0 ]
Elemente
values=
Bucket[0]
 [24200]  [16448]
Overflow Bucket:
 [16432]  [4044]
 [24546]  [10556]
 [25270]  [0]
Bucket[1]
 [29009]  [0]
Bucket[2]
 [0]  [0]
Bucket[3]
 [32579]  [13047]

But the value "25270" ends up in the wrong bucket, I have rechecked it a thousand times and i cant find my mistake. The bug is fixed when i set the Overflowbucketsize to "N/2" but this might be just a coincidence. The example should look like this(if i got the algorithm right):

LinearHashing [ TableSize=4 BucketSize=2 d=2 nextToSplit=0 ]
Elemente
values=
Bucket[0]
 [24200]  [16448]
Overflow Bucket:
 [16432]  [4044]
 [24546]  [10556]
 [0]  [0]
Bucket[1]
 [29009]  [0]
Bucket[2]
 [25270]  [0]
Bucket[3]
 [32579]  [13047]

I have to test my data structure with this testprogramm:

http://pastebin.com/k1p4Z6e3

and my data structure has to based on this *.h file which was provided by my professor:

http://pastebin.com/79jLERuF

and this is my complete code:

http://pastebin.com/mW09UEzx

The most important functions are the following. My add-function:

template <typename E, size_t N>
void LinearHashing<E,N>::add(const E e[], size_t len) {
    for(size_t i=0; i < len; ++i){
        size_t address = compIndex(e[i]);
        for(size_t j=0; j < N; ++j){
            if(table[address].bucketstatus[j] == leer){
                table[address].bucketElem[j] = e[i];
                table[address].bucketstatus[j] = besetzt;
                break;
            }else if(j == (N-1) && table[address].newoverflow == nullptr){
                table[address].newoverflow = new OverflowContainer();
                addToOverflow(e[i],table[address].newoverflow);
                split();
                reHash(nts);
                if(nts+1 == (pow(2, d))){
                    nts=0;
                    ++d;
                }else{
                    ++nts;
                }
            }else if(j == (N-1) && table[address].newoverflow != nullptr){
                addToOverflow(e[i],table[address].newoverflow);
            }
        }
    }
}

The index of the element is calculated with the following two functions:

hashValue:

template <typename E> inline size_t hashValue(const E& e) { return size_t(e); }

compIndex:

template <typename E, size_t N>
inline size_t LinearHashing<E, N>::compIndex(E e){
    size_t adrOP = powl(2,d);
    return hashValue(e) % adrOP;
}

Overflows are added by the following function:

template <typename E, size_t N>
void LinearHashing<E, N>::addToOverflow(E e, LinearHashing::OverflowContainer *container) {
    size_t i=0;
    while( i < N/2){
        if(container->overflowstatus[i] == leer){
            container->overflowbucket[i] = e;
            container->overflowstatus[i] = besetzt;
            break;
        }else if(i == (N/2)-1 && container->nextOverflow == nullptr){
            container->nextOverflow = new OverflowContainer();
            container = container->nextOverflow;
            split();
            reHash(nts);
            if(nts+1 == (pow(2, d))){
                nts=0;
                ++d;
            }else{
                ++nts;
            }
            i=0;
        }else if(i == (N/2)-1 && container->nextOverflow != nullptr){
            container = container->nextOverflow;
            i=0;
        }else{
            ++i;
        }
    }
}

To perform a split i use this function:

template <typename E, size_t N>
void LinearHashing<E,N>::split(){
    Bucket* tmp = new Bucket[size()+1];
    std::copy(table,table+size(),tmp);
    delete[] table;
    table = tmp;
    ++n;
}

For now i perform a split when a new Overflow is created, to rehash the nextToSplit line i use the following rehash-functions:

template <typename E, size_t N>
void LinearHashing<E, N>::reHash(size_t index){
    bool reHashed = false;
    for(size_t i = 0; i < N; ++i){
        if(compIndex(table[index].bucketElem[i]) != compReIndex(table[index].bucketElem[i]) && table[index].bucketstatus[i] == besetzt){
            for (size_t j = 0; j < N && !reHashed ; ++j) {
                if(table[n-1].bucketstatus[j] == leer){
                    table[n-1].bucketElem[j] = table[index].bucketElem[i];
                    table[n-1].bucketstatus[j] = besetzt;
                    table[index].bucketstatus[i] = leer;
                    table[index].bucketElem[i] = 0;
                    reHashed = true;
                }
            }
            reHashed = false;
        }
    }
    if(table[index].newoverflow != nullptr){
        reHashOverflow(table[index].newoverflow);
    }
}
template <typename E, size_t N>
void LinearHashing<E, N>::reHashOverflow(OverflowContainer* container) {
    size_t i=0;
    bool reHashed = false;
    while(i < (N/2)){
        if(compIndex(container->overflowbucket[i]) != compReIndex(container->overflowbucket[i]) && container->overflowstatus[i] == besetzt){
            for(size_t j=0; j < N && !reHashed; ++j){
                if(table[n-1].bucketstatus[j] == leer){
                    table[n-1].bucketElem[j] = container->overflowbucket[i];
                    table[n-1].bucketstatus[j] = besetzt;
                    container->overflowbucket[i] = 0;
                    container->overflowstatus[i] = leer;
                    reHashed = true;
                }else if(j == N-1 && table[n-1].newoverflow == nullptr){
                    table[n-1].newoverflow = new OverflowContainer();
                    addToOverflow(container->overflowbucket[i],table[n-1].newoverflow);
                    container->overflowbucket[i] = 0;
                    container->overflowstatus[i] = leer;
                    reHashed = true;
                }else if(j == N-1 && table[n-1].newoverflow != nullptr){
                    addToOverflow(container->overflowbucket[i], table[n-1].newoverflow);
                    container->overflowbucket[i] = 0;
                    container->overflowstatus[i] = leer;
                    reHashed = true;
                }
            }
            reHashed = false;
            ++i;
        }else{
            ++i;
        }
        if(container->nextOverflow){
            container = container->nextOverflow;
            i = 0;
        }
    }
}

Please provide a [MCVE](http://www.stackoverflow.com/help/mcve) — Barry, May 18 '15 at 14:44
I added an example above, both the output i get and the one i should get(if i havent implemented it completely wrong) — Aragok, May 18 '15 at 14:46
@Cracksoldier `I have rechecked it a thousand times and i cant find my mistake` Use the debugger. You can't write all of this code and expect to get away with never using the debugger to debug issues. — PaulMcKenzie, May 18 '15 at 14:53
Replace `pow(2,d)` with `1 << d`. You don't want any floating point calculations. — molbdnilo, May 18 '15 at 14:54
I used a debugger (gdb) and the does what i want but i can't figure out where it went wrong. @molbdnilo i tried it but it didn't fix it. — Aragok, May 18 '15 at 14:55
@Cracksoldier Did you write the code? If so, single stepped into the debugger, watched variables, etc. where in the code that you wrote does the program go against your design? — PaulMcKenzie, May 18 '15 at 14:59
I wrote the add, rehash, rehashoverflow, addtooverflow, compIndex, compReIndex and the split function. the hashValue function was delivered by my professor and i have to use it. I have already checked the code with gdb and the last item i add from my example gets the wrong index and i can't figure out why because it gets calculated the same way the index of the other elements gets calculated. — Aragok, May 18 '15 at 15:06
On suspicious thing i found with the was the following in the add-function i always the 'optimized out' from the debugger, but the insertion works. — Aragok, May 18 '15 at 15:12
@Cracksoldier Well, the wrong result didn't occur by magic. Single step through the function, and see exactly what line, what variable, what set of variables, what loop is (or is not executed), etc, is not expected. Using the debugger is exactly how any of us helping you would solve the problem -- no one actually runs the computer in their heads, or can just merely eyeball mountains of code and determine what's wrong. — PaulMcKenzie, May 18 '15 at 15:13
@Cracksoldier: " optimized out" means that you're working with a release build. Debuggers _usually_ work on debug builds. These are somewhat slower (but that doesn't matter if you're single-stepping anyway) but their behavior is easier to follow. — MSalters, May 18 '15 at 15:19
@Cracksoldier Also, did you read the comment earlier concerning using `pow` to compute integer powers? It is not safe, as it is not guaranteed to give you an exact result. See here: http://stackoverflow.com/questions/25678481/why-pown-2-return-24-when-n-5/25678721#25678721 — PaulMcKenzie, May 18 '15 at 15:21
I have replaced pow with '1 << d' and im going throught code with gdb right now. @MSalters but it shouldn't affect the end result? — Aragok, May 18 '15 at 15:21
@Cracksoldier: It shouldn't affect **correct** programs. For incorrect programs, the difference can be huge. E.g. an out-of-bounds array access often behaves differently. — MSalters, May 18 '15 at 15:30

score 0 · Accepted Answer · answered May 25 '15 at 15:58

0

I have now reimplemted it completely and found a lot of errors e.g. implicit recursion of the split and rehash functions.

answered May 25 '15 at 15:58

Aragok

303
7
16

Insertion with implementing 'Linear Hashing'

1 Answers1