1

I made an AVL tree in C++, and the tree is kept balanced successfully according to all of my tests. The function addNode should work in O(log(n)) (when n is the number of nodes in the tree), and it seems like my implementation satisfies it. To verify, I wrote the following test:

#include "AVLTree.h"
#include "iostream"
#include <chrono>
#include <vector>
#include <random>
#include <algorithm>
#include <memory>

using namespace std;
using namespace std::chrono;

typedef high_resolution_clock Clock;
typedef Clock::time_point ClockTime;

auto ExecutionTime(ClockTime start_time, ClockTime end_time)
{
    return duration_cast<nanoseconds>(end_time - start_time).count();
}

#define N 10000000
#define M 100000

int main(){
    AVLTree<unsigned long long, unsigned long long> tree; // AVLTree<key type, value type>
    ClockTime start_time;
    ClockTime end_time;

    std::vector<unsigned long long> vector;
    for (unsigned long long i=0; i<N; i++) vector.push_back(i);

    auto max_time = ExecutionTime(start_time, start_time); // currently zero, will get bigger.

    unsigned seed = std::chrono::system_clock::now().time_since_epoch().count();
    shuffle (vector.begin(), vector.end(), std::default_random_engine(seed));

    unsigned long long counter = 0;
    for (auto i : vector){
        start_time = Clock::now();
        tree.addNode(i, i);
        end_time = Clock::now();
        max_time = max(max_time, ExecutionTime(start_time, end_time));
        counter++;
        if (counter == M){
            cout << "Small add: " << max_time << endl;
        }
    }
    cout << "Add: " << max_time << end;
}

Because the function works in θ(log(n)), for sufficiently large n, T(addNode)<clog(n) (T is the time function) for some constant c, and if we take a really tight one, we may assume T(addNode)=clog(n) for large enough n. That means that the above test should output:

Small add: x
Add: (<2x)

For some x, and most time it does, but I also got

Small add: 280100
Add: 14432000

Which is almost 100 times bigger. Assuming the implementation is correct, why did it happen (it happens pretty rarely)?

Yahav Boneh
  • 120
  • 2
  • 8
  • 1
    One possibility is, the thread your program runs on is pre-empted by the OS and the CPU assigned to do some other task (e.g. flushing hard drive caches, or processing an unexpected burst of Wi-Fi packets, or whatever it is operating systems do in their spare time). [Microbenchmarking](https://stackoverflow.com/questions/2842695/what-is-microbenchmarking) is hard. Rather than measuring each individual call, you may want to measure the average time over a large number of calls. – Igor Tandetnik Dec 19 '20 at 14:01

1 Answers1

0

There are many things going on in you code besides the actual addNode logic. These thigs are done 100 times more in the M case:

  • Copying the next vector element to the i variable (use const auto&)
  • Execution of Clock::now() which god know how many calls it does
  • Calling the ExecutionTime function which uses an obscure amount of time to execute

Now consider if the running time of your addNode is in O(log(n)) the effect of the above actions become more and more dominant as the size grows. To check this simply comment the addNode line and see the results.

kdcode
  • 524
  • 2
  • 7
  • But I did not measure the time of the whole loop execution, just the time of each ```addNode```. So the copy c'tor, the ```Clock::now()``` and the ```ExecutionTime``` affects are the same each time, and don't grow as ```n``` grows. – Yahav Boneh Dec 20 '20 at 19:24
  • Alright this is true, can you share the actual code? – kdcode Dec 21 '20 at 07:25
  • I don't think it is a good idea because these are hundreds of lines, but Igor's answer seems to explain it. Thanks – Yahav Boneh Dec 21 '20 at 21:30