O(N) algorithm slower than O(N logN) algorithm

Question

In an array of numbers, every number appears an even number of times except one, which appears an odd number of times. We need to find that number (this question was previously discussed on Stack Overflow).

Here is a program that solves the problem with three different methods: two are O(N) (hash_set and hash_map), while one is O(NlogN) (sorting). However, profiling with arbitrarily large inputs shows that sorting is faster, and its advantage grows as the input grows.

What is wrong with the implementation or the complexity analysis? Why is the O(NlogN) method faster?

#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <functional>
#include <string>
#include <vector>
#include <unordered_set>
#include <unordered_map>

using std::cout;
using std::chrono::high_resolution_clock;
using std::chrono::milliseconds;
using std::endl;
using std::string;
using std::vector;
using std::unordered_map;
using std::unordered_set;

class ScopedTimer {
public:
    ScopedTimer(const string& name)
    : name_(name), start_time_(high_resolution_clock::now()) {}

    ~ScopedTimer() {
        cout << name_ << " took "
        << std::chrono::duration_cast<milliseconds>(
                                                    high_resolution_clock::now() - start_time_).count()
        << " milliseconds" << endl;
    }

private:
    const string name_;
    const high_resolution_clock::time_point start_time_;
};

int find_using_hash(const vector<int>& input_data) {
    unordered_set<int> numbers(input_data.size());
    for(const auto& value : input_data) {
        auto res = numbers.insert(value);
        if(!res.second) {
            numbers.erase(res.first);
        }
    }
    return numbers.size() == 1 ? *numbers.begin() : -1;
}

int find_using_hashmap(const vector<int>& input_data) {
    unordered_map<int,int> counter_map;
    for(const auto& value : input_data) {
        ++counter_map[value];
    }
    for(const auto& map_entry : counter_map) {
        if(map_entry.second % 2 == 1) {
            return map_entry.first;
        }
    }
    return -1;
}

int find_using_sort_and_count(const vector<int>& input_data) {
    vector<int> local_copy(input_data);
    std::sort(local_copy.begin(), local_copy.end());
    int prev_value = local_copy.front();
    int counter = 0;
    for(const auto& value : local_copy) {
        if(prev_value == value) {
            ++counter;
            continue;
        }

        if(counter % 2 == 1) {
            return prev_value;
        }

        prev_value = value;
        counter = 1;
    }
    // The last run has no following value, so check its parity here.
    return counter % 2 == 1 ? prev_value : -1;
}

void execute_and_time(const string& method_name, std::function<int()> method) {
    ScopedTimer timer(method_name);
    cout << method_name << " returns " << method() << endl;
}

int main()
{
    vector<int> input_size_vec({1<<18,1<<20,1<<22,1<<24,1<<28});

    for(const auto& input_size : input_size_vec) {
        // Prepare input data
        std::vector<int> input_data;
        const int magic_number = 123454321;
        for(int i=0;i<input_size;++i) {
            input_data.push_back(i);
            input_data.push_back(i);
        }
        input_data.push_back(magic_number);
        std::random_shuffle(input_data.begin(), input_data.end());
        cout << "For input_size " << input_size << ":" << endl;

        execute_and_time("hash-set:",std::bind(find_using_hash, input_data));
        execute_and_time("sort-and-count:",std::bind(find_using_sort_and_count, input_data));
        execute_and_time("hash-map:",std::bind(find_using_hashmap, input_data));

        cout << "--------------------------" << endl;
    }
    return 0;
}

Profiling results:

sh$ g++ -O3 -std=c++11 -o main *.cc
sh$ ./main 
For input_size 262144:
hash-set: returns 123454321
hash-set: took 107 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 37 milliseconds
hash-map: returns 123454321
hash-map: took 109 milliseconds
--------------------------
For input_size 1048576:
hash-set: returns 123454321
hash-set: took 641 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 173 milliseconds
hash-map: returns 123454321
hash-map: took 731 milliseconds
--------------------------
For input_size 4194304:
hash-set: returns 123454321
hash-set: took 3250 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 745 milliseconds
hash-map: returns 123454321
hash-map: took 3631 milliseconds
--------------------------
For input_size 16777216:
hash-set: returns 123454321
hash-set: took 14528 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 3238 milliseconds
hash-map: returns 123454321
hash-map: took 16483 milliseconds
--------------------------
For input_size 268435456:
hash-set: returns 123454321
hash-set: took 350305 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 60396 milliseconds
hash-map: returns 123454321
hash-map: took 427841 milliseconds
--------------------------

Addition

The fast XOR solution suggested by @Matt is of course in a league of its own, taking under 1 second for the worst case in the example:

int find_using_xor(const vector<int>& input_data) {
    int output = 0;
    for(const int& value : input_data) {
        output = output^value;
    }
    return output;
}
For input_size 268435456:
xor: returns 123454321
xor: took 264 milliseconds

But the question still stands: why is hashing so inefficient compared to sorting in practice, despite its theoretical advantage in algorithmic complexity?

@101010 Of course, I meant that O(N) is slower; there was a typo in the title. — Ilya Kobelevskiy, Jan 13 '15 at 22:44
`O` complexity is a theoretical limit. Reality though might be a lot different especially comparing contiguous data structures with hash maps that use linked lists internally. — 101010, Jan 13 '15 at 22:47
(1) Did you compile with optimisation? (2) Waiting for cache misses hurts; the hash-based schemes are doing linearly many completely unpredicted cache misses while the sorting-based scheme does essentially none. Note also that the latency of a cache line fetch from the smallest level of cache big enough to hold your data increases substantially as that level gets further away from the core. — tmyklebu, Jan 13 '15 at 22:48
You should try plotting time vs. N for all algorithms and see whether the complexity makes sense. But there are other effects at play. The hashed data is more fragmented than the vector data. — juanchopanza, Jan 13 '15 at 22:48
Yes, but as the size of the input increases, O complexity should prevail over the data structures, shouldn't it? — Ilya Kobelevskiy, Jan 13 '15 at 22:49
@IlyaKobelevskiy: Asymptotic complexity cannot make predictions about concrete situations. It can't even make asymptotic predictions unless your model of computation is reasonable asymptotically. — tmyklebu, Jan 13 '15 at 22:50
This doesn't answer the question, but just wanted to propose a simple solution in O(N) time and O(1) memory, based on your description of the problem: Bitwise xor all the numbers together. The result is the number with an odd number of occurrences (all numbers with even occurrences cancel themselves out.) — Matt, Jan 13 '15 at 22:51
Not necessarily. In reality, various players come into play (e.g., cache memory, data-structure implementation). — 101010, Jan 13 '15 at 22:52
Heap fragmentation taking a toll by the end, perhaps? Between `1 << 24` and `1 << 28` elements, cache effects should have become largely stable, and the implementation of the container is the same, yet there's a factor of 24 and 26 for set and map -- markedly more than the 16-fold increase in elements. Something's fishy. OP, your machine didn't start to swap there, did it? — Wintermute, Jan 13 '15 at 23:03
@Jarlax That is correct because insertions for std::map would be logN, but I do not use std::map - only std::unordered_map which is hash based as well. — Ilya Kobelevskiy, Jan 13 '15 at 23:31
Also see [Why is processing a sorted array faster than an unsorted array?](http://stackoverflow.com/q/11227809/608639). It's a counterintuitive result in practice because a sorted array is supposed to run in O(n^2) in some cases. — jww, Jan 13 '15 at 23:58
@IlyaKobelevskiy Yes, I know. I just mean that I've tested map vs unordered map with an identical algorithm (build the table, then iterate over all elements in the table to find the one without a pair) and they work predictably for these containers - O(1) beats O(logn). — Jarlax, Jan 14 '15 at 00:21
@jww If we remove shuffling of input array, all algorithms run faster. But relative time difference between them remains almost the same. — Jarlax, Jan 14 '15 at 00:24
@Yakk: reserve helps considerably for the hash map (see my updated answer). The hash set solution already passes a size to the constructor. Using reserve instead of that size makes things slightly worse. (All said with the caveat: as measured by the test code when run on the machine where I ran the test code. These observations do not necessarily affect any other implementation on any other machine — but they might be indicative of the behaviour you'd find elsewhere.) — Jonathan Leffler, Jan 14 '15 at 03:55

Kan Li · Accepted Answer · 2015-01-14T02:02:56.707

It really depends on the hash_map/hash_set implementation. When libstdc++'s unordered_{map,set} is replaced with Google's dense_hash_{map,set}, the hash-based approach is significantly faster than the sort. The drawback of dense_hash_xxx is that it requires two key values that will never be used (one is reserved as the empty key, the other as the deleted key). See its documentation for details.

Another thing to remember: hash_{map,set} usually does a lot of dynamic memory allocation/deallocation, so it pays to use an alternative to libc's default malloc/free, e.g. Google's tcmalloc or Facebook's jemalloc.

hidden $ g++ -O3 -std=c++11 xx.cpp /usr/lib/libtcmalloc_minimal.so.4
hidden $ ./a.out 
For input_size 262144:
unordered-set: returns 123454321
unordered-set: took 35 milliseconds
dense-hash-set: returns 123454321
dense-hash-set: took 18 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 34 milliseconds
unordered-map: returns 123454321
unordered-map: took 36 milliseconds
dense-hash-map: returns 123454321
dense-hash-map: took 13 milliseconds
--------------------------
For input_size 1048576:
unordered-set: returns 123454321
unordered-set: took 251 milliseconds
dense-hash-set: returns 123454321
dense-hash-set: took 77 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 153 milliseconds
unordered-map: returns 123454321
unordered-map: took 220 milliseconds
dense-hash-map: returns 123454321
dense-hash-map: took 60 milliseconds
--------------------------
For input_size 4194304:
unordered-set: returns 123454321
unordered-set: took 1453 milliseconds
dense-hash-set: returns 123454321
dense-hash-set: took 357 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 596 milliseconds
unordered-map: returns 123454321
unordered-map: took 1461 milliseconds
dense-hash-map: returns 123454321
dense-hash-map: took 296 milliseconds
--------------------------
For input_size 16777216:
unordered-set: returns 123454321
unordered-set: took 6664 milliseconds
dense-hash-set: returns 123454321
dense-hash-set: took 1751 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 2513 milliseconds
unordered-map: returns 123454321
unordered-map: took 7299 milliseconds
dense-hash-map: returns 123454321
dense-hash-map: took 1364 milliseconds
--------------------------
tcmalloc: large alloc 1073741824 bytes == 0x5f392000 @ 
tcmalloc: large alloc 2147483648 bytes == 0x9f392000 @ 
tcmalloc: large alloc 4294967296 bytes == 0x11f392000 @ 
For input_size 268435456:
tcmalloc: large alloc 4586348544 bytes == 0x21fb92000 @ 
unordered-set: returns 123454321
unordered-set: took 136271 milliseconds
tcmalloc: large alloc 8589934592 bytes == 0x331974000 @ 
tcmalloc: large alloc 2147483648 bytes == 0x21fb92000 @ 
dense-hash-set: returns 123454321
dense-hash-set: took 34641 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 47606 milliseconds
tcmalloc: large alloc 2443452416 bytes == 0x21fb92000 @ 
unordered-map: returns 123454321
unordered-map: took 176066 milliseconds
tcmalloc: large alloc 4294967296 bytes == 0x331974000 @ 
dense-hash-map: returns 123454321
dense-hash-map: took 26460 milliseconds
--------------------------

Code:

#include <algorithm>
#include <chrono>
#include <cmath>
#include <iostream>
#include <functional>
#include <string>
#include <vector>
#include <unordered_set>
#include <unordered_map>

#include <google/dense_hash_map>
#include <google/dense_hash_set>

using std::cout;
using std::chrono::high_resolution_clock;
using std::chrono::milliseconds;
using std::endl;
using std::string;
using std::vector;
using std::unordered_map;
using std::unordered_set;
using google::dense_hash_map;
using google::dense_hash_set;

class ScopedTimer {
public:
    ScopedTimer(const string& name)
    : name_(name), start_time_(high_resolution_clock::now()) {}

    ~ScopedTimer() {
        cout << name_ << " took "
        << std::chrono::duration_cast<milliseconds>(
                                                    high_resolution_clock::now() - start_time_).count()
        << " milliseconds" << endl;
    }

private:
    const string name_;
    const high_resolution_clock::time_point start_time_;
};

int find_using_unordered_set(const vector<int>& input_data) {
    unordered_set<int> numbers(input_data.size());
    for(const auto& value : input_data) {
        auto res = numbers.insert(value);
        if(!res.second) {
            numbers.erase(res.first);
        }
    }
    return numbers.size() == 1 ? *numbers.begin() : -1;
}

int find_using_unordered_map(const vector<int>& input_data) {
    unordered_map<int,int> counter_map;
    for(const auto& value : input_data) {
        ++counter_map[value];
    }
    for(const auto& map_entry : counter_map) {
        if(map_entry.second % 2 == 1) {
            return map_entry.first;
        }
    }
    return -1;
}

int find_using_dense_hash_set(const vector<int>& input_data) {
    dense_hash_set<int> numbers(input_data.size());
    numbers.set_deleted_key(-1);
    numbers.set_empty_key(-2);
    for(const auto& value : input_data) {
        auto res = numbers.insert(value);
        if(!res.second) {
            numbers.erase(res.first);
        }
    }
    return numbers.size() == 1 ? *numbers.begin() : -1;
}

int find_using_dense_hash_map(const vector<int>& input_data) {
    dense_hash_map<int,int> counter_map;
    counter_map.set_deleted_key(-1);
    counter_map.set_empty_key(-2);
    for(const auto& value : input_data) {
        ++counter_map[value];
    }
    for(const auto& map_entry : counter_map) {
        if(map_entry.second % 2 == 1) {
            return map_entry.first;
        }
    }
    return -1;
}

int find_using_sort_and_count(const vector<int>& input_data) {
    vector<int> local_copy(input_data);
    std::sort(local_copy.begin(), local_copy.end());
    int prev_value = local_copy.front();
    int counter = 0;
    for(const auto& value : local_copy) {
        if(prev_value == value) {
            ++counter;
            continue;
        }

        if(counter % 2 == 1) {
            return prev_value;
        }

        prev_value = value;
        counter = 1;
    }
    // The last run has no following value, so check its parity here.
    return counter % 2 == 1 ? prev_value : -1;
}

void execute_and_time(const string& method_name, std::function<int()> method) {
    ScopedTimer timer(method_name);
    cout << method_name << " returns " << method() << endl;
}

int main()
{
    vector<int> input_size_vec({1<<18,1<<20,1<<22,1<<24,1<<28});

    for(const auto& input_size : input_size_vec) {
        // Prepare input data
        std::vector<int> input_data;
        const int magic_number = 123454321;
        for(int i=0;i<input_size;++i) {
            input_data.push_back(i);
            input_data.push_back(i);
        }
        input_data.push_back(magic_number);
        std::random_shuffle(input_data.begin(), input_data.end());
        cout << "For input_size " << input_size << ":" << endl;

        execute_and_time("unordered-set:",std::bind(find_using_unordered_set, std::cref(input_data)));
        execute_and_time("dense-hash-set:",std::bind(find_using_dense_hash_set, std::cref(input_data)));
        execute_and_time("sort-and-count:",std::bind(find_using_sort_and_count, std::cref(input_data)));
        execute_and_time("unordered-map:",std::bind(find_using_unordered_map, std::cref(input_data)));
        execute_and_time("dense-hash-map:",std::bind(find_using_dense_hash_map, std::cref(input_data)));

        cout << "--------------------------" << endl;
    }
    return 0;
}

@JonathanLeffler `tcmalloc` also plays a significant role in this benchmark. If I use libc's `malloc/free`, the `dense_hash_map` is roughly as fast as `sort`, instead of almost twice as fast. But it is still much faster than libstdc++'s implementation. — Kan Li, Jan 14 '15 at 02:27
Hmm. I tried using tcmalloc when the question was posted and I got even worse timings with `unordered_{map,set}`. I suspect it's the combination of a memory allocator that's better at handling large allocations and a hash table that only does large allocations that's giving you your performance increase. Nice find, though. — tmyklebu, Jan 14 '15 at 16:20

score 6 · Answer 2 · edited May 23 '17 at 12:12

This analysis is substantially the same as that done by user3386109 in his answer. It is the analysis I would have performed regardless of his answer, but he did get there first.

I ran the program on my machine (HP Z420 running an Ubuntu 14.04 LTS derivative), and added output for 1<<26, so I have a different set of numbers, but the ratios look remarkably similar to the ratios from the data in the original post. The raw times I got were (file on-vs-logn.raw.data):

For input_size 262144:
hash-set: returns 123454321
hash-set: took 45 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 34 milliseconds
hash-map: returns 123454321
hash-map: took 61 milliseconds
--------------------------
For input_size 1048576:
hash-set: returns 123454321
hash-set: took 372 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 154 milliseconds
hash-map: returns 123454321
hash-map: took 390 milliseconds
--------------------------
For input_size 4194304:
hash-set: returns 123454321
hash-set: took 1921 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 680 milliseconds
hash-map: returns 123454321
hash-map: took 1834 milliseconds
--------------------------
For input_size 16777216:
hash-set: returns 123454321
hash-set: took 8356 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 2970 milliseconds
hash-map: returns 123454321
hash-map: took 9045 milliseconds
--------------------------
For input_size 67108864:
hash-set: returns 123454321
hash-set: took 37582 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 12842 milliseconds
hash-map: returns 123454321
hash-map: took 46480 milliseconds
--------------------------
For input_size 268435456:
hash-set: returns 123454321
hash-set: took 172329 milliseconds
sort-and-count: returns 123454321
sort-and-count: took 53856 milliseconds
hash-map: returns 123454321
hash-map: took 211191 milliseconds
--------------------------

real    11m32.852s
user    11m24.687s
sys     0m8.035s

I created a script, awk.analysis.sh, to analyze the data:

#!/bin/sh

awk '
BEGIN { printf("%9s  %8s  %8s  %8s  %8s  %8s  %8s  %9s  %9s  %9s  %9s\n",
               "Size", "Sort Cnt", "R:Sort-C", "Hash Set", "R:Hash-S", "Hash Map",
               "R:Hash-M", "O(N)", "O(NlogN)", "O(N^3/2)", "O(N^2)")
}
/input_size/           { if (old_size   == 0) old_size   = $3; size       = $3 }
/hash-set: took/       { if (o_hash_set == 0) o_hash_set = $3; t_hash_set = $3 }
/sort-and-count: took/ { if (o_sort_cnt == 0) o_sort_cnt = $3; t_sort_cnt = $3 }
/hash-map: took/       { if (o_hash_map == 0) o_hash_map = $3; t_hash_map = $3 }
/^----/ {
    o_n = size / old_size
    o_nlogn = (size * log(size)) / (old_size * log(old_size))
    o_n2    = (size * size) / (old_size * old_size)
    o_n32   = (size * sqrt(size)) / (old_size * sqrt(old_size))
    r_sort_cnt = t_sort_cnt / o_sort_cnt
    r_hash_map = t_hash_map / o_hash_map
    r_hash_set = t_hash_set / o_hash_set
    printf("%9d  %8d  %8.2f  %8d  %8.2f  %8d  %8.2f  %9.0f  %9.2f  %9.2f  %9.0f\n",
           size, t_sort_cnt, r_sort_cnt, t_hash_set, r_hash_set,
           t_hash_map, r_hash_map, o_n, o_nlogn, o_n32, o_n2)
}' < on-vs-logn.raw.data

The output from the program is quite wide, but gives:

     Size  Sort Cnt  R:Sort-C  Hash Set  R:Hash-S  Hash Map  R:Hash-M       O(N)   O(NlogN)   O(N^3/2)     O(N^2)
   262144        34      1.00        45      1.00        61      1.00          1       1.00       1.00          1
  1048576       154      4.53       372      8.27       390      6.39          4       4.44       8.00         16
  4194304       680     20.00      1921     42.69      1834     30.07         16      19.56      64.00        256
 16777216      2970     87.35      8356    185.69      9045    148.28         64      85.33     512.00       4096
 67108864     12842    377.71     37582    835.16     46480    761.97        256     369.78    4096.00      65536
268435456     53856   1584.00    172329   3829.53    211191   3462.15       1024    1592.89   32768.00    1048576

It is reasonably clear that on this platform, the hash set and hash map algorithms are not O(N), nor are they as good as O(N.logN), but they are better than O(N^3/2) let alone O(N²). On the other hand, the sorting algorithm is very close to O(N.logN) indeed.
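One way to put a number on "worse than O(N.logN) but better than O(N^3/2)" is to fit a power law to the first and last rows of the table. The sketch below is not part of the original analysis; `empirical_exponent` is a hypothetical helper that assumes the time grows roughly as N^k and solves for k.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: assuming time ~ N^k, estimate k from two measurements.
// size_ratio is N_last / N_first, time_ratio is t_last / t_first.
double empirical_exponent(double size_ratio, double time_ratio) {
    assert(size_ratio > 1.0 && time_ratio > 0.0);
    return std::log(time_ratio) / std::log(size_ratio);
}
```

With the ratios from the last row of the table (size ratio 1024), `empirical_exponent(1024, 1584.00)` comes out at roughly 1.06 for sort-and-count and `empirical_exponent(1024, 3829.53)` at roughly 1.19 for the hash set, against 1.0 for a pure O(N) and 1.5 for a pure O(N^3/2).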

You can only put that down to a theoretical deficiency in the hash set and hash map code, or an inadequate sizing of the hash tables so that they are using a sub-optimal hash table size. It would be worth investigating what mechanisms exist to pre-size the hash set and hash map to see whether using that affects the performance. (See also extra information below.)

And, just for the record, here's the output from the analysis script on the original data:

     Size  Sort Cnt  R:Sort-C  Hash Set  R:Hash-S  Hash Map  R:Hash-M       O(N)   O(NlogN)   O(N^3/2)     O(N^2)
   262144        37      1.00       107      1.00       109      1.00          1       1.00       1.00          1
  1048576       173      4.68       641      5.99       731      6.71          4       4.44       8.00         16
  4194304       745     20.14      3250     30.37      3631     33.31         16      19.56      64.00        256
 16777216      3238     87.51     14528    135.78     16483    151.22         64      85.33     512.00       4096
268435456     60396   1632.32    350305   3273.88    427841   3925.15       1024    1592.89   32768.00    1048576

Further testing shows that modifying the hash functions as shown:

int find_using_hash(const vector<int>& input_data) {
    unordered_set<int> numbers;
    numbers.reserve(input_data.size());

and:

int find_using_hashmap(const vector<int>& input_data) {
    unordered_map<int,int> counter_map;
    counter_map.reserve(input_data.size());

produces an analysis like this:

     Size  Sort Cnt  R:Sort-C  Hash Set  R:Hash-S  Hash Map  R:Hash-M       O(N)   O(NlogN)   O(N^3/2)     O(N^2)
   262144        34      1.00        42      1.00        80      1.00          1       1.00       1.00          1
  1048576       155      4.56       398      9.48       321      4.01          4       4.44       8.00         16
  4194304       685     20.15      1936     46.10      1177     14.71         16      19.56      64.00        256
 16777216      2996     88.12      8539    203.31      5985     74.81         64      85.33     512.00       4096
 67108864     12564    369.53     37612    895.52     28808    360.10        256     369.78    4096.00      65536
268435456     53291   1567.38    172808   4114.48    124593   1557.41       1024    1592.89   32768.00    1048576

Clearly, reserving the space for the hash map is beneficial.

The hash set code is rather different; it adds an item about half the time (overall), and 'adds' and then deletes an item the other half of the time. This is more work than the hash map code has to do, so it is slower. This also means that the reserved space is larger than really necessary, and may account for the degraded performance with the reserved space.
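The effect of `reserve` on rehashing can be observed directly through the standard bucket interface. This is a minimal sketch rather than the benchmark code; `count_rehashes` is a hypothetical helper that inserts `n` distinct keys and counts how often `bucket_count()` changes along the way.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical helper: insert n distinct keys and count how many times the
// bucket array is rebuilt (detected as a change in bucket_count()).
std::size_t count_rehashes(int n, bool reserve_first) {
    assert(n >= 0);
    std::unordered_set<int> numbers;
    if (reserve_first) {
        // After reserve(n), n elements fit without exceeding max_load_factor().
        numbers.reserve(static_cast<std::size_t>(n));
    }
    std::size_t rehashes = 0;
    std::size_t buckets = numbers.bucket_count();
    for (int i = 0; i < n; ++i) {
        numbers.insert(i);
        if (numbers.bucket_count() != buckets) {
            buckets = numbers.bucket_count();
            ++rehashes;
        }
    }
    return rehashes;
}
```

Calling `count_rehashes(n, true)` should report no rehashes, while `count_rehashes(n, false)` reports several for any large `n`. How much each rehash costs depends on the library, so this only shows where the extra work comes from, not how large it is.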

user3386109 · Answer 3 · 2015-01-14T00:45:51.493

2

Let's start by looking at the numbers for the sorting solution. In the table below, the first column is the size ratio. It's computed by calculating NlogN for a given test, and dividing by NlogN for the first test. The second column is the time ratio between a given test and the first test.

 NlogN size ratio      time ratio
   4*20/18 =  4.4     173 / 37 =  4.7
  16*22/18 = 19.6     745 / 37 = 20.1
  64*24/18 = 85.3    3238 / 37 = 87.5
1024*28/18 = 1590   60396 / 37 = 1630

You can see that there is very good agreement between the two ratios, indicating that the sort routine is indeed O(NlogN).
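The size-ratio column can be reproduced with a few lines of code. The helper name `nlogn_ratio` is hypothetical and only restates the arithmetic in the table.

```cpp
#include <cassert>
#include <cmath>

// Hypothetical helper: expected growth in running time for an O(N log N)
// algorithm when the input grows from base_n to n elements.
double nlogn_ratio(double base_n, double n) {
    assert(base_n > 1.0 && n > 1.0);
    return (n * std::log2(n)) / (base_n * std::log2(base_n));
}
```

For example, `nlogn_ratio(1 << 18, 1 << 20)` evaluates 4*20/18, which is about 4.4, matching the first row.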

So why are the hash routines not performing as expected? Simple: the notion that extracting an item from a hash table is O(1) is pure fantasy. The actual extraction time depends on the quality of the hashing function and the number of bins in the hash table. It ranges from O(1) to O(N), where the worst case occurs when all of the entries in the hash table end up in the same bin. So when using a hash table, you should expect your overall performance to be somewhere between O(N) and O(N^2), which seems to fit your data, as shown below:

 O(N)  O(NlogN)  O(N^2)  time
   4     4.4       16       6
  16      20      256      30
  64      85     4096     136
1024    1590     10^6    3274

Note that the time ratio is at the low end of the range, indicating that the hash function is working fairly well.
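The worst case described above, where every entry lands in the same bin, is easy to provoke on purpose. The sketch below is an illustration only and not something taken from the benchmark: `ConstantHash` is a hypothetical, deliberately bad hash functor that maps every key to the same value.

```cpp
#include <cassert>
#include <cstddef>
#include <unordered_set>

// Hypothetical, deliberately terrible hash: every key collides.
struct ConstantHash {
    std::size_t operator()(int) const { return 0; }
};

// Insert n distinct keys and return the size of the longest bucket chain.
std::size_t longest_chain_with_constant_hash(int n) {
    assert(n >= 0);
    std::unordered_set<int, ConstantHash> numbers;
    for (int i = 0; i < n; ++i) {
        numbers.insert(i);
    }
    std::size_t longest = 0;
    for (std::size_t b = 0; b < numbers.bucket_count(); ++b) {
        if (numbers.bucket_size(b) > longest) {
            longest = numbers.bucket_size(b);
        }
    }
    return longest;
}
```

With this hash, all `n` keys share one bucket, so each lookup has to walk a chain of up to `n` entries. With `std::hash<int>` on distinct integers the chains stay short, which is consistent with the measured ratios sitting near the low end of the range.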

edited Jan 14 '15 at 00:45

answered Jan 14 '15 at 00:33

user3386109


I like the analysis for sort; did you do a parallel analysis for the hash algorithms to see whether O(NlogN) or O(N * N) matches the growth? Your answer gets a bit hand-wavy at that point. I agree that the hash algorithms are not exhibiting O(N) behaviour, but how much worse is it than O(N)? And yes, I know there are many possibilities, but showing 'worse than O(N) but better than O(NlogN)' or 'worse than O(N) but better than O(N * N)' would be moderately useful. It's a pity the testing skipped `1 << 26` as a size. – Jonathan Leffler Jan 14 '15 at 00:41
@JonathanLeffler Thanks, I was working on the table for the hash results. I'll add a column for O(NlogN) out of curiosity. – user3386109 Jan 14 '15 at 00:47
Hash table is not so bad :). In fact, map (red-black tree, guaranteed O(logn) for insert, delete and search) is slower in all cases compared to unordered_map (hash table). – Jarlax Jan 14 '15 at 00:55
That's interesting. It seems to show that this particular implementation of the hash-map and hash-set code provides O(NlogN) or worse performance, rather than the O(N) performance promised by the C++ standard. Whether that's true for all implementations is another matter. It explains why the discrepancy gets worse as the data size gets bigger. – Jonathan Leffler Jan 14 '15 at 00:58
"the notion that extracting an item from a hash table is O(1) is pure fantasy." The notion that random experiments (such as building a - sensibly implemented - hash table) are deterministic is pure fantasy. The notion that using properly designed hashing structures has expected O(1) access on the other hand is correct. In fact, the more elements there are, the less pronounced the variation due to randomness should be. I would take a look at the implementation of hashing used here. – G. Bach Jan 14 '15 at 01:54
@G.Bach: I agree with you, except I do not think that using "O(1)" to describe the running time of a random memory access in practise is fair to modern computers. As people design computers with larger and larger memories, the number of, say, integer additions that can be done per random memory access has been increasing too. – tmyklebu Jan 14 '15 at 16:25
@tmyklebu Well yes, machine-specific things like memory size and access times are simply ignored by many machine models and are supposed to be covered by the constants in Landau notation, and obviously those constants change over time. The overall rationale is a pretty sound abstraction of the complex machines that computers are though, I think - and keeping in mind that Landau notation is used as an abstraction of the fickle things that running times are is probably a pretty good idea, too. – G. Bach Jan 14 '15 at 17:06
@G.Bach: Landau notation is fine. But machine models that pretend random memory accesses, sequential memory accesses, and integer addition cost about the same aren't fine anymore if you care about things like the `log_2(n)`-ish times Quicksort looks at each array element versus the two or three dependent random memory accesses a hash table lookup takes. – tmyklebu Jan 14 '15 at 17:22

score 2 · Answer 4 · answered Jan 14 '15 at 03:20

I ran the program through valgrind with different input sizes, and I got these results for cycle counts:

with 1<<16 values:
  find_using_hash: 27 560 872
  find_using_sort: 17 089 994
  sort/hash: 62.0%

with 1<<17 values:
  find_using_hash: 55 105 370
  find_using_sort: 35 325 606
  sort/hash: 64.1%

with 1<<18 values:
  find_using_hash: 110 235 327
  find_using_sort:  75 695 062
  sort/hash: 68.6%

with 1<<19 values:
  find_using_hash: 220 248 209
  find_using_sort: 157 934 801
  sort/hash: 71.7%

with 1<<20 values:
  find_using_hash: 440 551 113
  find_using_sort: 326 027 778
  sort/hash: 74.0%

with 1<<21 values:
  find_using_hash: 881 086 601
  find_using_sort: 680 868 836
  sort/hash: 77.2%

with 1<<22 values:
  find_using_hash: 1 762 482 400
  find_using_sort: 1 420 801 591
  sort/hash: 80.6%

with 1<<23 values:
  find_using_hash: 3 525 860 455
  find_using_sort: 2 956 962 786
  sort/hash: 83.8%

This indicates that the sort time is slowly overtaking the hash time, at least theoretically. With my particular compiler/library (gcc 4.8.2/libstdc++) and optimization level (-O2), the sort and hash methods would be the same speed at around 2^28 values, which is at the limit of what you are trying. I suspect that other system factors are coming into play when using that much memory, which makes it difficult to evaluate in actual wall time.
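The answer does not say how the 2^28 figure was obtained. One plausible reading, assumed here, is a straight-line extrapolation of the sort/hash percentage per doubling of the input; `crossover_exponent` is a hypothetical helper that performs that arithmetic.

```cpp
#include <cassert>
#include <vector>

// Hypothetical helper: given sort/hash percentages measured at consecutive
// powers of two starting at 2^first_exponent, linearly extrapolate the
// exponent at which the ratio would reach 100%.
double crossover_exponent(int first_exponent, const std::vector<double>& percentages) {
    assert(percentages.size() >= 2);
    const double first = percentages.front();
    const double last = percentages.back();
    const double steps = static_cast<double>(percentages.size() - 1);
    // Average gain in percentage points each time the input doubles.
    const double gain_per_doubling = (last - first) / steps;
    assert(gain_per_doubling > 0.0);
    const int last_exponent = first_exponent + static_cast<int>(percentages.size()) - 1;
    return last_exponent + (100.0 - last) / gain_per_doubling;
}
```

Feeding in the eight percentages listed above, starting at exponent 16, gives a crossover a little above 28, in line with the estimate in the text. A linear trend is an assumption; the real curve need not stay linear.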

score 2 · Answer 5 · answered Jan 14 '15 at 09:57

The fact that O(N) was seemingly slower than O(N logN) was driving me crazy, so I decided to dive deep into the problem.

I did this analysis in Windows with Visual Studio, but I bet the results would be very similar on Linux with g++.

First of all, I used Very Sleepy to find the pieces of code that were being executed the most during the for loop in find_using_hash(). This is what I saw:

[Image: Very Sleepy profile of the for loop in find_using_hash()]

As you can see, the top entries are all related to lists (RtlAllocateHeap is called from the list code). Apparently, the problem is that buckets are implemented as lists, so every insertion into the unordered_set allocates a node. This makes the running time skyrocket, whereas the sort makes no allocations.
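The per-insertion allocation can be confirmed without a profiler by plugging a counting allocator into the container. This is a sketch under stated assumptions: `CountingAllocator`, `g_allocations` and `allocations_for_inserts` are hypothetical names, and the exact count depends on the standard library in use.

```cpp
#include <cassert>
#include <cstddef>
#include <functional>
#include <new>
#include <unordered_set>

// Global counter incremented on every allocation made through CountingAllocator.
static std::size_t g_allocations = 0;

// Hypothetical minimal allocator that only counts calls to allocate().
template <class T>
struct CountingAllocator {
    using value_type = T;
    CountingAllocator() = default;
    template <class U>
    CountingAllocator(const CountingAllocator<U>&) {}
    T* allocate(std::size_t n) {
        ++g_allocations;
        return static_cast<T*>(::operator new(n * sizeof(T)));
    }
    void deallocate(T* p, std::size_t) { ::operator delete(p); }
};
template <class T, class U>
bool operator==(const CountingAllocator<T>&, const CountingAllocator<U>&) { return true; }
template <class T, class U>
bool operator!=(const CountingAllocator<T>&, const CountingAllocator<U>&) { return false; }

// Insert n distinct keys and report how many allocations the set performed.
std::size_t allocations_for_inserts(int n) {
    assert(n >= 0);
    g_allocations = 0;
    std::unordered_set<int, std::hash<int>, std::equal_to<int>,
                       CountingAllocator<int>> numbers;
    for (int i = 0; i < n; ++i) {
        numbers.insert(i);
    }
    return g_allocations;
}
```

The count comes out at no less than one allocation per inserted key, plus a few more for the bucket array, whereas sorting a vector in place needs no per-element allocation.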

To be sure this was the problem, I wrote a VERY simple implementation of a hash table without allocations, and the results were far more reasonable:

[Image: results with the allocation-free hash table]

So there it is: the factor log N multiplying N, which in your largest example (i.e. 1<<28) is 28, is still smaller than the "constant" amount of work required for an allocation.
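The answer does not include its allocation-free table, so the following is not that implementation. It is a minimal sketch of what such a table could look like, assuming open addressing with linear probing, a power-of-two capacity, and a per-slot state byte that tracks whether a key has been seen an even or odd number of times. The name `find_using_flat_table` and the multiplicative hash constant are choices made for this sketch.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical flat table: two arrays allocated once up front, no per-element nodes.
int find_using_flat_table(const std::vector<int>& input_data) {
    // A power-of-two capacity of at least twice the element count keeps the
    // load factor at or below 0.5, so probing always reaches an empty slot.
    std::size_t capacity = 16;
    while (capacity < 2 * input_data.size()) {
        capacity *= 2;
    }
    const std::size_t mask = capacity - 1;
    assert((capacity & mask) == 0);  // capacity is a power of two

    std::vector<int> keys(capacity);
    std::vector<std::uint8_t> state(capacity, 0);  // 0 = empty, 1 = even count, 2 = odd count

    for (int value : input_data) {
        // Multiplicative hash (an assumption of this sketch), reduced with the mask.
        std::size_t slot = (static_cast<std::uint32_t>(value) * 2654435761u) & mask;
        // Linear probing: step forward until the key or an empty slot is found.
        while (state[slot] != 0 && keys[slot] != value) {
            slot = (slot + 1) & mask;
        }
        if (state[slot] == 0) {
            keys[slot] = value;
            state[slot] = 2;  // first occurrence: odd count
        } else {
            state[slot] = (state[slot] == 2) ? 1 : 2;  // flip the parity
        }
    }
    for (std::size_t i = 0; i < capacity; ++i) {
        if (state[i] == 2) {
            return keys[i];
        }
    }
    return -1;
}
```

Tracking parity instead of erasing entries avoids the tombstones that deletion would otherwise require in an open-addressing table.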

score 0 · Answer 6 · answered Jan 15 '15 at 02:24

There are many great answers here already, but this is the special kind of question which naturally generates many valid answers.

And I'm writing to provide an answer from a mathematical perspective (which is hard to do without LaTeX), because it is important to correct the unaddressed misconception that solving the given problem with hashes represents a problem that is "theoretically" O(n), yet somehow "practically" worse than O(n). Such a thing would be a mathematical impossibility!

For those wishing to pursue the topic in more depth, I recommend this book which I saved for and bought as a very poor high school student, and which stoked my interest in applied mathematics for many years to come, essentially changing the outcome of my life: http://www.amazon.com/Analysis-Algorithms-Monographs-Computer-Science/dp/0387976876

To understand why the problem is not "theoretically" O(n), it is necessary to note that the underlying assumption is also false: it is not true that hashes are "theoretically" an O(1) data structure.

The opposite is actually true. Hashes, in their pure form, are only "practically" an O(1) data structure, but theoretically still are an O(n) data structure. (Note: In hybrid form, they can achieve theoretical O(log n) performance.)

Therefore, the solution is still, in the best case, an O(n log n) problem, as n approaches infinity.

You may start to respond, but everyone knows that hashes are O(1)!

So now let me explain how that claim is true, but in the practical, not theoretical, sense.

For any application (regardless of n, so long as n is known ahead of time—what they call "fixed" rather than "arbitrary" in mathematical proofs), you can design your hash table to match the application, and obtain O(1) performance within the constraints of that environment. Each pure hashing structure is intended to perform well within an a priori range of data set sizes and with the assumed independence of keys with respect to the hashing function.

But when you let n approach infinity, as required by the definition of Big-O notation, then the buckets begin to fill (which must happen by the pigeonhole principle), and any pure hash structure breaks down into an O(n) algorithm (the Big-O notation here ignores the constant factor that depends on how many buckets there are).

Whoa! There's a lot in that sentence.

And so at this point, rather than equations, an appropriate analogy would be more helpful:

A very accurate mathematical understanding of hash tables is gained by imagining a filing cabinet containing 26 drawers, one for each letter of the alphabet. Each file is stored within the drawer that corresponds to the first letter in the file's name.

The "hash function" is an O(1) operation, looking at the first letter.
Storage is an O(1) operation: placing the file inside the drawer for that letter.
And as long as there is no more than one file inside each drawer, retrieval is an O(1) operation: opening the drawer for that letter.

Within these design constraints, this hash structure is O(1).

Now suppose that you exceed the design constraints for this "filing cabinet" hashing structure, and have stored several hundred files. Storage now takes as many operations as needed to find an empty space in the relevant drawer, and retrieval takes as many operations as there are items in that drawer.

Compared to throwing all the files into a single huge pile, the average operation takes roughly 1/26th as much time. But remember, mathematically, one cannot say O(n/26), because O(n) notation by definition does not take into consideration constant factors which affect performance, but only algorithmic complexity as a function of n. So when the design constraints are exceeded, the data structure is O(n).
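
The filing cabinet analogy can be sketched in code. This is a minimal illustration, assuming a fixed set of 26 drawers that never grows and file names that start with a lowercase ASCII letter; the class name `FilingCabinet` and its members are hypothetical. `lookup_cost` returns how many files had to be inspected, which grows with the number of files sharing a drawer.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

// Hypothetical model of the analogy: a hash table with a fixed number of
// buckets (drawers) and a trivial hash function (the first letter).
class FilingCabinet {
public:
    // Storing appends to the drawer for the first letter.
    void store(const std::string& name) {
        drawers_[drawer_of(name)].push_back(name);
    }

    // Number of files inspected before `name` is found, or before the
    // drawer is exhausted when the file is absent.
    std::size_t lookup_cost(const std::string& name) const {
        const std::vector<std::string>& drawer = drawers_[drawer_of(name)];
        std::size_t inspected = 0;
        for (const std::string& file : drawer) {
            ++inspected;
            if (file == name) break;
        }
        return inspected;
    }

private:
    // The "hash function": look at the first letter only.
    // Assumes a non-empty name; other characters still map into 0..25.
    static std::size_t drawer_of(const std::string& name) {
        assert(!name.empty());
        return static_cast<std::size_t>(name[0] - 'a') % 26;
    }

    std::array<std::vector<std::string>, 26> drawers_;
};
```

With one file per drawer every lookup inspects a single file. Once a drawer holds k files, finding the last one stored there inspects k files, so with a fixed drawer count the cost scales linearly with the number of files.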

I have to disagree with your statement that the hash solution is O(N log N). Your answer relies on the assumption that the size of the hash is chosen once. However, if we allow for dynamic rehashing, the hash solution is O(N) as N approaches infinity. Let us start with a hash of size 1 and rehash, doubling the number of buckets, once capacity is exceeded. Then insertion of N elements is proportional to 1+2(rehash)+1+4(rehash)+1+1+1+8(rehash) ~ N + Sum(1+2+2^2+...+2^logN) ~ N. Here we assume rehashing is linear in the size of the hash, which is generally true. — Ilya Kobelevskiy, Jan 15 '15 at 15:24
@IlyaKobelevskiy You're assuming the best case of a perfect hash function, for which the expansion would result in `2n-1`, which is `O(n)` as you say. However, `O(n)` notation does not represent the best case, but the guaranteed performance for arbitrary data. Big-`O` notation must always take into consideration the worst case. The best case (dynamic hashing, hybrid data structure, etc.) of the worst case is `n log n`, and therefore the hash solution is `O(n log n)`, at best. The best hash structures aren't better than `O(log n)` in the worst case. — Joseph Myers, Jan 15 '15 at 17:55
Why worst case for inserting N elements into dynamically re-sizing hash (where re-size takes O(container size), and insertion takes O(1) on average, and O(container size) if re-size happens) is O(N log N)??? Isn't it always O(N) for arbitrary input data (as it follows from my first comment)? If not, what would be the input which would lead to O(N log N) complexity for the hash with specified insertion and re-size properties? — Ilya Kobelevskiy, Jan 15 '15 at 18:49
It simply isn't true that the worst case time (for any hash) is `O(1)`. Your task involves inserting `n` elements and looking up many or all of them. If the complexity of this overall task is truly `O(n)`, then it implies that the amortized time complexity of the hash structure is `O(1)`. This might be possible under strong assumptions of randomness and independence, but in a deterministic setting, like the "real world," such a statement is definitely false. In fact, your own question provides plenty of evidence supporting that such a statement is false!!! @IlyaKobelevskiy — Joseph Myers, Jan 15 '15 at 20:27
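
For reference, the doubling arithmetic from the first comment in this thread can be written out as follows. This is a sketch under that comment's own assumptions, which are the point in dispute: each individual insertion costs one unit apart from rehashing, and a rehash into 2^k buckets costs 2^k units. It says nothing about the cost of long collision chains.

```latex
T(N) \;\le\; \underbrace{N}_{\text{insertions}}
      \;+\; \underbrace{\sum_{k=1}^{\lceil \log_2 N \rceil} 2^{k}}_{\text{rehashes}}
      \;=\; N + 2^{\lceil \log_2 N \rceil + 1} - 2
      \;<\; N + 4N \;=\; 5N
```

The last step uses 2^{\lceil \log_2 N \rceil} < 2N, so under these assumptions the total work is bounded by a constant multiple of N.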
