Why unordered_set is faster than vector in this situation?

Question

Today i try to solve a algorithm problem from leetcode, and my answer is over here：

int longestConsecutive(vector<int>& nums) {
    unordered_set<int> hashSet(nums.begin(), nums.end());
    int ans = 0;
    for(auto num : nums){ // nums change to hashSet
        if(!hashSet.count(num-1)){
            int length = 1;
            while(hashSet.count(++num)){
                ++length;
            }
            ans = max(ans, length);
        }
    }
    return ans;
}

run and use 472ms to pass the test case. But when i change for(auto num : nums) to for(auto num : hashSet), the time reduced to 106ms.

What a huge improvement! and can anyone please tell me why?

Because by the time the code went back to the vector, its memory has already been evicted from the CPU cache, and replaced by the memory used to hold the newly-constructed hash, so it must be refetched from slow RAM, again, while the hash's memory is still cached, and its much faster to access it? — Sam Varshavchik, Aug 29 '22 at 03:14
*What a huge improvement! and can anyone please tell me why?* -- Run the tests on a local compiler, with optimizations turned on. I have no idea how "Leetcode" determines how fast code takes. Or go to a site that specialized in benchmarking code, such as [quickbench](https://quick-bench.com/). — PaulMcKenzie, Aug 29 '22 at 03:21
Not saying this is the case but Leetcode is extremely innacurate for C++. I think they count time to build the executable as well as to start the process and many times these startup costs dwarf the actual algorithm time. — Something Something, Aug 29 '22 at 03:26
The input clearly allows duplicates, and so a conceivable test would contain 100000 of the same value. When you iterate through the vector, you do the same work over and over again before eventually arriving at the answer "1". Whereas when you iterate through the set, it contains only a single value. — paddy, Aug 29 '22 at 03:54
@hmy [Try this version of the code](https://godbolt.org/z/oTq5qbhK7). What does Leetcode say about that version? — PaulMcKenzie, Aug 29 '22 at 04:19
What might really bake your noodle is the differential in using `std::set` instead of `std::unordered_set`. It may surprise you. — WhozCraig, Aug 29 '22 at 04:29
@PaulMcKenzie, WhozCraig: but that breaks the O(n) constraint (even if it might be faster). — Jarod42, Aug 29 '22 at 06:02
@WhozCraig, without changing the way how the 'algo' is implemented, using std::set won't make much difference — SPD, Aug 29 '22 at 08:24
@paddy yes, you are right, i change `unordered_set` to `unordered_multiset`, the time increase to _456ms_, be similar to vector. — hmy, Aug 29 '22 at 08:50
Your code does not appear to be O(N) since there are two loops one inside the other. — Something Something, Aug 29 '22 at 09:29
@WhozCraig, why are you benchmarking a special case of 1 element (v{100000} and v10{100000})? — SPD, Aug 29 '22 at 10:03
@MadFred it _is_ O(N): the second loop is only executed when the current value is the first in a sequence, and it only iterates over values that belong to that sequence. — paddy, Aug 29 '22 at 10:40
@SPD Because I'm a space cadet and really shouldn't be on SO after a marathon. Derp. [This is a lot more reasonable](https://quick-bench.com/q/Ccq2EJZUGhdByjJOkon0nXDeLE0). Derp. Thanks for the sanity check. — WhozCraig, Aug 29 '22 at 16:22

Something Something · Answer 1 · 2022-08-29T18:05:53.887

First, I do not think that your algorithm is O(N) at all. An O(N) algorithm would typically loop through every element in the vector. In your case you have two nested loops and the inside loop does not seem to be constant time.

But regardless I did two versions, one for sorted and another for unsorted series (hash maps).

template< typename T > 
int findLongestSorted( T& sorted ) {
    int counter = 1;
    int maxcount = 1;
    int last = sorted[0];
    for ( int j=1; j<sorted.size(); ++j ) {            
        int num = sorted[j];
        if ( num == last ) continue;
        if ( num == last+1 ) {
            counter++;
        } else {
            if ( counter > maxcount ) maxcount = counter;
            counter = 1;
        }
        last = num;
    }
    if ( counter > maxcount ) maxcount = counter;
    return maxcount;
}

template< typename T > 
int findLongestUnsorted( T& set ) {
    int maxcount = 1;
    for( int num : set ){
      int counter = 1;
      while ( set.find(num+counter) != set.end() ) counter++;
      if ( counter > maxcount ) maxcount = counter;
    }
    return maxcount;
}

Second, I do not trust LeetCode for benchmarking. I believe it counts compile time as well as process startup as well, which can be dominant and ruin your benchmarks.

I would use Google Benchmarks for this, it is true and tested. To define a benchmark, you need to create two test cases which I do templated such that I can be sure I am comparing apples to apples.

template< typename T >
int test( const std::vector<int>& );

template<>
int test<std::vector<int>>(const std::vector<int>& nums) {
    if ( nums.empty() ) return 0;
    std::vector<int> sorted(nums);
    std::sort(sorted.begin(),sorted.end());
    return findLongestSorted( sorted );
}


template<>
int test<std::set<int>>(const std::vector<int>& nums ) {
    if ( nums.empty() ) return 0;
    std::set<int> sorted( nums.begin(), nums.end() );
    return findLongestSorted( sorted );
}

template<>
int test<std::multiset<int>>(const std::vector<int>& nums ) {
    if ( nums.empty() ) return 0;
    std::multiset<int> sorted( nums.begin(), nums.end() );
    return findLongestSorted( sorted );
}

template<>
int test<std::unordered_set<int>>(const std::vector<int>& nums ) {
    if ( nums.empty() ) return 0;
    std::unordered_set<int> set( nums.begin(), nums.end() );
    return findLongestUnsorted( set );
}

template<>
int test<std::unordered_multiset<int>>(const std::vector<int>& nums ) {
    if ( nums.empty() ) return 0;
    std::unordered_multiset<int> set( nums.begin(), nums.end() );
    return findLongestUnsorted( set );
}

Then it is just a matter of creating a vector of random numbers and hooking up with the test harness


template< typename T, size_t numel >
static void Benchmark(benchmark::State& state) {
  std::random_device rd;
  std::mt19937 gen(rd());
  std::uniform_int_distribution<int> dist(0,4*numel);
  std::vector< int > golden( numel );
  for ( int& value : golden ) value = dist(gen);

  for (auto _ : state) {
    int result = test<T>( golden );
    benchmark::DoNotOptimize(result);
  }
}


constexpr size_t NLOOPS = 100000;

BENCHMARK(Benchmark<std::vector<int>,NLOOPS>);
BENCHMARK(Benchmark<std::set<int>,NLOOPS>);
BENCHMARK(Benchmark<std::unordered_set<int>,NLOOPS>);
BENCHMARK(Benchmark<std::unordered_set<int>,NLOOPS>);
BENCHMARK(Benchmark<std::unordered_multiset<int>,NLOOPS>);

Link to Quick Benchmark Cpp

Update: the algorithm above PASSES leetcode with a score of 80% above all other C++ submissions.

You are measuring only runtime here, not time complexity. To measure the latter, you need to run each test with equivalent data (or multiple sample runs) with sizes at different orders of magnitude. If the growth curve is linear, then it's O(N). Relative timings for a single sample size are meaningless. As for the comment about it not being O(N) just because there's a loop inside another loop, you are incorrect. I've pointed out why that is in a comment on the original question. — paddy, Aug 29 '22 at 10:50

Why unordered_set is faster than vector in this situation?

1 Answers1