Complexity of std::unordered_set iterator traversal

Question

I recently played around with a std::unordered_set. I suspect my version of the standard library keeps track of non-empty buckets in some FILO data structure (it looks like a list). I suppose this is done to provide O(n) time traversal of the complete std::unordered_set (where n denotes the number of elements in an unordered_set with m buckets, and m is much larger than n). This improves on a naive traversal of all buckets in O(m) time.

I've tested that indeed traversal of large and very sparse unordered_sets (with begin - end) is much faster than a naive traversal of all buckets.

Question: Is this traversal runtime guaranteed by the standard? Or is this just a feature of my particular standard library?

Here is my test code to play around with:

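#include <iostream>
#include <vector>
#include <unordered_set>
using namespace std;

// Fills a set created with alloc_size as the requested minimum bucket count,
// then prints it twice: once bucket by bucket, once via begin - end.
void test(const vector<int>& data, int alloc_size) {
   unordered_set<int> set(alloc_size);
   for (auto i: data) {
      set.insert(i);
   }

   // Naive traversal: visit every bucket, including the empty ones.
   for (size_t bidx = 0; bidx < set.bucket_count(); ++bidx) {
      cout << "[B" << bidx << ":";
      for (auto bit = set.begin(bidx); bit != set.end(bidx); ++bit) {
         cout << " " << *bit;
      }
      cout << "] ";
   }

   // Ordinary iterator traversal of the whole set.
   cout << "  {";
   for (auto const & d: set) {
      cout << d << " ";
   }
   cout << "}" << endl;
}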

int main() {
   test({1, 2, 0}, 3);
   test({1, 2, 0, 7}, 3);
   test({18, 6, 11, 3, 13, 4}, 20);
   test({18, 6, 11, 3, 13, 4, 34}, 20);
}

Which prints:

[B0: 0] [B1: 1] [B2: 2] [B3:] [B4:]   {0 2 1 }
[B0: 0] [B1: 1] [B2: 7 2] [B3:] [B4:]   {0 7 2 1 }
[B0:] [B1:] [B2:] [B3: 3] [B4: 4] [B5:] [B6: 6] [B7:] [B8:] [B9:] [B10:] [B11: 11] [B12:] [B13: 13] [B14:] [B15:] [B16:] [B17:] [B18: 18] [B19:] [B20:] [B21:] [B22:]   {4 13 3 11 6 18 }
[B0:] [B1:] [B2:] [B3: 3] [B4: 4] [B5:] [B6: 6] [B7:] [B8:] [B9:] [B10:] [B11: 34 11] [B12:] [B13: 13] [B14:] [B15:] [B16:] [B17:] [B18: 18] [B19:] [B20:] [B21:] [B22:]   {4 13 3 34 11 6 18 }

It appears the begin - end traversal reports buckets in the reverse order in which they became non-empty (cf. first and third line). Inserting into an already non-empty bucket does not change this ordering (cf. second and fourth line).

Sander De Dycker · Accepted Answer · 2017-04-13T12:46:27.243

8

In short: yes, this is guaranteed by the standard.

Explanation

All iterators are required to have O(n) traversal time complexity (where n is the number of items traversed). This is because every operation on an iterator has (amortized) constant time complexity, O(1), including advancing the iterator by one position.

From the standard (section 24.2.1 §8):

All the categories of iterators require only those functions that are realizable for a given category in constant time (amortized). Therefore, requirement tables for the iterators do not have a complexity column.

So, when iterating over the items of a std::unordered_set, the time complexity is O(n) (with n the number of items in the set).

Not convinced?

A literal reading of the above quote only guarantees that constant time operations are realizable. This doesn't prevent a specific implementation from having worse time complexity than what's realizable. This is probably down to a bad choice of words, and hopefully no serious implementations actually do this.

The only other place in the standard that can help resolve this ambiguity is section 24.4.4 §1, where the standard has this to say about std::advance and std::distance:

These function templates use + and - for random access iterators (and are, therefore, constant time for them); for input, forward and bidirectional iterators they use ++ to provide linear time implementations.

So, the ++ operation on a forward iterator (as used for std::unordered_set) is implied to be a constant time operation.

In summary, while the wording of the first quote is ambiguous, the second quote confirms the intent.

edited Apr 13 '17 at 12:46

answered Apr 13 '17 at 08:56

Sander De Dycker



Thanks. But doesn't this only restrict the required interface of iterator categories (i.e. forward iterators) to those functions that can be implemented in `O(1)`? In particular, this doesn't **require** that the implementation is indeed `O(1)`? – m8mble Apr 13 '17 at 10:54

@m8mble : that is a fair observation to make, and you can indeed read it that way. I'm trying to find a better quote in the standard. In the meantime, the standard says this about the implementation of [`std::advance`](http://en.cppreference.com/w/cpp/iterator/advance) : "for input, forward and bidirectional iterators they use `++` to provide linear time implementations". `std::advance` can't be implemented in linear time if the iterator doesn't iterate in linear time. – Sander De Dycker Apr 13 '17 at 11:56
Thanks again. At least the intention seems clear with the addendum. Is there a place to report this seeming deficiency? – m8mble Apr 13 '17 at 12:55

@m8mble : useful links : the [list of C++ standard library issues](http://www.open-std.org/jtc1/sc22/wg21/docs/lwg-index.html), and the [procedure for reporting an issue](https://isocpp.org/std/submit-issue). – Sander De Dycker Apr 13 '17 at 14:08
Do all stdlib iterator operations like `++it` take constant `O(1)` time (not amortised constant), no matter what kind of iterator it is? – choxsword Mar 14 '18 at 15:13
