I recently played around with a std::unordered_set
. I'm suspecting my version of the STL keeps track of non-empty buckets in some FILO data-structure (looks like a list). I suppose this is done in order to provide O(n)
time traversal of the complete std::unordered_set
(where n
denotes the number of elements in a unordered_set
with m
buckets and m
much larger than n
). This improves a naive traversal of all buckets in O(m)
time.
I've tested that indeed traversal of large and very sparse unordered_set
s (with begin
- end
) is much faster than a naive traversal of all buckets.
Question: Is this traversal runtime guaranteed by the standard? Or is this just a feature of my particular standard library?
Here is my test code to play around with:
#include <iostream>
#include <vector>
#include <numeric>
#include <unordered_set>
using namespace std;
void test(vector<int> data, int alloc_size) {
unordered_set<int> set(alloc_size);
for (auto i: data) {
set.insert(i);
}
for (size_t bidx = 0; bidx < set.bucket_count(); ++bidx) {
cout << "[B" << bidx << ":";
for (auto bit = set.begin(bidx); bit != set.end(bidx); ++bit) {
cout << " " << *bit;
}
cout << "] ";
}
cout << " {";
for (auto const & d: set) {
cout << d << " ";
}
cout << "}" << endl;
}
int main() {
test({1, 2, 0}, 3);
test({1, 2, 0, 7}, 3);
test({18, 6, 11, 3, 13, 4}, 20);
test({18, 6, 11, 3, 13, 4, 34}, 20);
}
Which prints:
[B0: 0] [B1: 1] [B2: 2] [B3:] [B4:] {0 2 1 }
[B0: 0] [B1: 1] [B2: 7 2] [B3:] [B4:] {0 7 2 1 }
[B0:] [B1:] [B2:] [B3: 3] [B4: 4] [B5:] [B6: 6] [B7:] [B8:] [B9:] [B10:] [B11: 11] [B12:] [B13: 13] [B14:] [B15:] [B16:] [B17:] [B18: 18] [B19:] [B20:] [B21:] [B22:] {4 13 3 11 6 18 }
[B0:] [B1:] [B2:] [B3: 3] [B4: 4] [B5:] [B6: 6] [B7:] [B8:] [B9:] [B10:] [B11: 34 11] [B12:] [B13: 13] [B14:] [B15:] [B16:] [B17:] [B18: 18] [B19:] [B20:] [B21:] [B22:] {4 13 3 34 11 6 18 }
It appears the begin
- end
traversal reports buckets in the reverse order in which they became non-empty (cf. first and third line). Inserting into an already non-empty bucket does not change this ordering (cf. second and fourth line).