For the operation count(). Which is faster std::set or std::unordered_set?

Question

Suppose that sizeof(void*) == sizeof(size_t) and that hashing the pointer is just casting it to size_t. So I want to know if my set contains an element, which will be faster std::set<void*> or std::unordered_set<void*>?

I know how std::set works, but I am not familiar with std::unordered_set. Well, I know that unordered set uses hashing and buckets and that if no intersection occurs (which is my case) the complexity is O(1). But I dont know how much is this constant complexity.

If how much date is in the container is relevant, my actual scenario uses less than a hundred¹. But my curiosity concerns both cases with few elements and a lot of elements.

¹ The amount of elements is so few that even a std::vector would perform fine.

Hash tables tend to use more memory to perform access in less time. — le3th4x0rbot, May 23 '13 at 23:09
http://stackoverflow.com/questions/1349734/why-on-earth-would-anyone-use-set-instead-of-unordered-set — Mooing Duck, May 23 '13 at 23:10
Why don't you just compare the complexity of the operations you are interested in and decide according to that? See [`unordered_set`](http://en.cppreference.com/w/cpp/container/unordered_set), [`set`](http://en.cppreference.com/w/cpp/container/set), and perhaps also [`vector`](http://en.cppreference.com/w/cpp/container/vector). — syam, May 23 '13 at 23:14
Damm, I forgot to put on my question. The operation I am doing is check if a element is contained in the set. I just edited it. — André Puel, May 23 '13 at 23:22
@syam The only operation that matters is count(). It has complexity log(n) on set and 1 on unordered_set. I was just wondering about how big is this constant complexity. And let me emphasize that it is not a micro optimization that I am pretending to do, it just something I was wondering. — André Puel, May 23 '13 at 23:27
@AndréPuel: well if N is small enough that you can't easily guess whether O(log(n)) or O(1) will be more efficient then you have only one solution left. As AndyProwl says: benchmark! — syam, May 23 '13 at 23:31

Andy Prowl · Accepted Answer · 2013-05-23T23:18:30.820

I think the important bit is in the small footnote:

The amount of elements is so few that even a std::vector would perform fine.

std::vector is more cache friendly than std::set and std::unordered_set, because it allocates its elements in a contiguous region of memory (this is mandated by the Standard).

And although lookup and insertion complexity is worse for std::vector than for std::set or std::unordered_set (linear versus O(log N) and amortized O(1), respectively), data locality and a higher cache hit rate is likely to be dominating the computational complexity and yield better performance.

In general, anyway, the impact on performance of the data structures you choose also depends on the kind of operations you want to perform on them and on their frequency - this is not mentioned in your post.

As always, however, measure the different alternatives before you commit to one, and always favor a simpler design when you do not have evidence that it represents a bottleneck preventing your application from meeting its performance requirements.

+1 for 'measure' - that's what people need to understand. Just one more point which you may want to add is that if insertions are rare and look-ups frequent a sorted std::vector may be the way to go, using std::binary_search, to look up and insert things; this way you get O(log n) & cache locality for lookups and O(n) for insertions. — cristicbz, May 24 '13 at 11:47
@cristicbz: Good point. I think your comment, being the first after the answer and having my +1, will be noticed and taken into consideration even without editing the answer ;) — Andy Prowl, May 24 '13 at 12:02

score 2 · Answer 2 · answered May 23 '13 at 23:12

unordered_set has better cache locality than set in many implementations. But since the number of elements is so small in your case a vector is likely to fit completely into cache, making it a better choice despite the O(n) complexity of looking up elements.

For the operation count(). Which is faster std::set or std::unordered_set?

2 Answers2