I'm storing pointers in std::unordered_set. I do this because I don't want any duplicates: I delete the pointers in the collection, so if there were a duplicate, I would attempt to delete an already-deleted pointer. I loop heavily through these sets, and since I know std::vector is the fastest container to loop over (contiguous memory), I wondered whether std::unordered_set stores its elements contiguously too.

If it doesn't, would using a std::vector and checking if the pointer has already been deleted be faster?
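For concreteness, a minimal sketch of the vector-based alternative I have in mind (Entity is a placeholder for my actual type; std::less gives a total order over unrelated pointers, which plain < does not guarantee):

```cpp
#include <algorithm>
#include <functional>
#include <vector>

struct Entity { /* ... */ };

// Deduplicate the pointers before deleting, so nothing is freed twice.
void destroyAll(std::vector<Entity*>& ptrs)
{
    std::sort(ptrs.begin(), ptrs.end(), std::less<Entity*>{});
    ptrs.erase(std::unique(ptrs.begin(), ptrs.end()), ptrs.end());

    for (Entity* p : ptrs)
        delete p;
    ptrs.clear();
}
```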

pro-gramer
Vittorio Romeo
  • If you want to know, you should benchmark it. – Waleed Khan Jan 17 '13 at 17:26
  • Is it your bottleneck, really? Did you profile? – Lightness Races in Orbit Jan 17 '13 at 17:31
  • You *manually* delete the pointers? i.e. you store raw pointers inside that container? If the container is your bottleneck, you are accessing it often, and if you juggle those raw pointers that much, you will run into memory management issues that cost much more than the performance difference between `unordered_set` and `vector` in your scenario. Try to use smart pointers. Maybe `unique_ptr`; you would have to write pure evil code to get duplicates of those ;) – Arne Mertz Jan 17 '13 at 17:44
  • @LightnessRacesinOrbit I tried profiling, but it wasn't clear what the bottleneck was. The profiler gave me hard-to-read symbols and assembly code. Do you have a profiler to suggest? – Vittorio Romeo Jan 17 '13 at 17:45
  • You're literally the template source for `std::unordered_set<>` away from answering this yourself, and once cracked open you'll find no chance of a contiguous implementation (and what would it matter if it were? the most expensive part of a decent hash-table implementation is the hash function itself, a degenerate linked list via unbounded collisions notwithstanding). – WhozCraig Jan 17 '13 at 17:45
  • A reasonable `unordered_set` implementation will use an 'empty' hash function for the pointer type. Also, this is very easy to implement. – 9dan Jan 17 '13 at 17:54
  • @9dan, an `unordered_set` implementation will use whatever hash function the user tells it to use, empty or not! – Jonathan Wakely Jan 17 '13 at 22:40
  • @WhozCraig: Why would the hash function be a bottleneck when looping through the collection (which shouldn't even need hashing)? @Vee: Are you 100% sure you need to use raw pointers instead of `unique_ptr` or `shared_ptr`? If you aren't, I would really suggest using smart pointers. – Grizzly Jan 19 '13 at 15:12

4 Answers


Is std::unordered_set contiguous?

The exact implementation of containers is not detailed by the standard... however, the standard does prescribe a number of behaviors which constrain the actual representation.

For example, std::unordered_set is required to be memory stable: a reference to, or the address of, an element remains valid even when other elements are added or removed. Only a rehash may invalidate iterators, never pointers or references to the elements themselves.

The only way to achieve this is to allocate elements more or less independently. It cannot be achieved with a contiguous memory allocation, as such an allocation would necessarily be bounded, and thus could be outgrown, with no possibility of re-allocating the elements in a bigger chunk (re-allocating would move them and invalidate their addresses).
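A small test illustrating this guarantee; the element count and type are arbitrary, but the assertion must hold on any conforming implementation:

```cpp
#include <cassert>
#include <unordered_set>

int main()
{
    std::unordered_set<int> s{42};
    const int* addr = &*s.find(42);  // address of the stored element

    for (int i = 0; i < 100000; ++i) // trigger several rehashes
        s.insert(i);

    // Iterators may have been invalidated by rehashing,
    // but the element itself never moved in memory.
    assert(addr == &*s.find(42));
}
```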

Matthieu M.
  • Existing iterators are invalidated when inserting or removing other elements in an `unordered_set` if it causes a rehash. Are you sure your statement about "memory stable" is correct? – Andrew Tomazos Jan 17 '13 at 21:16
  • @AndrewTomazosFathomlingCorps, but rehashing specifically doesn't invalidate pointers or references. Iterators don't necessarily refer directly to memory; pointers do. Elements do not move in memory on a rehash: the connections between them (which iterators traverse) might get rejiggled, but the elements do not move. – Jonathan Wakely Jan 17 '13 at 22:41
  • @JonathanWakely: In that case the buckets must just be the head pointers of linked lists (a linked hash). I think this means that every insert causes a dynamic memory allocation. I think it would be more efficient to have the bucket's first element placed in aligned_storage. Of course this would mean you have to move-construct on rehash rather than just copy a pointer, but this is the same "problem" as std::vector. – Andrew Tomazos Jan 18 '13 at 02:12
  • @AndrewTomazosFathomlingCorps: I believe this requirement of stability was to mimic more closely what happens in a `std::set`; I fully agree, however, that it's annoying and severely constrains the possible designs. That being said, thanks to allocators you still have the possibility of lessening the impact of the allocation: you could provide a bump allocator, for example. – Matthieu M. Jan 18 '13 at 07:38
  • @MatthieuM.: What's a "bump allocator"? The elements have to be allocated and deallocated in relatively random order, so the problem of doing so is the same as in any dynamic allocation scheme. – Andrew Tomazos Jan 18 '13 at 11:41
  • Also, if we are doing the full singly linked list anyway from the bucket head, then why not make it doubly linked (tail points back to bucket head)? That way iterators would not have to be invalidated on a rehash. – Andrew Tomazos Jan 18 '13 at 11:43
  • @AndrewTomazosFathomlingCorps: 1) A "bump" allocator does not concern itself (usually) with freeing. It just appends the new objects as it goes. It's extremely efficient for short-lived collections or collections with little to no removal. 2) Indeed, this could be possible; I suspect the list is already doubly-linked (since bidirectional iteration is supported). Functionally, though, it's a bit dubious, because by incrementing the iterator you could pass over elements you already visited, which is in violation of the `ForwardIterator` contract: on iteration each element should be visited once. – Matthieu M. Jan 18 '13 at 12:28
  • @MatthieuM.: (re 2) True, however random reordering of iterators would still be better than UB. There are a number of algorithms that keep looping over a collection and modifying it until some equilibrium is reached (e.g. a closure operation); these would still be correct even with random reordering on modification. (Aside: rehashing can be blocked to keep iterators valid: http://stackoverflow.com/q/13981886/1131467) – Andrew Tomazos Jan 18 '13 at 12:51

No, it is not contiguous memory, but it's still really fast for lookups, thanks to hashing.

Edit: fast for random access; if you mainly do loops, you should consider another container, I think.

Edit 2: And you should profile, so as to know whether it's worth thinking about another container. (Maybe you should optimize somewhere else... maybe.)
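A minimal benchmark sketch along those lines; the container sizes and element type are arbitrary, and only measurements on your own data and compiler settings are meaningful:

```cpp
#include <chrono>
#include <cstdio>
#include <numeric>
#include <unordered_set>
#include <vector>

// Time a full pass over a container: the operation the question cares about.
template <typename C>
void measure(const char* name, const C& c)
{
    auto t0 = std::chrono::steady_clock::now();
    long long sum = 0;
    for (long long x : c)
        sum += x;                      // the hot loop
    auto t1 = std::chrono::steady_clock::now();
    auto us = std::chrono::duration_cast<std::chrono::microseconds>(t1 - t0);
    std::printf("%s: sum=%lld in %lld us\n", name, sum,
                static_cast<long long>(us.count()));
}

int main()
{
    std::vector<long long> v(1000000);
    std::iota(v.begin(), v.end(), 0);                 // 0, 1, 2, ...
    std::unordered_set<long long> s(v.begin(), v.end());

    measure("vector       ", v);
    measure("unordered_set", s);
}
```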

Lightness Races in Orbit
Stephane Rolland

The fact that the following member functions are offered by std::unordered_map (and equally by the std::unordered_set in the question) suggests that it is based on a hash table, perhaps separate chaining with linked lists.

bucket_count, hash_function, load_factor, max_load_factor, rehash
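For example, a short snippet can poke at that bucket interface directly; the printed values are implementation-specific:

```cpp
#include <cstdio>
#include <unordered_set>

int main()
{
    std::unordered_set<int> s{1, 2, 3, 4, 5};

    std::printf("buckets: %zu\n", s.bucket_count());
    std::printf("load factor: %.2f (max %.2f)\n",
                s.load_factor(), s.max_load_factor());

    s.rehash(64); // request at least 64 buckets
    std::printf("buckets after rehash: %zu\n", s.bucket_count());
}
```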

Whether the elements are contiguous or not depends on the allocator. The default allocator for unordered_map and list does not allocate the elements in contiguous memory; the memory for each element is allocated at the time of its insertion.

However, you can provide a custom allocator (such as a pool allocator) which may allocate the elements from a pre-allocated memory pool. Even then, elements that are logically adjacent in the data structure may not be physically adjacent in memory.
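As a sketch of that idea, using the C++17 std::pmr facilities (which postdate this question, but are the standard way to plug in a pool today), the nodes can be carved out of one pre-allocated buffer, close together yet still not contiguous:

```cpp
#include <memory_resource>
#include <unordered_set>

int main()
{
    char buffer[64 * 1024]; // pre-allocated pool backing all node allocations
    std::pmr::monotonic_buffer_resource pool(buffer, sizeof buffer);

    // Every node this set allocates comes out of the pool above.
    std::pmr::unordered_set<int> s(&pool);
    for (int i = 0; i < 100; ++i)
        s.insert(i);
} // the pool releases everything at once when it goes out of scope
```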

So, if looping through all the elements is the most frequent operation, then unordered_map may not be the best solution. Running the dominant use cases through a profiler for all competing solutions would reveal the best one.

In addition to that, unordered_map is not the best container for looping for another reason. Note the word "unordered" in the name: it conveys that, unlike list, vector, or map, there is no defined order of the elements. For example, the member function rehash may change the relative order of the elements. In fact, rehashes are performed automatically by the container whenever its load factor would otherwise exceed max_load_factor during an operation.
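The following illustrates that point: the same elements may come out in a different relative order once a rehash has happened (the exact output depends on the implementation):

```cpp
#include <cstdio>
#include <unordered_set>

int main()
{
    std::unordered_set<int> s;
    for (int i = 0; i < 8; ++i)
        s.insert(i);

    for (int x : s) std::printf("%d ", x);
    std::printf("\n");

    s.rehash(1024); // force a rehash by requesting many more buckets

    for (int x : s) std::printf("%d ", x); // possibly a different order
    std::printf("\n");
}
```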

Arun

std::unordered_set is supposed to be a hash-table container, so we can assume it carries a little performance penalty compared with std::vector.

But I think you must check the actual profiling results to see whether the unordered_set access really is the hotspot.

If the STL implementation you are using is a reasonable one, it should provide a vector-like specialization for pointer or int key types. If that is true, an unordered_set specialized for the pointer type will behave much like an automatically growing/shrinking vector, and the performance difference will be unnoticeable.
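For illustration, the kind of 'empty' hash function meant here, reusing the pointer's bits as the hash value, could look like the following sketch (PtrHash and Entity are hypothetical names; many standard libraries already do something equivalent for std::hash<T*>):

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_set>

struct Entity { /* ... */ };

// Hypothetical 'empty' hash: the pointer's bits are the hash value,
// so hashing costs essentially nothing.
struct PtrHash
{
    std::size_t operator()(const Entity* p) const noexcept
    {
        return static_cast<std::size_t>(reinterpret_cast<std::uintptr_t>(p));
    }
};

using EntitySet = std::unordered_set<Entity*, PtrHash>;
```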

9dan