2

Suppose I use a std::unordered_set<MyClass>, and suppose sizeof(MyClass) is large, i.e. much larger than sizeof(size_t) and sizeof(void*). I will add a large number numberOfElementsToBeAdded of elements to the unordered set.

From https://stackoverflow.com/a/25438497/4237 , I find: "each memory allocation may be rounded up to a size convenient for the memory allocation library's management - e.g. the next power of two, which approaches 100% worst case inefficiency and 50% average, so let's add 50%, just for the list nodes as the buckets are likely powers of two given size_t and a pointer: 50% * size() * (sizeof(void*) + sizeof((M::value_type))"

This drives me to conclude that actual memory consumption will be between 1*numberOfElements*sizeof(MyClass) and (1+1)*numberOfElements*sizeof(MyClass), modulo some additional memory, which is of little interest here because it is of the order sizeof(size_t).

However, I know the number of elements to be added in advance, so I call:

std::unordered_set<MyClass> set;
set.reserve(numberOfElementsToBeAdded);
//Insert elements

Thinking about the parallel of calling std::vector::reserve, I would guess that this avoids the potential overhead, and would thus guarantee memory consumption to be approximately 1*numberOfElements*sizeof(MyClass) (modulo some additional memory, again of order sizeof(size_t)).

Could I rely on implementations of the standard library to guarantee that calling reserve like this will avoid the 0-100% (50% average) overhead mentioned in the above answer?

willem
  • 2,617
  • 5
  • 26
  • 38

2 Answers2

3

From this std::unordered_set::reserve reference

Sets the number of buckets to the number needed to accomodate at least count elements ...

There's some more, but the gist is that std::unordered_set::reserve doesn't actually work like std::vector::reserve. For an unordered set it allocates buckets for the hash table, not memory for the actual elements themselves.

The set could put one element per bucket, but it's not guaranteed.

Some programmer dude
  • 400,186
  • 35
  • 402
  • 621
  • 1
    Right. The purpose of calling `std::unordered_set::reserve` is to avoid having to re-size the hashing structures used internally for indexing the set. It (typically) has no effect on how the actual members of the set are allocated or stored. – David Schwartz Mar 27 '19 at 08:35
  • Do buckets only require sizeof(size_t)/sizeof(void*) memory or do they (also) require sizeof(M:valuetype) = sizeof(MyClass) memory? – willem Mar 27 '19 at 08:44
  • @willem That's an implementation-specific detail that is not specified. You have to check your implementation. – Some programmer dude Mar 27 '19 at 08:55
  • @Someprogrammerdude Thanks. You say "the set could put one element per bucket, but it's not guaranteed." I have trouble understanding what that implies. Am I correct in saying that it could also be less than 1 element/MyClass per bucket? – willem Mar 27 '19 at 12:50
  • @willem A very simple example: You create a set with `X` buckets, then add `Y` elements. If `X > Y` then you will have less elements than buckets. A bucket can contain zero or more elements, but it will always be an integral amount (a bucket can't have half an element, for example). – Some programmer dude Mar 27 '19 at 13:00
  • @Someprogrammerdude Sorry, I still don't fully understand this. Suppose I call reserve(3000), and add 3000 elements. Suppose also that in reaction to the call reserve(3000), 4000 buckets are created by the set (load factor 0.75). Then after adding the elements, there would be 4000 buckets and 3000 elements. Now my question is the memory consumption for this. Suppose the elements are 100*sizeof(size_t) each. Then the memory consumption would be at least 3000*100*sizeof(size_t). But the question is what would be the memory consumption of the empty buckets? – willem Mar 28 '19 at 08:06
  • It may also be that I don't understand the answer to the other question that I linked to, or that that answer is not fully correct. When I think of it more, if you would use list to contain the elements in the buckets, then empty buckets would consume no memory of order sizeof(myclass), only some overhead? – willem Mar 28 '19 at 08:09
  • 1
    @willem I don't know how much memory an empty bucket would use, since that depends on the implementation. It also feels like you're not telling us the whole problem... ***Why*** do you need to know? What is the *actual* problem you need to solve? Why are you so worried about memory consumption (especially since we're still in the realm of low single-digit MiB range here)? – Some programmer dude Mar 28 '19 at 08:43
  • @ProgrammerDude: Based on your previous comment, I wasn't sure whether I understoon. I now understand that there is really no way to know whether an empty bucket already holds reserved memory for a MyClass instance that may be added later. Somehow I didn't get that fact from the answers and comments. But this answers my questions. – willem Mar 28 '19 at 09:00
  • Reason I needed to know is that I might add millions of elements to dictionary, and that MyClass may be as large as 1000*sizeof(size_t). So I don't want to pay up to 100% overhead. – willem Mar 28 '19 at 09:02
0

Quoting the standard:

Sets the number of buckets to the number needed to accomodate at least count elements without exceeding maximum load factor and rehashes the container, i.e. puts the elements into appropriate buckets considering that total number of buckets has changed

The important part is **at least*. I do not think you can rely on the implementation and be sure that it only uses less amount of memory. My guess is that regardless if you call reserve the memory usage for numElements will be similar.

Davide Spataro
  • 7,319
  • 1
  • 24
  • 36