1

I am looking a container to support frequent adds/removals. I have not idea how large the container may grow but I don't want to get stalls due to huge reallocations. I need a good balance between performance and consistent behavior.

Initially, I considered std::tr1::unordered_map but since I don't know the upper bound of the data set, collisions could really slow down the unordered_map's performance. It's not a matter of a good hashing function because no matter how good it is, if the occupancy of the map is more than half the bucket count, collisions will likely be a problem.

Now I'm considering std::map because it doesn't suffer from the issue of collisions but it only has log(n) performance.

Is there a way to intelligently handle collisions when you don't know the target size of an unordered_map? Any other ideas for handling this situation, which I imagine is not uncommon?

Thanks

Nemo
  • 70,042
  • 10
  • 116
  • 153
impulsionaudio
  • 219
  • 2
  • 7
  • 5
    The right answer, of course, is to abstract the choice of container by hiding it inside a class. Then you can profile your actual application with different containers trivially. That said... I bet `std::unordered_map` will work better than you think. – Nemo Dec 16 '11 at 19:26
  • I'm assuming you want an associative container? – GWW Dec 16 '11 at 19:26
  • 3
    You will probably be surprised at the performance of a std::vector. It is a *very* good choice until you have some very specific requirements. – Bo Persson Dec 16 '11 at 19:30
  • How large is "large"? log n is really not that bad; what's more important is how quick the comparison is. As Bo says, a vector may be an excellent choice until you have reason to believe otherwise; if the elements are small, start with that. In any event, you should always keep your code modular so that such a change would be trivial to make later. – Kerrek SB Dec 16 '11 at 19:35
  • 2
    @BoPersson: Don't forget to recommend `std::deque`, it is better than vector for removal/insertion. (Hmm, now I'm tempted to make a map-like wrapper around a deque of pairs to play with) – Mooing Duck Dec 16 '11 at 20:06
  • Take a look at this answer to help you figure out the best choice for YOUR requirements: http://stackoverflow.com/a/4010126/129655 – Platinum Azure Dec 16 '11 at 20:31
  • 1
    This question is missing so much information that it's impossible to give any useful answer. – zvrba Dec 16 '11 at 20:32
  • If you want some control over collisions, `unordered_map` has member functions to query the `bucket_count()` and `load_factor()` and a `rehash()` function to change the bucket count. – Blastfurnace Dec 16 '11 at 20:41
  • -1: for failure to state what the container will be used for or what the access pattern will be. – Nicol Bolas Dec 16 '11 at 20:47

2 Answers2

1

This is a run-time container, right?

Are you adding at the end (as in push_back) or in the front or the middle? Are you removing at random locations, or what?

How are you referencing information in it? Randomly, or from the front or back, or what?

If you need random access, something based on array or hash is preferred.

If reallocation is a big problem, you want something more like a tree or list.

Even so, if you are constantly new-ing (and delete-ing) the objects that you're putting in the container, that alone is likely to consume a large fraction of time, in which case you might find it makes sense to save used objects in junk lists, so you can recycle them.

My suggestion is, rather than agonize over the choice of container, just pick one, write the program, and then tune it, as in this example. No matter what you choose, you will probably want to change it, maybe more than once. What I found in that example was that any pre-existing container class is justifying its existence by ease of programming, not by fastest-possible speed.

I know it's counter-intuitive, but unless some other activity in your program ends up being the dominant cost, and you can't shrink it, your final burst of speed will require hand-coding the data structure.

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
  • Yes, this is a run-time container. I need random access because the removals need to be fast. The adds need to be fast as well but that's easier. I figured I would not want to new-ing/delete-ing every time so a pool of some kind seems appropriate. Maybe a deque-like structure (as someone mentioned above) is a place to start: random access, but not contiguous so add/removal won't result in an expensive rehash/realloc. – impulsionaudio Dec 16 '11 at 21:35
0

What kind of access do you need? Sequential, random access, lookup by key? Plus, you can rehash unordered map either manually (rehash method), and set its load factor. In any case the hash will rebuild itself when chains get too long (i.e., when the load factor is exceeded). Additionally, the slow-down point of a hash table is when it is full ~80%, not 50%.

You should really have read the documentation, for example here.

zvrba
  • 24,186
  • 3
  • 55
  • 65