Which container should I use for a set of elements that are continuously added and removed

Question

I'd like to know which STL container would be most appropriate for a 'continually' changing collection of elements in terms of performance. The elements represent objects in a given viewing space, and as the view changes and objects move in and out of view, the collection of objects needs to be updated. Every time the view is updated, I get a std::vector of objects that should be in the scene, sorted by UID (lets call this List A). I want to run a check that:

removes objects that are no longer visible from the current list of things actually in the scene (call this List B) based on List A
adds new objects that are now visible to List B, again based on List A

Two to four thousand objects is probably the upper limit of the number of objects ever seen. I can't use a bit-mapped vector though since object UIDs run into the billions. I was thinking I could use std::map for List B. My proposed strategy for updating List B is:

For each UID 'i' in List B, search for UID 'i' in List A (using std::lower_bound). If the UID does not exist in List A, remove it from List B.
For each UID 'i' in List A, search for UID 'i' in List B (using std::map::lower_bound). If the UID does not exist in List B, add it.

So I'd like some advice, specifically if I'm using the right container type. I didn't think std::vector would be a good choice for List B because insertions/removals would be expensive and I'd have to explicitly sort if I wanted to use a binary search over std::find. Other than that though, I don't know much about the other container types and whether or not they're more appropriate.

I'd also like to know if the general method I've chosen to update List B is the most efficient way to do it, and I'm not just missing something obvious.

Deos this help? http://stackoverflow.com/questions/471432/in-which-scenario-do-i-use-a-particular-stl-container — Mark B, Mar 14 '12 at 21:06
If you remove everything from B that's not in A, and add everything to B that's in A but not B, how does your result differ from just making a copy of A? — Jerry Coffin, Mar 14 '12 at 21:08
How it differs is that if you are doing this in place, you can avoid making a copy of A. Suppose that the new display list matches what is displayed now. You can code this so that the display list is not modified at all in this case. (For whatever that is worth in a given situation: maybe nothing! It does seem like a better design to just install ListA as the new display list: not even make a copy. Use functional concepts.) — Kaz, Mar 14 '12 at 21:18
I don't copy ListA to ListB directly because they don't contain identical data. ListA just has object IDs. ListB stores the object ID and the actual object itself (sort of) which is why I chose std::map for ListB over std::set. Ie, before I'd add something to ListB from ListA, I need to actually create the object. — Prismatic, Mar 14 '12 at 21:47
(and deleting objects that still exist in the scene, which replacing ListB with ListA would do, doesn't work out too well) — Prismatic, Mar 14 '12 at 21:56

score 1 · Answer 1 · answered Mar 14 '12 at 21:14

1

Sounds like your problem calls for set operations. Removing the objects from the scene which are not in ListA is a set difference operation. Adding the objects from ListA which are not in the scene is a set union operation. Set operations are usually a candidate for hashing, when the data is not suitably organized for a bitmap.

You can get away with one of the operands to a set operations not being a hash-like data structure. That's the one you iterate, and do lookups in the other, to keep the behavior (amortized) O(N).

# left operand is list, right is hash:

set_diff (list x, hash y):
   val ret = make_hash()
   for i in x:
      if not y[i]
          ret[i] = i
   return ret

# vice versa

set_diff (hash x, list y)
    val ret = copy_hash(x)
    for i in y:
        if ret[i]
            delete ret[i]
    return ret

By the way, why would you search using lower_bound; are the ID's not unique or what?

answered Mar 14 '12 at 21:14

Kaz

55,781
9
100
149

I'm using lower_bound because its a binary search and my data (at least ListA) is ordered... (which is better than std::find right?). I'm not really following how to use a hashed container here -- how would I define a hash function (the IDs are unique, but you can assume that they're random, and vary from 1 to several billion)? – Prismatic Mar 14 '12 at 21:54
Right; but you also said *search for UID 'i' in List B (using std::map::lower_bound)* – Kaz Mar 14 '12 at 22:18
What else should I be using? Are you talking about using the [] operator? Or should I use map::find? They're all logarithmic. http://www.cplusplus.com/reference/stl/map/ – Prismatic Mar 14 '12 at 22:43
@Priz: yes, find is like lower_bound, except in the case that the UID is not in the set. Use find. – Mooing Duck Mar 15 '12 at 15:00

score 0 · Answer 2 · answered Mar 14 '12 at 23:01

0

What you need to decide is STL Containers, and scroll down to complexity of operations. Your choice is between std::list, std::deque and std::vector with the latter being the slowest of them all and the former being the fastest. Now, which of them provides all your requirements? Ideally you should use std::list because insertions are O(n) where std::deque are O(n+x) , where x is the number of elements up to the point of insertion. However, I think you are going to need std::deque::at() to retrieve elements. I think that you need this on both lists, thus you should use an std::deque .

answered Mar 14 '12 at 23:01

no mention of set nor profiling. Its highly possible that vector would be fastest. – Mooing Duck Mar 15 '12 at 14:55
`std::vector` is the slowest of them all and `std::set` is an associative container, I don't see your point? – Mar 15 '12 at 15:04
@g241 so what if `std::set` is an associative container? It's still the container designed for _exactly this sort of task_. It even has special members for the operations the OP describes. `std::vector` has the the worst _asymptotic time_ for random inserts/removals, while `std::list` has the best _asymptotic time_, but `vector` usually outperforms `list` anyway, due to caching. See [Bjarne Stroustrup's slides from GoingNative2012](http://ecn.channel9.msdn.com/events/GoingNative12/GN12Cpp11Style.pdf) (slide 45) – Mooing Duck Mar 15 '12 at 16:06

score 0 · Answer 3 · answered Mar 14 '12 at 23:10

0

Tried a boost unordered set for both?

Looks like you need a set here, but it also looks like you don't need sorting. If that's the case, the best candidate is an unordered set. In addition to giving you what you want (just a collection of things), it takes away the (small) performance hit that comes with sorting.

answered Mar 14 '12 at 23:10

Carl

43,122
10
80
104

std::unordered_set is part of the standard now, no need for boost. – Mooing Duck Mar 15 '12 at 14:56
"unordered set" is a redundant phrase that means "set". C++ previously wastefully used order-maintained sequences to represent sets and now it has actual sets. – Kaz Mar 15 '12 at 19:27
Interesting! Is this new for C++11? According to http://www.cplusplus.com/reference/stl/set/, it follows a strict weak ordering. – Carl Mar 15 '12 at 20:08

Which container should I use for a set of elements that are continuously added and removed

3 Answers3