7

Which STL container should i use if:

  1. Data is inserted and removed regularly.
  2. Data is accessed regularly at random.

E.g : dataset(4,10,15) if i want to find the closest number to 9, then it should return me 10.

  1. I am only storing an integer.
  2. It needs to be sorted
  3. Can go to 100k datasets

I thought of using vector, but vector insertion and removing is expensive.

   vector<int>

If i were to use list, i would have to access O(n) elements before reaching the data.

   list<int>

I was thinking of using set as it will be good if it is sorted, but im not very sure about the efficiencies for using SET

So i hope someone can give a good solution!

ST3
  • 8,826
  • 3
  • 68
  • 92
mister
  • 3,303
  • 10
  • 31
  • 48
  • 1
    It totally depends on how you are inserting and accessing the data, and how the data is ordered. Do you need random access? Do you need to keep the exact order of the data? – Ruud May 12 '12 at 19:45
  • How do you want to acces your data ? Becausse for a vector accesing the data is also o(n) except if you know the index of the item you want to access already ? – Nactive May 12 '12 at 19:45
  • 2
    if the Vector is sorted, lookup is only log(n) since you can do Binary Search – Greg Flynn May 12 '12 at 19:49
  • 2
    If Boost is an option, compare your results with Boost's "flat set" from the Containers library. – Kerrek SB May 12 '12 at 20:29

5 Answers5

16

I think you should check this SO post: In which scenario do I use a particular STL container? for small sizes vector will suit most scenarios irrespective of what you intend to do.

The chart is a guide though, the fact that the container is accessed regularly does not affect container choice, the fact that you are storing int is unimportant unless you care about the size of the container, in which case does the overhead of the pointers in a list container or map matter to you?

Sorting is done automatically by map but sorting a vector and list can be very fast if the container size is small enough to fit in memory.

Data insertion is optimised for lists and maps anywhere in the container, for maps you get the benefit that it will sort itself but again if the size is small enough then constructing a new vector with the new entry could be very fast still.

You may also want to consider hash maps, you would still be best to profile your code, trying to second guess what is optimal depends on your usage and you really need to measure and profile.

You could also just decide that an STL <map> is a fine enough balance or a <set> and use those containers as they automatically sort on insertion and deletion and look up is fast but there is the overhead of maintaining the pointers in each entry that increases the size of the memory used compared to vector, if you don't care about this then you could consider these containers.

Still if it matters then test and profile and compare the performance of each container, you will be surprised by how the code will perform against your assumptions.

Community
  • 1
  • 1
EdChum
  • 376,765
  • 198
  • 813
  • 562
8

If the requirement is just performance, the choice should basically always be a std::vector.

It avoids the many memory allocations of node-based data structures (trees and lists), and it exploits spatial locality for much more efficient traversal.

Of course, insertions/removals at the middle of the vector require elements to be moved, but even that is rarely enough to make the vector slower than other data structures.

The only real reasons I see for using other data structures are these:

  • std::map/std::set: those are great for convenience. Nice and easy to use, so if optimal perfomance isn't required, I use those when I need a sorted container, or a key/value map. (for best performance, a sorted vector may very well be preferable)
  • all other containers: may be useful for the correctness guarantees the offer in the face of modifications: the vector frequently reallocates and moves its contents, which invalidates both pointers and iterators into the vector. The other data structures offer stronger guarantees there (for a deque, pointers are guaranteed to stay valid after after insertion/removal at the ends, but iterators may still be invalidated. For list, set and map, both pointers and iterators are guaranteed to stay valid during insertion/removal)

Of course, these are just rules of thumb.

The only universally true rule when performance is involved is "benchmark it yourself". I can tell you how a vector typically performs in many common scenarios, but I can't tell you how it performs in your code, with your compiler and your standard library. So if you worry about performance, measure it. Try out the different alternatives, and see which is faster.

jalf
  • 243,077
  • 51
  • 345
  • 550
  • Hi there thanks for the reply, sorry just want to clarify, so based on my edit i provided the following example,E.g : dataset(4,10,15) if i want to find the closest number to 9, then it should return me 10. And my dataset can go to 100k dataset. So does it mean that it is still better to use vector and sort/binarysearch? – mister May 12 '12 at 19:56
  • Well, the last part is the important one: test it, if you want to be sure. But a binary search is going to thrash the cache no matter what, so it probably makes little difference if the data is stored contiguously or not. For linear traversal a vector would be a clear winner though. How static is the dataset? Is it continuously modified? – jalf May 12 '12 at 20:02
  • yes most probably constantly modified – mister May 12 '12 at 20:15
  • ok, that makes it harder to predict, IMO. try out different strategies, and see what works best – jalf May 12 '12 at 20:29
  • but my gut feeling is that the vector will win out. An option might even be to keep the vector unsorted, and just do a linear scan each , time. More comparisons, but it better exploits the locality, and it saves the cost of sorting -- and means elements don't need to be moved during insertion/removal – jalf May 12 '12 at 20:44
2

A set is efficient enough to insert/remove/access and it is always sorted. The only thing to consider is that entries in sets are const (so the ordering is not broken), so to change, you should remove, update and insert

stefaanv
  • 14,072
  • 2
  • 31
  • 53
1

The answer to your question is completely dependent on your data set size, as a list grows to to huge sizes , the time it takes to do the linear traversal to get to the element you need to remove / insert at far outweighs the time it takes for a vector to do a removal/ insertion. So if your data set is small, go with lists, if it's huge, go with vector.

johnathan
  • 2,315
  • 13
  • 20
  • Why would you prefer a list for small datasets? It's just as ridiculously slow in that case – jalf May 12 '12 at 19:46
  • @jalf list is ridiculously slow any way you look at it. – johnathan May 12 '12 at 19:49
  • @jalf answer had to do with what the OP was trying to choose from – johnathan May 12 '12 at 19:50
  • 1
    Yeah, but I don't understand the "if your data set is small, go with lists". If the dataset is small, a `vector` is just as much faster as it is with a large dataset – jalf May 12 '12 at 19:53
1

If it needs to be sorted, use a Binary Search Tree

Greg Flynn
  • 1,694
  • 2
  • 14
  • 15