
When I store a bunch of items and don't need random access to the container, I use std::list, which is mostly fine. However, sometimes (especially when I only push entries to the back and never delete from the middle) I wish I had a structure with better performance for adding entries.

std::vector is bad because:

  • It must reallocate when it runs out of capacity.
  • It doesn't really work for huge amounts of data, because you cannot always get a very big chunk of contiguous free memory.

std::list is bad because:

  • It makes an allocation on every single push_back. That is slow and leads to a lot of memory fragmentation.

So, something in between is what I want.

Basically, I want something like std::list< boost::array<T, 100> >, or maybe with 4096/sizeof(T) instead of 100. It could also be std::list< std::vector<T> >, where the first vectors are small and later ones grow. In any case, I want that hidden behind the container's interface, so that I can just call mycontainer.push_back(x).
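
Roughly, I imagine a sketch like the following (just to illustrate the idea; the class name and chunk size are made up, it needs C++11, and std::array could equally be boost::array):

```
#include <array>
#include <cstddef>
#include <list>

// Rough sketch only: a push_back-only container that stores its elements in
// fixed-size chunks, so one allocation serves ChunkSize elements instead of
// one node per element, and existing elements never move.
// (Simplification: T must be default-constructible and assignable here.)
template <typename T, std::size_t ChunkSize = 128>
class chunked_list {
public:
    void push_back(const T& value) {
        // Open a new chunk when the container is empty or the last chunk is full.
        if (chunks_.empty() || used_in_last_ == ChunkSize) {
            chunks_.emplace_back();
            used_in_last_ = 0;
        }
        chunks_.back()[used_in_last_++] = value;
    }

    std::size_t size() const {
        return chunks_.empty() ? 0 : (chunks_.size() - 1) * ChunkSize + used_in_last_;
    }

private:
    std::list<std::array<T, ChunkSize>> chunks_;
    std::size_t used_in_last_ = 0;
};
```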

std::rope is somewhat similar to this, but it is not part of the standard.

Is there something like this in Boost or elsewhere?

Albert
  • Why not just use the SGI distribution's rope class? There's an advertising clause in the license, but otherwise, why not? – Matt K Sep 22 '10 at 16:57
  • @Albert, you said that you just store a bunch of items with push_back. So what's wrong with std::vector? Why do you prefer std::list? – Yakov Galka Sep 22 '10 at 17:14
  • @ybungalobill: `std::vector` is not good for very large amounts of data, because all of its memory must be allocated contiguously. – Albert Sep 22 '10 at 17:33
  • @Matt: I want it to work everywhere (i.e. on all compilers with all possible STL implementations). – Albert Sep 22 '10 at 17:40
  • @AshleysBrain: What advantage does that give me (apart from constant-time insertion at the front, which I don't care about here)? – Albert Sep 22 '10 at 17:42
  • @Albert, have you looked at a typical implementation of `std::deque`? I think it's exactly what you're looking for. See http://stackoverflow.com/questions/913980/confusion-on-iterators-invalidation-in-deque – Mark Ransom Sep 22 '10 at 17:48
  • @Mark, AshleysBrain: Ah, thanks for the info! I have looked up the implementation in GCC's STL and it indeed works exactly like what I described. – Albert Sep 22 '10 at 17:53
  • The standard containers use allocators, so your concerns over vector and list are mostly unfounded. Both have been highly optimized to cope with exactly these situations, and this looks like a case of premature optimization. – Martin York Sep 22 '10 at 18:03
  • @Albert It's doubtful. First, it may reduce fragmentation; second, it saves memory (storing int32 values in a list on a 64-bit system takes about 5 times the memory of a vector). Moreover, a list is cache-unfriendly, so for something as frequent as iteration a vector will even be *faster*. Given that you said you don't delete elements, you will probably initialize the container once and iterate many times, so it may be worth it. – Yakov Galka Sep 22 '10 at 18:56

2 Answers


Have you considered using std::deque? Its elements are not stored contiguously but it does allow random access to elements; if you are only inserting elements at the beginning or end of the sequence, it may give better performance than a std::vector.
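
The usage is essentially what you described for your hypothetical container; a minimal illustration (nothing here is specific to any particular implementation):

```
#include <deque>
#include <iostream>

int main() {
    std::deque<int> d;

    // push_back never relocates existing elements: the deque simply adds
    // another fixed-size block when the last one fills up, so no single
    // huge contiguous allocation is ever required.
    for (int i = 0; i < 1000000; ++i)
        d.push_back(i);

    // Unlike std::list, random access still works.
    std::cout << d[500000] << '\n';
}
```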

James McNellis
  • Surprisingly few people know about std::deque, but it should really be the default for larger datasets. – Martin Beckett Sep 22 '10 at 17:42
  • I really should have known the STL better! I always wondered what `deque` is for and how it really differs (internally) from a `vector`. I always thought it was just one single chunk whose start index can change when you insert at the front. – Albert Sep 22 '10 at 17:55
  • IIRC, MSVC uses a ridiculously lousy chunk size, making it act like a linked-list. Look into [Boost.Containers](http://www.boost.org/doc/libs/release/doc/html/container.html) for a consistent implementation across compilers. – GManNickG Dec 12 '11 at 21:45

Yes, it's called std::vector. Its push_back takes amortized O(1) time, which is almost always faster than std::list. (And it is also memory efficient.)

The most important feature of std::list is constant-time deletion and insertion in the middle. If you don't need that, choose std::vector.
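
For instance, if you know the element count up front (even roughly), a single reserve removes the reallocations entirely; a minimal sketch (the count of one million is arbitrary):

```
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;

    // One allocation up front; the loop below then never reallocates.
    // Even without the reserve, push_back is still amortized O(1).
    v.reserve(1000000);
    for (int i = 0; i < 1000000; ++i)
        v.push_back(i);

    std::cout << v.size() << ' ' << v.capacity() << '\n';
}
```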

Yakov Galka
  • If you have a large `vector` of objects that are expensive to copy, the reallocation that occasionally occurs during a `push_back` may make `vector` an unacceptable choice. – James McNellis Sep 22 '10 at 17:31
  • No, it's not O(1) for push_back on a vector. It is O(n), and on average it should be something like O(log n). And it doesn't work at all for large amounts of data, because it can only allocate one single big chunk of contiguous memory. And std::list is slow because it has to make an allocation for every single push_back. – Albert Sep 22 '10 at 17:36
  • @Albert: No, insertion at the end (i.e., what `push_back` does) has amortized constant time complexity (O(1)). This means that on average it takes constant time to insert an element at the end. It is only when a reallocation is required that insertion at the end takes linear time. The implementation of vector is such that reallocations happen relatively infrequently, usually by having the underlying storage grow exponentially. – James McNellis Sep 22 '10 at 17:40
  • @James: Yes, I know. But isn't that O(log n) on average then? I.e., if you insert n items, you basically need log(n) reallocations. In fact, probably even more than O(log n), because the reallocations don't just take constant time (they take much more time for huge n). – Albert Sep 22 '10 at 17:57
  • @Albert: No, it's still O(1). Consider a vector with a size of 1,000 and a capacity of 1,000. You insert an element and a reallocation takes place, increasing the capacity to 2,000. This reallocation takes 1,000 time units. You can then insert 999 more elements in constant time. The time the reallocation took (1,000 time units) is amortized over the 1 + 999 = 1,000 insertions you can perform before the next reallocation; 1,000 / 1,000 = 1. [Yes, this is simple, and not all implementations use a 2x scale factor; I think VC++ uses 1.5x, for example, but this is about how it works.] – James McNellis Sep 22 '10 at 18:10
  • @Albert It's O(1), read about amortized analysis. If you insert N objects, then you do at most 2N copy operations *in total*. – Yakov Galka Sep 22 '10 at 18:40
  • @ybungalobill: Ah, thanks, I see it now. I get about 3N copy operations when I try to calculate it myself, but I see the point: that's about 3 copy operations per insertion on average, i.e. O(1). Thanks! :) – Albert Sep 22 '10 at 19:16
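
The amortized-constant behaviour discussed in these comments is easy to observe directly. A small test along these lines (the exact count depends on the library's growth factor) reports only a few dozen reallocations for a million push_backs:

```
#include <cstddef>
#include <iostream>
#include <vector>

int main() {
    std::vector<int> v;
    std::size_t reallocations = 0;
    std::size_t last_capacity = v.capacity();

    for (int i = 0; i < 1000000; ++i) {
        v.push_back(i);
        if (v.capacity() != last_capacity) {  // capacity changed => a reallocation happened
            ++reallocations;
            last_capacity = v.capacity();
        }
    }

    // Capacity grows geometrically, so a million push_backs trigger only a
    // handful of reallocations -- that is where the amortized O(1) comes from.
    std::cout << "reallocations: " << reallocations << '\n';
}
```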