I need to append a large number of elements to an stxxl vector. What is the most efficient way to do this? Right now I'm using the vector's push_back, but it doesn't seem very efficient: it's far from saturating the disk bandwidth. Is there a better way?

Thanks, Da

Da Zheng
  • I've never done anything like this before, but you could try the following: store a relatively large number of values in a regular STL container. When a certain limit is reached, resize the STXXL container to make room for the values held in the STL container, then use direct iterator access to fill the new STXXL positions with the STL values. – flakes Nov 24 '14 at 20:55
  • does stxxl's vector have `reserve`? – Tim Seguine Nov 24 '14 at 20:56
  • @TimSeguine: Yes: http://stxxl.sourceforge.net/tags/1.4.1/classstxxl_1_1vector.html#a07c6c6ec13a7a0324c34aad594dac9b7 – Nemo Nov 24 '14 at 23:14
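
The batching idea from the first comment can be sketched roughly as follows. This is only an illustration under stated assumptions: `BatchedAppender` is a made-up helper, not part of STXXL, and `std::vector` is used as a stand-in for the target. It assumes the target container provides `size()`, `resize()` and random-access iterators; whether this actually beats push_back on an stxxl vector would have to be measured.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper (not an STXXL class): stages elements in an in-memory
// buffer and copies them into the target container one batch at a time.
template <typename Container>
class BatchedAppender {
public:
    using value_type = typename Container::value_type;

    BatchedAppender(Container& target, std::size_t batch_size)
        : target_(target), batch_size_(batch_size) {
        buffer_.reserve(batch_size_);
    }

    // Flush whatever is still staged when the appender goes out of scope.
    ~BatchedAppender() { flush(); }

    // Stage one element; once the batch is full, move it to the target.
    void push(const value_type& v) {
        buffer_.push_back(v);
        if (buffer_.size() >= batch_size_) flush();
    }

    // Grow the target once, then fill the new positions through an iterator.
    void flush() {
        if (buffer_.empty()) return;
        const auto old_size = target_.size();
        target_.resize(old_size + buffer_.size());
        auto out = target_.begin() +
                   static_cast<typename Container::difference_type>(old_size);
        for (const auto& v : buffer_) {
            *out = v;
            ++out;
        }
        buffer_.clear();
    }

private:
    Container& target_;
    std::size_t batch_size_;
    std::vector<value_type> buffer_;
};
```

With a batch size of 4 and ten pushed elements, the target grows in two steps of four while the appender is alive, and the remaining two elements are written when it is destroyed.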

3 Answers

Most of what is written under "Efficient Sequential Reading and Writing to Vectors" applies to your case.

Besides vector_bufwriter, which fills a vector using an imperative loop, there is also a variant of stxxl::stream::materialize() which does it in a functional programming style.

As for knowing the vector's size in advance: this is not really necessary for external memory (EM), since blocks can be allocated on the fly. The blocks will then generally not be in order, but there is no guarantee of that anyway.

I see that someone (me) made vector_bufwriter automatically double the vector's size when the filling reaches the vector's end. At the moment I don't think this is necessary; maybe this behaviour should be changed.

Timo Bingmann

According to the documentation:

If one needs only to sequentially write elements to the vector in n/B I/Os the currently fastest method is stxxl::generate.

That does not really explain why push_back should be I/O-inefficient, though.

Nemo

One approach:

  • First, reserve space for the number of elements you need. Resizing a vector can be very time-consuming for some element types, and appending many elements can trigger several resizes as the vector grows.

  • Once the space is reserved, append using emplace_back (or simply push_back if the type is trivial, e.g. int).

Also review the member functions. An implementation which suits your needs well may already exist.
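
As a rough illustration of the two steps above, here is a minimal sketch that uses `std::vector` as a stand-in. It assumes the target container offers `size()`, `reserve()` and `push_back()` (the comments on the question indicate that stxxl's vector has `reserve`); the helper name `append_all` is made up for this example.

```cpp
#include <cstddef>
#include <vector>

// Hypothetical helper: reserve once for the final size, then append.
// Reserving up front avoids repeated reallocation while the loop runs.
template <typename Container, typename Source>
void append_all(Container& target, const Source& source) {
    target.reserve(target.size() + source.size());
    for (const auto& v : source) {
        target.push_back(v);
    }
}
```

For non-trivial element types, constructing in place with emplace_back instead of push_back may save a copy, provided the container supports it.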

justin