14

I am using a std::deque for storing a large collection of items.
I know that deques is implemented as a list of vectors. The size of those vectors cannot be set but I wander what is the algorithm for choosing that size.

Community
  • 1
  • 1
cprogrammer
  • 5,503
  • 3
  • 36
  • 56
  • 7
    *Which* implementation? The STL and the C++ Standard Library specify interfaces, not implementation. – Lightness Races in Orbit Apr 20 '11 at 10:03
  • 2
    In the case of the GNU library (up to at least gcc 4.4.5), the size is 512 bytes, and there is no algorithm: "The '512' is tunable, but no investigation has been done since inheriting the SGI code." – Mike Seymour Apr 20 '11 at 10:03

2 Answers2

17

deque is implemented as a vector of vectors (a list of vectors would impede the constant time random access). The size of the secondary vectors is implementation dependent, a common algorithm is to use a constant size in bytes.

AProgrammer
  • 51,233
  • 8
  • 91
  • 143
  • 3
    how big is that constant size ? (in visual studio implementation for example) – cprogrammer Apr 20 '11 at 09:56
  • 4
    Original SGI implementation used 512. G++ is still using that according to Mike comment to your question. I'm not a reliable source about VC++ -- about everything I know about it is something said in public forums like this one and I don't remember someone mentioning that trivia. – AProgrammer Apr 20 '11 at 11:44
  • 3
    last time I delved in VC++ (2010) it was something silly like 8 or 16 bytes (or a single object if bigger). I then understood while `deque` performed so badly in VC++... – Matthieu M. Apr 20 '11 at 12:39
  • @Matthieu, now I'll have to remember :-( – AProgrammer Apr 20 '11 at 12:44
  • it performs bad. Now i have 9,5mil items (committed memory ~3gb) and proc is in 10-15%. I have 1 thread per cpu. The work consist in iterating the whole list and comparing some data with all other intems. For just 2-3mil items proc is in 100% – cprogrammer Apr 20 '11 at 14:06
  • 3
    How these vector of vectors guarantee constant time insertion at the beginning? – Calmarius Sep 09 '12 at 19:55
  • As usual when talking about complexity, it is important to know which operations are counted. And in the case of the containers, it it the operation on the contained type. If you count other operations, I think it is amortized constant time. – AProgrammer Sep 10 '12 at 05:18
  • 1
    @Calmarius: The amount of work necessary to do an insertion at the front depends solely on the first sub-vector's size, not on the total number of elements in the container. Insofar, it's constant in respect to `N`. If `N` is one billion, the first container still only has maybe 50 or 100 elements (or whichever the implementation chooses). If `N` is 5 trillion, it _does not_ take 5,000 times longer. Also, if the size of the first sub-vectoris kept "more or less constant" by splitting it as it grows too big, it's also "amortized constant" in respect of that sub-vector's `N`. – Damon Oct 21 '13 at 16:17
4

My deque implementation, the one from GNU which is derived from the HP/SGI version, is not a list of vectors; at least, not an std::list of std::vectors. The comments state

*  In previous HP/SGI versions of deque, there was an extra template
*  parameter so users could control the node size.  This extension turned
*  out to violate the C++ standard (it can be detected using template
*  template parameters), and it was removed.
*
*  Here's how a deque<Tp> manages memory.  Each deque has 4 members:
*
*  - Tp**        _M_map
*  - size_t      _M_map_size
*  - iterator    _M_start, _M_finish
*
*  map_size is at least 8.  %map is an array of map_size
*  pointers-to-"nodes".  (The name %map has nothing to do with the
*  std::map class, and "nodes" should not be confused with
*  std::list's usage of "node".)
Fred Foo
  • 355,277
  • 75
  • 744
  • 836
  • It resizes (as the standard requires it to), and it points to nodes rather than holding them inline - so it is not like an array of arrays at all - rather, it's something like a vector of pointer-to-array-allocations. – Karl Knechtel Apr 20 '11 at 10:21
  • 9
    @larsmans: it's necessary for the system headers to use reserved names for their implementation details, to avoid conflicts with any macros defined before including the header. – Mike Seymour Apr 20 '11 at 10:23
  • 2
    @MikeSeymour Of course, just because the names need to be uglified, doesn't mean they need to be meaningless or misleading too. I have the same complaint about the MS STL. `_Eep_Ds`? Seriously? – Sebastian Redl Dec 10 '14 at 12:40