8

I've read that a vector-of-vectors is bad given a fixed 2nd dimension, but I cannot find a clear explanation of the problems on http://www.stackoverflow.com.

Could someone give an explanation of why using 2D indexing on a single vector is preferable to using a vector-of-vectors for a fixed 2nd dimension?

Also, I'm assuming that a vector-of-vectors is the preferable data structure for a 2D array with a variable 2nd dimension? If there is any proof to the contrary I'd love to see that.

Jonathan Mee
  • 37,899
  • 23
  • 129
  • 288
  • 3
    Only a profiler can give a serious answer to this question. The internet is plenty of unsupported claims; remember the principle: make it work first; make it fast later. And in general, what makes a program fast? Good algorithms, asyncronous, locality, tricks. In this order; changing a vector of vectors to a single vector looks like the third/forth step. – Paolo.Bolzoni Jul 07 '16 at 11:40
  • 4
    The most usual reason given is that the elements in a `vector >` are not contiguous, and setting up such a vector requires more work (e.g. multiple allocations). However, those arguments are subjective opinions, not absolute statements of fact. Which approach is better depends on how behaviour of the code is measured/profiled/etc. – Peter Jul 07 '16 at 11:40
  • 1
    Also see [this](https://isocpp.org/wiki/faq/operator-overloading#matrix-array-of-array) ISO CPP FAQ. – NathanOliver Jul 07 '16 at 11:43
  • @Whoever reopened this, why? – Baum mit Augen Jul 07 '16 at 11:44
  • @LightnessRacesinOrbit What was wrong with the dupe? – NathanOliver Jul 07 '16 at 11:45
  • @BaummitAugen: I did, because _it's not a duplicate_. The two questions aren't even talking about the same container. Furthermore, "faster" and "better" are not the same thing. Please do not abuse your dupehammer. – Lightness Races in Orbit Jul 07 '16 at 11:47
  • 1
    @LightnessRacesinOrbit The answer has in it *The long answer, or why dynamic 2 dimensional data storage (pointer-to-pointer or vector-of-vector) is "bad" for simple / small matrices.* and goes on to detail both. How is that not talking about the same thing? – NathanOliver Jul 07 '16 at 11:48
  • 3
    @NathanOliver: "Duplicate" means "the question is an exact duplicate". Not "the question is vaguely related and some answers may apply to both". Feel free to cross-reference to that page in your answers to _this_ question. – Lightness Races in Orbit Jul 07 '16 at 11:48
  • I disagree. The question ask what is the issue with a 2d vector. This issue is that the second dimension is not guaranteed to be contiguous. All of the answers here say that and so does the dupe answer. How is that not the same? – NathanOliver Jul 07 '16 at 11:51
  • @Nathan Look if you disagree that's fine. Vote to close again. I don't care! But I was asked what _my_ opinion is and why I reopened, and I have provided that reason. As far as I'm concerned, there's nothing further to discuss. We've all cast our votes into the pot innit; the system is working. – Lightness Races in Orbit Jul 07 '16 at 12:00
  • In my experience in Stackoverflow "duplicate" means "it shares some common keywords or conjunctions."Besides, honesly I have to insist that only the profiler can give you the answer. A simple example: in the program the value of the matrix are visited in order "increase rows index first, column second." In this case it does not matter if a vector of vectors of a single vector, the perfomance will be bad as locality will always fail. – Paolo.Bolzoni Jul 26 '16 at 06:14

3 Answers3

10

For a std::vector the underlying array is dynamically allocated from the heap. If you have e.g. std::vector<std::vector<double>>, then your outer vector would look like

{v1, v2, v3, v4, ... vn}

This looks like each of the inner vectors will be in contiguous memory, and they will, but their underlying arrays will not be contiguous. See the diagram of the memory layout in this post. In other words the you cannot say that

&(v1.back()) + 1 == &(v2.front()) // not necessarily true!

Instead if you used a single vector with striding then you would gain data locality, and it would inherently be more cache friendly as all your data is contiguous.

For the sake of completeness, I would use neither of these methods if your matrix was sparse, as there are more elegant and efficient storage schemes than straight 1D or 2D arrays. Though since you mentioned you have a "fixed 2nd dimension" I will assume that is not the case here.

Community
  • 1
  • 1
Cory Kramer
  • 114,268
  • 16
  • 167
  • 218
7

I shall answer with a simple analogy.

What is "better" in general out of the two things?

  1. A telephone book where each entry is a code referring to a different book that you have to find and read to discover someone's telephone number
  2. A telephone book that lists people's telephone numbers

Keeping all your data in a single big blob is more simple, more sensible, and easier on your computer's cache. A vector with N vectors inside it is much more operationally complex (remember, each of those requires a dynamic allocation and size management operations!); one vector is, well, one vector. You haven't multiplied the workload by N.

The only downside really is that to simulate 2D array access with a 1D underlying data store, you need to write a facade. Fortunately, this is very easy.

Now for the subjective part: on balance I'd say that it's worthwhile unless you're really in a rush and your code quality doesn't particularly matter.

Lightness Races in Orbit
  • 378,754
  • 76
  • 643
  • 1,055
3

Using a vector of vectors:

  1. Is inefficient in terms of memory allocation, due to multiple blocks being allocated.
  2. Models a jagged right hand edge, so bugs can creep in.

Using a single vector is, in general, better as the memory management is simpler. But you can encounter problems if your matrix is large as it can be difficult to acquire a large contiguous block.

If your array is resizeable, then I'd still stick to a single vector: the resize complexity can be isolated in a single function that you can optimise.

The best solution of all is, of course, to use something like the linear algebra library (BLAS), available in Boost. That also handles large sparse matrices beautifully.

Bathsheba
  • 231,907
  • 34
  • 361
  • 483