
I am trying to solve a DP problem where I create a 2D array and fill it completely. My function is called multiple times with different test cases. When I use a `vector<vector<bool>>`, I get a time-limit-exceeded error (it takes more than 2 sec for all the test cases). However, when I use `bool [][]`, it takes much less time (about 0.33 sec) and I get a pass.

Can someone please help me understand why `vector<vector<bool>>` would be any less efficient than `bool [][]`?

#include <cstdint>
#include <vector>
using namespace std;

bool findSubsetSum(const vector<uint32_t> &input)
{
    uint32_t sum = 0;

    for (uint32_t i : input)
        sum += i;

    if ((sum % 2) == 1)
        return false;

    sum /= 2;

#if 1
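    // Note: a variable-length array is a compiler extension (e.g. in GCC/Clang),
    // not standard C++; it is one contiguous block on the stack.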
    bool subsum[input.size()+1][sum+1];
    uint32_t m = input.size()+1;
    uint32_t n = sum+1;

    for (uint32_t i = 0; i < m; ++i)
        subsum[i][0] = true;

    for (uint32_t j = 1; j < n; ++j)
        subsum[0][j] = false;

    for (uint32_t i = 1; i < m; ++i) {
        for (uint32_t j = 1; j < n; ++j) {

            if (j < input[i-1])
                subsum[i][j] = subsum[i-1][j];
            else
                subsum[i][j] = subsum[i-1][j] || subsum[i-1][j - input[i-1]];
        }
    }

    return subsum[m-1][n-1];
#else
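    // Each inner vector is a separate heap allocation, and vector<bool>
    // packs its elements into bits behind a proxy reference.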
    vector<vector<bool>> subsum(input.size()+1, vector<bool>(sum+1));

    for (uint32_t i = 0; i < subsum.size(); ++i)
        subsum[i][0] = true;

    for (uint32_t j = 1; j < subsum[0].size(); ++j)
        subsum[0][j] = false;

    for (uint32_t i = 1; i < subsum.size(); ++i) {
        for (uint32_t j = 1; j < subsum[0].size(); ++j) {

            if (j < input[i-1])
                subsum[i][j] = subsum[i-1][j];
            else
                subsum[i][j] = subsum[i-1][j] || subsum[i-1][j - input[i-1]];
        }
    }

    return subsum.back().back();
#endif
}

Thank you, Ahmed.

  • It is due to an increase in cache misses in the vector version. The nested vectors are not located close to each other in memory, and you will get a miss on every iteration of your nested loop. For example, `subsum[0][0]` is not going to fit in the same cache line as `subsum[1][0]`, so when you access them on subsequent iterations the processor has to read from memory (or a higher cache) each time. In the case of `bool[][]` the elements are packed together much more densely, so there will be fewer cache misses. – Paul Rooney Nov 01 '17 at 04:33
  • [this](http://futuretech.blinkenlights.nl/misc/cpumemory.pdf) article may be of interest if you wanted some in depth understanding, although it is fairly old itself now. – Paul Rooney Nov 01 '17 at 04:33
  • @paul: But `std::vector<bool>` is a special case, no? It might be more cache-friendly since it uses much less memory, but you have to balance that against the increased cost of accessing a single bit. – rici Nov 01 '17 at 04:42
  • @rici true, but I was thinking that as each nested vector is separately allocated, there would be a higher likelihood of a cache miss penalty when accessing elements from different vectors on subsequent iterations. I guess we really need to profile to know for sure. – Paul Rooney Nov 01 '17 at 05:01
  • @Ahmed, have you tried profiling both versions to see what is taking the time in the vector version? – Paul Rooney Nov 01 '17 at 05:03

3 Answers


If you need a matrix and you need to do high-performance work, it is not always best to use a nested std::vector or std::array, because these are not contiguous in memory. Non-contiguous memory access results in more cache misses.

See more:

std::vector and contiguous memory of multidimensional arrays

Is the data in nested std::arrays guaranteed to be contiguous?

On the other hand, bool twoDAr[M][N] is guaranteed to be contiguous, which means fewer cache misses.

See more:

C / C++ MultiDimensional Array Internals

And to learn about cache-friendly code:

What is “cache-friendly” code?
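As a quick illustration (my own sketch, not from the linked posts), you can print the row addresses of both layouts and see that the built-in array's rows are adjacent while the nested vector's rows are separate heap allocations. `vector<char>` is used here because `vector<bool>` has no `data()` member:

```cpp
#include <cstdio>
#include <vector>

int main()
{
    char twoDAr[3][4];  // one contiguous block
    std::vector<std::vector<char>> vec(3, std::vector<char>(4));

    // Row i+1 of the built-in array starts exactly where row i ends.
    std::printf("array rows:  %p %p %p\n",
                (void *)twoDAr[0], (void *)twoDAr[1], (void *)twoDAr[2]);

    // Each nested vector's buffer can be anywhere on the heap.
    std::printf("vector rows: %p %p %p\n",
                (void *)vec[0].data(), (void *)vec[1].data(), (void *)vec[2].data());
}
```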


Can someone please help me understand why `vector<vector<bool>>` would be any less efficient than `bool [][]`?

A two-dimensional bool array is really just a big one-dimensional bool array of size M * N, with no gaps between the items.

A two-dimensional std::vector doesn't exist; what you have is not one big one-dimensional std::vector but a std::vector of std::vectors. The outer vector itself has no memory gaps, but there may well be gaps between the content areas of the individual vectors. It depends on how your compiler implements the very special std::vector<bool> class, but if your element count is sufficiently large, then dynamic allocation is unavoidable to prevent a stack overflow, and that alone implies pointers to separate memory areas.

And once you need to access data from separated memory areas, things become slower.

Here is a possible solution:

  1. Try to use a std::vector<bool> of size (input.size() + 1) * (sum + 1).
  2. If that fails to make things faster, avoid the template specialisation by using a std::vector<char> of size (input.size() + 1) * (sum + 1), and cast the elements to and from bool as required (see the sketch below).
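Here is a minimal sketch of suggestion 2, under the assumption that the table is stored row-major in one flat buffer indexed as `i * n + j`; the function name `findSubsetSumFlat` is mine, not from the original post:

```cpp
#include <cstdint>
#include <vector>

bool findSubsetSumFlat(const std::vector<uint32_t> &input)
{
    uint32_t sum = 0;
    for (uint32_t i : input)
        sum += i;
    if ((sum % 2) == 1)
        return false;
    sum /= 2;

    const uint32_t m = static_cast<uint32_t>(input.size()) + 1;
    const uint32_t n = sum + 1;

    // One contiguous allocation; char avoids the vector<bool> bit proxy.
    std::vector<char> subsum(m * n, 0);

    for (uint32_t i = 0; i < m; ++i)
        subsum[i * n + 0] = 1;  // the empty subset always sums to 0

    // Row 0, columns 1..n-1 stay 0 (false) from the initialisation above.
    for (uint32_t i = 1; i < m; ++i)
        for (uint32_t j = 1; j < n; ++j)
            subsum[i * n + j] = (j < input[i - 1])
                ? subsum[(i - 1) * n + j]
                : (subsum[(i - 1) * n + j] ||
                   subsum[(i - 1) * n + (j - input[i - 1])]);

    // subsum[m - 1][n - 1] in flat indexing.
    return subsum[m * n - 1] != 0;
}
```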
Christian Hackl

In cases where you know the size of the array from the beginning, using an array will always be at least as fast as using a vector, because vector is a wrapper around an array and therefore a higher-level implementation. Its benefit is allocating extra space for you when needed, which you don't need if you have a fixed number of elements.

If you had a problem where you needed a 1D array, the difference might not have bothered you (a single vector versus a single array). But by creating a 2D array you also create many instances of the vector class, so the time difference between array and vector is multiplied by the number of elements in your container, making your code slow.

This time difference has many causes, but the most obvious one is calling the vector constructor: you are calling it subsum.size() times. The memory issue mentioned by the other answers is another cause.

For performance, it is advised to use arrays whenever you can. Even if you need to use a vector, you should try to minimize the number of resizes it performs (by reserving or pre-allocating), which gets you closer to the behaviour of an array; a sketch follows below.
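A minimal sketch of the reserve-up-front advice, purely illustrative (the function name `build` is mine):

```cpp
#include <cstddef>
#include <vector>

std::vector<int> build(std::size_t count)
{
    std::vector<int> v;
    v.reserve(count);  // one allocation up front, so no reallocation below
    for (std::size_t i = 0; i < count; ++i)
        v.push_back(static_cast<int>(i));
    return v;
}
```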

Rockybilly
  • I doubt that an array is "always" faster than a vector with an initial capacity defined, except perhaps for the one dynamic allocation required. And vector is not a wrapper around an array; it's a wrapper around raw memory allocated with placement new, with internal constructor and explicit destructor calls. It just provides an interface similar to arrays due to its overloaded `operator[]`. – Christian Hackl Nov 01 '17 at 06:34
  • @ChristianHackl, in that case I should say an array is always faster than or equal to a vector. It shouldn't be slower if used properly. – Rockybilly Nov 01 '17 at 06:36
  • My point is that this is basically a red herring. One single allocation from the free store (and that's all vector needs if its initial capacity is set correctly and won't ever change) shouldn't even make a measurable performance difference. I also think that using arrays whenever one can is bad advice: what if the number of elements is considerably large? – Christian Hackl Nov 01 '17 at 06:40
  • @ChristianHackl That statement is a bit unclear. *Using arrays whenever you can* also means avoiding them when there are drawbacks. I also mentioned that it is possible to increase vector performance with fewer reallocations, which you also mentioned. – Rockybilly Nov 01 '17 at 07:01