0

I have a huge 3d vector where I store double values. In order to increase the performance I wanted to reserve the number of elements in advance as I know them before processing;however, I couldn't really figure it out how reserve&clear&erase work in this case.

I have implemented a small program which has 2d vector in this case, pls see the code snippet below:`

for(int counter = 0; counter < 2; counter++){
    cout << "Counter-> " << counter << endl;

    vector<vector<double> > vec2D;
    vec2D.reserve(2);

    // assign values to the vec
    for(int i = 0; i < 2; i++){
        vec2D[i].reserve(5);
        for(int j = 0; j < 5; j++){
            vec2D[i][j] = j;
        }
    }

    // print the vector content
    for(int i = 0; i < 2; i++){
        for(int j = 0; j < 5; j++){
            cout << vec2D[i][j] << "\t";
        }
        vec2D[i].clear();
        cout << endl;
    }
    vec2D.clear();
}

When I run this code snippet it iterates thorugh the for loop just for once where it should do it twice; however, when I declare the vector outside of the for loop it does iterate twice. The output of the above snippet is:

Counter-> 0
0   1   2   3   4   
0   1   2   3   4   
Counter-> 1

Could you please make it clear how it actually should be in this case and how it works.

Thanks in advance.

Cory Kramer
  • 114,268
  • 16
  • 167
  • 218
serhatg
  • 312
  • 1
  • 3
  • 15
  • 1
    Sidenote: Have a look at http://stackoverflow.com/questions/17259877/1d-or-2d-array-whats-faster on why you presumably do not want to use `vector>` or `vector>>` when performance is an issue. – Pixelchemist Sep 18 '14 at 13:14
  • You know that there is Boost Multidim? http://www.boost.org/doc/libs/1_55_0/libs/multi_array/doc/user.html – datenwolf Sep 18 '14 at 13:31
  • in my case boost multidim array were slower than std::vector. – serhatg Sep 18 '14 at 13:34

2 Answers2

4

You have a big problem in your code: reserve() does not change the size of the vector, only its storage capacity (to avoid system calls in the future).

What is needed in your case instead of reserve() is assign() or resize().

ypnos
  • 50,202
  • 14
  • 95
  • 141
  • Thank you for the answer. In current working code I am using resize. Which looks like this: vector > vec(5, vector(8)); And then assign some values to this vector(in real case it's a 3d vector with dims 100kX5X8. In order to increase the performance I started playing around with reserve method. – serhatg Sep 18 '14 at 13:21
1

If you want any semblance of performance don't use vectors of vectors: this is terribly inefficient. This is probably the most efficient memory utilization, and faster access than vectors, although there are faster solution with some memory trade-offs:

class GridWithSmallDimensions
{
    double * values;
    int sizeX, sizeY, sizeZ;
public:
    GridWithSmallDimensions(int x, int y, int z) 
    : sizeX(x), sizeY(y), sizeZ(z)
    {
        values = new double [x*y*z];
    }
    ~ GridWithSmallDimensions()
    {
        delete [] values;
    }
    double get(int x, int y, int z)
    {
        return value[x+sizeX*(y+sizeY*z)];
    }
    void set(int x, int y, int z, double v)
    {
        value[x+sizeX*(y+sizeY)] = v;
    }
};

Here's a slightly more involved implementation, aimed at reduction of index calculations:

class GridWithRowPointers
{
    int sizeX, sizeY, sizeZ; // Dimensions
    double * values; // Data
    double ** rows; // Pointers to "rows" of data indexed by x

    inline int getIndex(y, z) const // Get index of data within a row
    {
        return sizeZ * y + z;
    }

public:
    GridWithRowPointers(int x, int y, int z) 
    : sizeX(x), sizeY(y), sizeZ(z)
    {
        values = new double [x*y*z]; // Allocate data
        rows = new (double*) [x]; // Allocate row pointers
        for( int n = 0; n < sizeX; n++ )
        {
            rows[n] = values + sizeY * sizeZ;
        }
    }

    ~ GridWithRowPointers()
    {
        delete [] values;
        delete [] rows;
    }

    inline double get(int x, int y, int z) const
    {
        // Access value via row pointer
        return rows[ x ][ getIndex(y,z) ];
    }

    inline void set(int x, int y, int z, double v)
    {
        // Access value via row pointer
        rows[ x ][ getIndex(y,z) ] = v;
    }
};
Michael
  • 5,775
  • 2
  • 34
  • 53
  • That's really a smart way of doing. Really liked. Thanks a lot! There is a performance boost of %8 by just changing 2 vector structures in my code into this structure. – serhatg Sep 18 '14 at 14:11
  • By profiling my code with GNU compiler by turning on -pg flag on, I can see most of the time is being spend for the get method of GridWithSmallDimensions class. Is it because of the calculations for the indices ? Is there a way of minimizing this ? I do a lot of calculation with the values stored in those vectors where as each vector may hold thousands of elements. Assuming that my program takes 2 mins in total, 26 second of it is spent for get() method. P.S. The method gave me a memory optimization of %25 in my case. Memory usage is important as much as the speed in my case. Thank you. – serhatg Sep 19 '14 at 11:42
  • @serhatg: two comments. First, cache locality may be involved. This means that time is wasted because values are accessed not in the order they are allocated. For example, if you have for(x=...) for(y=...) for(z=...) { do something } kind of loop you want to index into the values array in such manner so that the values with the same x & y and slightly different z are stored close to each other. This is just a manner of indexing, x+sizeX*(y+sizeY*z) versus z+sizeZ*(y+sizeY*x). Second, a popular technique is to pre-allocate pointers to "rows" of data to save on multiplication, as I just added. – Michael Sep 19 '14 at 16:12
  • Thanks a lot for the recommendations @Michael – serhatg Sep 24 '14 at 09:00