
I wrote a cellular automaton program that stores data in a matrix (an array of arrays). For a 300*200 matrix I can achieve 60 or more iterations per second using static memory allocation (e.g. std::array).

I would like to produce matrices of different sizes without recompiling the program every time, i.e. the user enters a size and then the simulation for that matrix size begins. However, if I use dynamic memory allocation (e.g. std::vector), the simulation drops to ~2 iterations per second. How can I solve this problem? One option I've resorted to is to pre-allocate a static array larger than anything I anticipate the user will select (e.g. 2000*2000), but this seems wasteful and still limits user choice to some degree.

I'm wondering if I can either

a) allocate memory once and then somehow "freeze" it for ordinary static array performance?

b) perform more efficient operations on the std::vector? For reference, I am only performing matrix[x][y] == 1 and matrix[x][y] = 1 operations on the matrix.

According to this question/answer, there is no difference in performance between std::vector and raw pointers.

EDIT:

I've rewritten the matrix, as per UmNyobe's suggestion, to be a single array accessed via matrix[y*size_x + x]. Using dynamic memory allocation (sized once at launch), performance more than doubles, to ~5 iterations per second.
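For illustration, a minimal sketch of that single-array layout (the class and member names here are placeholders, not the actual program):

```cpp
#include <cstddef>
#include <vector>

// Minimal sketch of a flat, row-major grid backed by one allocation,
// sized once at launch. Names are illustrative assumptions.
class Matrix {
public:
    Matrix(std::size_t size_x, std::size_t size_y)
        : size_x_(size_x), cells_(size_x * size_y, 0) {}

    // Access (x, y) through the single-index formula y * size_x + x.
    int& at(std::size_t x, std::size_t y)       { return cells_[y * size_x_ + x]; }
    int  at(std::size_t x, std::size_t y) const { return cells_[y * size_x_ + x]; }

private:
    std::size_t size_x_;
    std::vector<int> cells_; // one contiguous block
};
```

With this layout the two operations from the question become m.at(x, y) == 1 and m.at(x, y) = 1, and the whole grid lives in one contiguous block of memory.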

As per PaulMcKenzie's comment, I compiled a release build and got the performance I was looking for (60 or more iterations per second). However, this is the foundation for more work, so I still wanted to quantify the benefit of one method over the other more thoroughly. I used a std::chrono::high_resolution_clock to time each iteration and found that, after switching to the single-array representation, the performance difference between dynamic and static arrays was within the margin of error (450–600 microseconds per iteration).
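Roughly how I timed it (a sketch; `step` is a placeholder for one full cellular-automaton update over the grid):

```cpp
#include <chrono>
#include <iostream>

// Hypothetical per-iteration timing harness; `step` stands in for
// one generation of the simulation.
template <typename Step>
void timed_run(Step step, int iterations) {
    using clock = std::chrono::high_resolution_clock;
    for (int i = 0; i < iterations; ++i) {
        const auto start = clock::now();
        step(); // advance the simulation by one generation
        const auto stop = clock::now();
        const auto us =
            std::chrono::duration_cast<std::chrono::microseconds>(stop - start);
        std::cout << "iteration " << i << ": " << us.count() << " us\n";
    }
}
```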

The performance during debugging is a slight concern, however, so I think I'll keep both and switch to a static array when debugging.

SkyNT
  • It seems that those differences in performance are related to compiler optimisations (did you turn them on?). Please provide your code if you require a precise answer. – mkk Apr 19 '16 at 23:17
  • Are you using a std::vector<std::vector<int>>? – user253751 Apr 19 '16 at 23:25
  • Have you tried *constructing* the vector with a [starting size](http://en.cppreference.com/w/cpp/container/vector/vector)? You should read up on some of the constructors. – Thomas Matthews Apr 19 '16 at 23:42
  • For high performance, your rows should be able to fit inside your processor's data cache line. More than one row per cache line also works. – Thomas Matthews Apr 19 '16 at 23:43
  • std::vector of std::vector has poor locality as the data is not contiguous, leading to a higher likelihood of cache misses versus the contiguous std::array. You might want to consider forcing contiguity by using a 1D std::vector and performing the indexing (row * num_columns + column) yourself. The extra math is comparable to what is done behind the scenes so you lose nothing. std::vector will also initialize its contents, but that should count for at most a few percent in lost performance. good hints here: https://isocpp.org/wiki/faq/operator-overloading#matrix-subscript-op – user4581301 Apr 20 '16 at 00:14
  • @OP: You did not indicate whether you're timing with optimizations turned on. If you're timing an unoptimized build, you're wasting your time with these observations. You should be timing optimized builds. – PaulMcKenzie Apr 20 '16 at 00:24

1 Answer


> For reference, I am only performing matrix[x][y] == 1 and matrix[x][y] = 1 operations on the matrix.
  • Red flag! Are you using vector<vector<int>> for your matrix representation? This is a mistake, as the rows of your matrix will be far apart in memory. You should use a single vector of size rows x cols and index it as matrix[y * size_x + x].
  • Furthermore, you should index first by rows and then by columns, i.e. matrix[y][x] rather than matrix[x][y], and your algorithm should traverse the matrix in the same order. With row-major storage, elements (x, y) and (x + 1, y) sit next to each other in memory, while under any other access pattern, neighbours such as (x, y + 1) are a full row apart (see the sketch after this list).
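A minimal sketch of such a row-major sweep (`cells`, `width`, and `height` are illustrative names, not from the question):

```cpp
#include <cstddef>
#include <vector>

// Cache-friendly sweep over a flat row-major grid: the inner loop walks x,
// so consecutive accesses touch adjacent memory.
std::size_t count_live(const std::vector<int>& cells,
                       std::size_t width, std::size_t height) {
    std::size_t live = 0;
    for (std::size_t y = 0; y < height; ++y) {     // rows in the outer loop
        for (std::size_t x = 0; x < width; ++x) {  // columns in the inner loop
            if (cells[y * width + x] == 1) {       // (x, y) and (x+1, y) are adjacent
                ++live;
            }
        }
    }
    return live;
}
```

Because the inner loop advances x, each access lands right next to the previous one, which is what keeps the cache and hardware prefetcher effective.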

Even if there is a performance decrease from std::array to std::vector (a std::array can keep its elements on the stack, which is faster), a decent algorithm will perform on the same order of magnitude with either container.

UmNyobe