0

Im trying to analyze algorithm and I would like to improve some parts. Now I discover (in Intel VTUnes) that one function have back-end bound. Body of this function is:

int i, j;

// copy original population to temporary area
for (i = 0; i < population_size; i++)
{
    for (j = 0; j < problem_size; j++)
    {
        ffa_tmp[i][j] = ffa[i][j];
    }
}

// generational selection in sense of EA
for (i = 0; i < population_size; i++)
{
    for (j = 0; j < problem_size; j++)
    {
        ffa[i][j] = ffa_tmp[Index[i]][j];
    }
}

ffa and ffa_tmp are dynamically allocated 2D arrays of doubles. Index is array of integeres. It is used for ordering. The worst part by analyzer is ffa[i][j] = ffa_tmp[Index[i]][j];

If I understand correctly, I could improve this with vectorization. Or there is another solution?

Sk1X1
  • 1,305
  • 5
  • 22
  • 50

2 Answers2

0

You could improve this code using memcpy.

Aryan
  • 2,675
  • 5
  • 24
  • 33
0

Vectorization is definitely a way to go, but with a C++ twist. dynamically allocated 2D arrays aren't 2D arrays. They are an array of pointers to arrays. They can also be brutally and painfully slow due to all the jumping around through memory the CPU probably has to do. This is almost certainly part of what the tuner is warning you about. If you need to speed this up, first see if you can use a contiguous data store to improve the caching and reduce the pointer-chasing. Here's a simple one.

Next comes fa[i][j] = ffa_tmp[Index[i]][j]];. Depending on how Index[i]][j] is arranged, the CPU may be having to jump back and forth through ffa_tmp looking for the indexed entry. Modern processors are at their absolute best when data access if contiguous and predictable (See Why is processing a sorted array faster than processing an unsorted array?). There isn't much you can do to improve the cache friendliness of that. If the first suggestion doesn't give you the performance improvement needed, see if you can redesign to avoid having to do this indexing entirely.

user4581301
  • 33,082
  • 7
  • 33
  • 54
  • Thanks for reply. This looks good. But I wonder, If I use Matrix class from the link.. how can I create function which will return part of array? So i could use somethink like `matrix(i)` and I get `*double` for `i` – Sk1X1 Jan 31 '20 at 19:05
  • If you're looking for a row you could `double* Matrix::operator()(size_t i) { return &mData[i * mCols]; }` and access like a normal array. But I think I'd be more inclined to make `const` and non-`const``slice` functions that do pretty much the same thing or returns a `row_view` class depending on just how much data protection I need. – user4581301 Jan 31 '20 at 19:22