
I want to know: what is the most optimal way to assign a value to a two-dimensional array?

Say I want to assign the value 12 to the array below.

for (int i = 0; i < m; i+++)
    for (int j = 0; j < n; j++)
        array[i][j] = 10

Is there any better way to do this, without a for loop?

srccode
  • That's not assigning *a* value; it's assigning `m*n` values. – WhozCraig Feb 07 '21 at 09:48
  • `I want to assign value 12` well, then assign `12`, not `10`. `What is the most optimized way` The "most optimized way" would be to write in assembly (if optimizing for speed). So what measurement are you using for determining what is "the most optimized way" and what is "better"? – KamilCuk Feb 07 '21 at 09:49
  • How is `array` declared? You may be able to do a single loop instead of 2. – Marc Glisse Feb 07 '21 at 10:04

2 Answers


Your code is fine, readable and simple, except for the typos (the stray + in i+++ and the missing semicolon). Just make sure you use the same type for i and j as for n and m.

#define N 10
#define M 10
    int array[M][N];
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++)
            array[i][j] = 0;
    }

The advantage of writing the code this way is that it works for a flat 2D array int array[M][N], for an array of pointers to rows int *array[M], and for a pointer to pointers int **array.
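
For instance, here is a minimal sketch of the int **array case, assuming a straightforward malloc-per-row allocation (the alloc_and_clear name and the simplified error handling are only illustrative): the clearing loop is exactly the one shown above.

#include <stdlib.h>

/* illustrative helper: allocate rows pointers, one row of cols ints each,
   then clear them with the same nested loop as above */
int **alloc_and_clear(size_t rows, size_t cols) {
    int **array = malloc(rows * sizeof(*array));
    if (array == NULL)
        return NULL;
    for (size_t i = 0; i < rows; i++) {
        array[i] = malloc(cols * sizeof(**array));
        if (array[i] == NULL)
            return NULL;        /* error handling simplified for the sketch */
    }
    for (size_t i = 0; i < rows; i++) {       /* same loop body as above */
        for (size_t j = 0; j < cols; j++)
            array[i][j] = 0;
    }
    return array;
}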

If the array is a flat 2D array of integers and the value is 0, there is an alternative:

memset(array, 0, sizeof(array[0][0]) * M * N);

But it only works for very specific values that are represented in memory with identical bytes in all positions. Such is the case for 0 and -1 on regular two's complement architectures, and for 254 other less obvious values (out of 4 billion).
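
To see why the value matters, here is a small check, assuming a typical platform with 32-bit two's complement int: memset writes the same byte into every position, so a byte value of 1 produces 0x01010101 in each int, not 1.

#include <stdio.h>
#include <string.h>

int main(void) {
    int a[2][3];

    memset(a, -1, sizeof a);  /* every byte 0xFF: each int reads back as -1 */
    printf("%d\n", a[1][2]);  /* prints -1 */

    memset(a, 1, sizeof a);   /* every byte 0x01: each int reads back as 0x01010101 */
    printf("%d\n", a[1][2]);  /* prints 16843009, not 1 */

    return 0;
}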

You could also initialize a single row and copy it into the following ones...
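
A minimal sketch of that approach, assuming a flat int array[M][N] (fill_matrix is just an illustrative name):

#include <string.h>

#define M 10
#define N 10

void fill_matrix(int array[M][N], int value) {
    for (int j = 0; j < N; j++)            /* fill the first row */
        array[0][j] = value;
    for (int i = 1; i < M; i++)            /* copy it into each remaining row */
        memcpy(array[i], array[0], sizeof(array[0]));
}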

Such optimisations are confusing, error-prone and unnecessary, as modern optimizing compilers will generate very efficient code for this:

#define M 10
#define N 10
void clear_matrix(int array[M][N]) {
    for (int i = 0; i < M; i++) {
        for (int j = 0; j < N; j++)
            array[i][j] = 0;
    }
}

Notes:

  • Code correctness always beats optimisation: what is the purpose of getting incorrect results fast?
  • Always benchmark and profile before you optimize, and focus first on the bottlenecks that stand out as obvious targets.
  • It is unlikely for the matrix initialization to be the bottleneck. If it is, your algorithm is probably at fault.
  • Finding a better algorithm, with reduced complexity, is a more productive effort than micro-optimisation.
chqrlie

This question is almost impossible to answer. Simple test:

#include <stdio.h>
#include <string.h>
#include <time.h>

#define ROWS 1000
#define COLS 4000

int arr[ROWS][COLS];

/* fill the first row element by element, then memcpy it into the remaining rows */
void __attribute__((noinline)) set(int val)
{
    for(size_t col = 0; col < COLS; col++)
    {
        arr[0][col] = val;
    }
    for(size_t row = 1; row < ROWS; row++)
    {
        memcpy(arr[row], arr[0], sizeof(arr[0]));
    }
}

/* fill a separate template row, then memcpy it into every row of arr */
void __attribute__((noinline)) set1(int val)
{
    static int arr1[COLS];
    for(size_t col = 0; col < COLS; col++)
    {
        arr1[col] = val;
    }
    for(size_t row = 0; row < ROWS; row++)
    {
        memcpy(arr[row], arr1, sizeof(arr1));
    }
}


/* plain nested loops: write every element directly */
void __attribute__((noinline)) set2(int val)
{
    for(size_t row = 0; row < ROWS; row++)
        for(size_t col = 0; col < COLS; col++)
        {
            arr[row][col] = val;
        }
}


int main(void)
{
    clock_t begin, end;
    double time_spent;

    begin = clock();
    for(int x = 0; x < 1000; x++)
    {
        set(x);
    }

    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;

    printf("%f\n", time_spent);

    begin = clock();

    for(int x = 0; x < 1000; x++)
    {
        set1(x);
    }
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f\n", time_spent);


    begin = clock();

    for(int x = 0; x < 1000; x++)
    {
        set2(x);
    }
    end = clock();
    time_spent = (double)(end - begin) / CLOCKS_PER_SEC;
    printf("%f\n", time_spent);

}

Godbolt shows that the loop method is fastest: https://godbolt.org/z/K9Gcbs

1.162777
1.009047
0.973361

But a test on Ubuntu 20.04, 64-bit (same compiler version, same compiler options) gives the exact opposite results.

0.542433
0.502667
0.936434

So the conclusion is: Always benchmark. Do not trust opinions and myths.

0___________
  • Each time I refresh on Godbolt, I get different rankings. I suspect your microbenchmark is poorly designed. – Cody Gray - on strike Feb 07 '21 at 11:07
  • Fully agree that benchmarking is a must, and premature optimisation is evil. I would suggest delaying the `printf` statements until after all measurements are done and using separate smaller arrays to minimize side effects due to caching and CPU load (see the sketch after this thread). – chqrlie Feb 07 '21 at 11:13
  • @CodyGray but they are consistent - the simple loop is **always** faster than the others, and that is the point: on Godbolt the loop is always faster; on Ubuntu it is always slower compared to the memcpy methods. – 0___________ Feb 07 '21 at 11:14
  • They *aren't* consistent, that's why I said "different rankings", not "different absolute times". – Cody Gray - on strike Feb 07 '21 at 11:15
  • @chqrlie it is just to illustrate the problem – 0___________ Feb 07 '21 at 11:15
  • @CodyGray they are: https://pastebin.com/tMW5h92b. The third is always the lowest. BTW, sometimes you may get abnormal results as the Godbolt server may be under a different load during the execution of a particular test (or rather part of the test). I had one inconsistent run in 20 reloads. – 0___________ Feb 07 '21 at 11:21
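
A minimal sketch of the measurement layout suggested in the comment thread above, assuming the set, set1 and set2 functions from the answer are in scope (the separate smaller arrays are left out for brevity): all three variants are timed first, and printing happens only after every measurement is finished.

#include <stdio.h>
#include <time.h>

int main(void)
{
    double times[3];
    void (*tests[3])(int) = { set, set1, set2 };  /* functions from the answer above */

    for (int t = 0; t < 3; t++) {
        clock_t begin = clock();
        for (int x = 0; x < 1000; x++)
            tests[t](x);
        times[t] = (double)(clock() - begin) / CLOCKS_PER_SEC;
    }

    for (int t = 0; t < 3; t++)   /* all output after all measurements are done */
        printf("%f\n", times[t]);
    return 0;
}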