Cache friendly and faster way faster - `InvokeMe()`

Question

I am trying to figure out, if this really is the fastest approach. I want this to be as fast as possible, cache friendly, and serve a good time complexity.

DEMO: https://dotnetfiddle.net/BUGz8s

private static void InvokeMe() 
{
    int hz = horizontal.GetLength(0) * horizontal.GetLength(1);
    int vr = vertical.GetLength(0) * vertical.GetLength(1);
    int hzcol = horizontal.GetLength(1);
    int vrcol = vertical.GetLength(1);
    
    //Determine true from Horizontal information:
    for (int i = 0; i < hz; i++)
    {
        if(horizontal[i / hzcol, i % hzcol] == true)
           System.Console.WriteLine("True, on position: {0},{1}", i / hzcol, i % hzcol);
    }

    //Determine true position from vertical information:
    for (int i = 0; i < vr; i++)
    {
        if(vertical[i / vrcol, i % vrcol] == true)
           System.Console.WriteLine("True, on position: {0},{1}", i / vrcol, i % vrcol);
    }
}

Pages I read:

EDIT: The code example, is now, more towards what I am dealing with. It's about determining a true point x,y from a N*N Grid. The information available at disposal is: horizontal and vertical 2D arrays.

To NOT cause confusion. Imagine, that overtime, some positions in vertical or horizontal get set to True. This works currently perfectly well. All I am in for, is, the current approach of using one for-loop per 2D array like this, instead of using two for loops per 2D array.

In terms of complexity it does not matter if you are using one loop or two since you need to go through all elements you still will need to do `row * col` operations. — Guru Stron, Jan 06 '22 at 19:50
@AlbinM If you knew that the two were the same, then why did you say in your question that one is O(n) and one is O(n^2), when they're both O(rows*columns)? — Servy, Jan 06 '22 at 19:53
@Servy, 1. Looping over rows with outer loop, and looping over columns with inner loop is `O(n^2)`. Whereas, in this context, I am having one for-loop with a constant of size = row*column, and it will NEVER change, hence leading to **O(n)**. — Albin M, Jan 06 '22 at 20:00
When you calculate it as the way, I understand, is that, int size = row * column; // gets the size and is fixed. Like a constant size, therefore, it becomes, iterating from i some number to size - 1 till the end, meaning `O(N)` — Albin M, Jan 06 '22 at 20:04
@AlbinM You seem to think a loop is just always O(n), which is just a false statement. The number of iterations of both loops is *exactly* the number of rows times the number of columns (using big O isn't even needed to simplify this expression). You could notate that as O(n^2) if you know that the rows and columns are the same in either case and "n" is that value, or you could call both of them O(n) if you define "n" as the number of cells in the matrix (which is rows x columns), but that's not typical. So any of the three notations could be valid, depending on how you define your terms. — Servy, Jan 06 '22 at 20:04
I mean if you take that argument to it's conclusion (as you're claiming it's a constant) then it's actually O(1). Can't cheat the number of ops that way. — ChiefTwoPencils, Jan 06 '22 at 20:05
@AlbinM note that if `n=10` and `m=10`, `n^2` is the same as `n*m`. — Vlad Feinstein, Jan 06 '22 at 20:05
@VladFeinstein If you fix n and m to 10 then the whole thing is O(1). You could unroll the loop even if the size is fixed. (Which, for the record, is potentially actually worth doing in certain applications, but big O is irrelevant to such discussions.) — Servy, Jan 06 '22 at 20:06
@Servy, can you please expand on this, *" If you fix n and m to 10 then the whole thing is O(1). "*, thanks! — Albin M, Jan 06 '22 at 20:09
@AlbinM The whole premise of big O is to determine the number of operations as the relevant variables approach infinity. You have no variables at all in determining the number of operations performed, if you fix them to constant values. — Servy, Jan 06 '22 at 20:13
@Servy, "Eg: for(i = 0; i<(X*Y); i++) elements, then it will be O(n) cause you are not restarting your iteration at any point of time. Hope this makes sense." from: https://stackoverflow.com/questions/10344834/time-complexity-of-a-nested-for-loop-that-parses-a-matrix — Albin M, Jan 06 '22 at 20:19
@AlbinM If you define `n=X*Y`, then yes, that's true. If you define `n = X = Y` (which would be the more common notation, and so what would be what people would expect if you don't explicitly define your terms), then it's O(n^2). That's what it is because that's how many iterations of the loop that there are. That the loop isn't restarted is a true statement, but if you use the standard notation of n being equal to rows and also being equal to columns, then you are not executing n iterations of your loop. Note that if you define "n=X*Y" then the nested loops are also O(n). — Servy, Jan 06 '22 at 20:23
@AlbinM again. Time complexity for both cases (single loop and nested ones) is the same - `O(n)` where `n = row * col`. Yes, you are "restarting" nested loop, but it does not go for `n` operations per iteration it goes for `col` and there are `row` iterations for outer loop. Try "manually" simulating both cases going through all iterations for some different small values of `row` and `col` (not always a good approach - but suitable in this case). — Guru Stron, Jan 06 '22 at 20:40
@AlbinM assuming _"from a N*N Grid"_ then _"The time complexity I get for the second loop alone is O(n)"_ statement is incorrect, cause your single loop will be O(N^2) (cause N*N is N^2 ). For example if you take `hz` it will be n^2 (where n is matrix row/column count) so to complete the loop from 0 to n^2 you will need to execute it's body n^2 times making it's complexity O(n^2). — Guru Stron, Jan 06 '22 at 21:08
@GuruStron, thank you very much. But, can this be improved? Or, is this the fastest possible? — Albin M, Jan 06 '22 at 21:15
@AlbinM in terms of time complexity - in general case - no, if you need to scan all elements - you need to scan all elements. Other possible optimizations can depend on the actual data and what is done with it - if it is updated rarely (compared to searches) - you can cache the search result until the next update. In terms of "constant" performance - use benchmarkdotnet to determine the fastest approach to iterate through data - use different "characteristic" matrix sizes, and benchmark on concrete hardware and software (i.e. framework version and runtime). — Guru Stron, Jan 06 '22 at 21:27
@GuruStron, So, it is safe to stick to double for-loops, meaning, for for if horizontal, and for for if vertical, leading to O(N^2) + O(N^2) which seems costly and worst case scenario of choice. — Albin M, Jan 06 '22 at 21:34
@AlbinM yes. Again - you should benchmark concrete software/hardware/matrix size if you can limit those cause there can be some variance there resulting in difference performance. Also you can try looking into [vectorization](https://learn.microsoft.com/en-us/dotnet/standard/simd) to speed up the processing or parallelization. — Guru Stron, Jan 06 '22 at 21:37
@GuruStron, thx. But, I am not using SIMD here. I usually, use that in C++. But, that's different from this context. Anyway. I will have to do some tests. — Albin M, Jan 06 '22 at 21:47
@AlbinM If the dimensions of the 2d array are fixed and known, you can flatten it into 1d array. — Sohaib Jundi, Jan 06 '22 at 21:55

score 1 · Answer 1 · answered Jan 06 '22 at 19:51

1

Did you consider

for (int i = 0; i < row; i++)
{
    for (int j = 0; j < col; j++)
    {
        Console.Write(string.Format("{0:00} ", matrix[i, j]));
        Console.Write(Environment.NewLine + Environment.NewLine);
    }
}

It is basically the same loop as yours, but without / and % that compiler may or may not optimize.

answered Jan 06 '22 at 19:51

Vlad Feinstein

10,960
1
12
27

Yes, but, that's two loops. One more loop, is costlier than, doing it all in a single for-loop. – Albin M Jan 06 '22 at 19:52
3

@AlbinM what makes you think that? – Vlad Feinstein Jan 06 '22 at 19:53
Vlad, the following honesty: (1) for loop is outer loop, (2) for loop is inner loop. When we enter the inner loop, we'll have to wait for it to iterate j to N - 1, and then return and continue from outer loop, and once again, wait for second loop etc. You get the idea. – Albin M Jan 06 '22 at 19:55
I added the links that I visited. It is also mentioned there. – Albin M Jan 06 '22 at 19:56
My initial code was implemented with two **for**-loops, so I modified it. Now, all I want to know is: is this faster, can it be optimized further, and cache friendliness. – Albin M Jan 06 '22 at 19:58
3

@AlbinM your single loop does `row * col` iterations, this double loop does `col` iterations `row` times, which is also `row * col` iterations in total. The latter can be a bit better since the is no `/` or `%` operations. In terms of cache friendliness, they are basically the same since the elements will accessed in the same order in both. – Sohaib Jundi Jan 06 '22 at 20:01
@SohaibJundi, see my EDIT. – Albin M Jan 06 '22 at 20:51
@AlbinM _"from a N*N Grid"_ your single loop is O(N^2). – Guru Stron Jan 06 '22 at 21:03

Guru Stron · Accepted Answer · 2022-01-06T21:32:05.713

Time complexity for approach with one loop and nested loops is the same - O(row * col) (which is O(n^2) for row == col as in your example for both cases) so the difference in the execution time will come from the constants for operations (since the direction of traversing should be the same). You can use BenchmarkDotNet to measure those. Next benchmark:

[SimpleJob]
public class Loops
{
    int[, ] matrix = new int[10, 10];
    [Benchmark]
    public void NestedLoops()
    {
        int row = matrix.GetLength(0);
        int col = matrix.GetLength(1);
        for (int i = 0; i < row ; i++)
        for (int j = 0; j < col ; j++)
        {
            matrix[i, j] = i * row + j + 1;
        }
    }
    
    [Benchmark]
    public void SingleLoop()
    {
        int row = matrix.GetLength(0);
        int col = matrix.GetLength(1);
        var l = row * col; 
        for (int i = 0; i < l; i++)
        {
            matrix[i / col, i % col] = i + 1;
        }
    }
}

Gives on my machine:

Method	Mean	Error	StdDev	Median
NestedLoops	144.5 ns	2.94 ns	4.58 ns	144.7 ns
SingleLoop	578.2 ns	11.37 ns	25.42 ns	568.6 ns

Making single loop actually slower.

If you will change loop body to some "dummy" operation - for example incrementing some outer variable or updating fixed (for example first) element of the martix you will see that performance of both loops is roughly the same.

Execution of a single input value tells you nothing about the big O value of the algorithm. If you wanted to approximate it experimentally you'd need to graph the execution time over a wide range of input sizes, and then find a best fit curve (and simplify it as the input size goes to infinity if you want big O). — Servy, Jan 06 '22 at 20:27
@Servy yes, of cause, this bench is to access the constants since the big O is the same for both of them. — Guru Stron, Jan 06 '22 at 20:29
@GuruStron, so based on what you wrote to Servy, this measurement is unreliable. — Albin M, Jan 06 '22 at 20:57
@AlbinM it depends on what do you mean by "unreliable" and what do you think it measures. Also you can change benchmark to support multiple matrix sizes. — Guru Stron, Jan 06 '22 at 21:09

Cache friendly and faster way faster - `InvokeMe()`

2 Answers2