I'm developing a C (C99) program that loops heavily over 3-D arrays in many places. So naturally, the following access pattern is ubiquitous in the code:

for (int i=0; i<i_size; i++) {
    for (int j=0; j<j_size; j++) {
        for (int k=0; k<k_size; k++) {
            ...
        }
    }
}

Naturally, this fills many lines of code with clutter and requires extensive copypasting. So I was wondering whether it would make sense to use macros to make it more compact, like this:

#define BEGIN_LOOP_3D(i,j,k,i_size,j_size,k_size) \
for (int i=0; i<(i_size); i++) { \
    for (int j=0; j<(j_size); j++) { \
        for (int k=0; k<(k_size); k++) {

and

#define END_LOOP_3D }}}

On one hand, from a DRY principle standpoint, this seems great: it makes the code a lot more compact, and allows you to indent the contents of the loop by just one block instead of three. On the other hand, the practice of introducing new language constructs seems hideously ugly and, even though I can't think of any obvious problems with it right now, seems alarmingly prone to creating bugs that are a nightmare to debug.

So what do you think: do the compactness and reduced repetition justify this despite the ugliness and the potential drawbacks?

Jsl
  • Create a function which does the looping, and as arguments takes the "array" to loop over and a pointer to a function to call with each item. – Some programmer dude Aug 05 '14 at 05:15
  • @JoachimPileborg Not bad from a coding style standpoint, but it will be a performance hit, especially if the compiler would otherwise be able to vectorize the inner loop. – Jsl Aug 05 '14 at 05:19
  • Macros are really hard to debug, because you don't see the problem: the code is generated during preprocessing (yes, there is -E, but it's a mess). Sometimes they let you do incredible things, but in day-to-day work try to avoid them at all cost. Another problem is that macros are global: you declare a macro in one file and it pollutes every file which includes it. Ask about `min` and `max` in `windows.h`. Be especially careful with `{` and `}` in macros; the errors will be interesting. – NetVipeC Aug 05 '14 at 05:24
  • What are typical values for `i_size`, `j_size` and `k_size`? I'm trying to get a handle on how big these 3D arrays are. – Emmet Aug 05 '14 at 06:47
  • @Emmet it doesn't really matter for this question, but I expect they will be in the range of 64 to 2048 or so. – Jsl Aug 05 '14 at 07:29
  • @Jsl: sure it matters. 4x4x4 `double`s is 512B, but 2048^3 `double`s is 64GiB. On my laptop, one is a problem that's small enough to fit in L1$ many times over, and the other will be more than half on disk. The size determines whether cache behavior is an important enough consideration to contemplate loop optimizations that aren't accommodated by the proposed macro. – Emmet Aug 05 '14 at 08:32

3 Answers


Never put an opening or closing {} inside macros. C programmers are not used to this, so the code gets difficult to read.

In your case the braces are completely superfluous anyway; you just don't need them. If you do such a thing, write:

#define FOR3D(I, J, K, ISIZE, JSIZE, KSIZE)  \
for (size_t I = 0; I < (ISIZE); I++)         \
    for (size_t J = 0; J < (JSIZE); J++)     \
        for (size_t K = 0; K < (KSIZE); K++)

no need for a terminating macro. The programmer can place the {} directly.

Also, above I have used size_t, the correct type in C for loop indices. 3D matrices easily get large, and int arithmetic overflows when you don't expect it.

Jens Gustedt
  • size_t isn't really modern, it is as old as C and has always been the most correct type for loop iterators. Programmers are simply far less sloppy nowadays, compared to how they were in the dark ages of K&R. So therefore we confuse modern programming with good programming :) – Lundin Aug 05 '14 at 06:50
  • Fair enough, I was aware of this. In fact, I'm using ptrdiff_t for this purpose in the actual code. I was just trying to go for maximum simplicity in the example :) – Jsl Aug 05 '14 at 07:27
  • I've tried several times over the years to indulge my inner pedant and use `size_t` (or another unsigned integer type) for matrix indices and I've regretted it every single time. – Emmet Aug 05 '14 at 07:28
  • @Lundin, right, so I deleted the word "modern", see my edit :) – Jens Gustedt Aug 05 '14 at 09:32

The best way is to use a function. Let the compiler worry about performance and optimization, though if you are concerned you can always declare functions as inline.

Here's a simple example:

#include <stdio.h>
#include <stdint.h>

typedef void(*func_t)(int* item_ptr);


void traverse_3D (size_t  x, 
                  size_t  y, 
                  size_t  z, 
                  int     array[x][y][z],
                  func_t  function)
{
  for(size_t ix=0; ix<x; ix++)
  {
    for(size_t iy=0; iy<y; iy++)
    {
      for(size_t iz=0; iz<z; iz++)
      {
        function(&array[ix][iy][iz]);
      }
    }
  }
}


void fill_up (int* item_ptr)  // fill array with sequential numbers
{
  static uint8_t counter = 0;
  *item_ptr = counter;
  counter++;
}

void print (int* item_ptr)
{
  printf("%d ", *item_ptr);
}


int main()
{
  int arr [2][3][4];

  traverse_3D(2, 3, 4, arr, fill_up);
  traverse_3D(2, 3, 4, arr, print);
}

EDIT

To shut up all speculations, here are some benchmarking results from Windows. Tests were done with a matrix of size [20][30][40]. The fill_up function was called either from traverse_3D or from a 3-level nested loop directly in main(). Benchmarking was done with QueryPerformanceCounter().

Case 1: gcc -std=c99 -pedantic-errors -Wall

With function, time in us:      255.371402
Without function, time in us:   254.465830

Case 2: gcc -std=c99 -pedantic-errors -Wall -O2

With function, time in us:      115.913261
Without function, time in us:   48.599049

Case 3: gcc -std=c99 -pedantic-errors -Wall -O2, traverse_3D function inlined

With function, time in us:      37.732181
Without function, time in us:   37.430324

Why the "without function" case performs somewhat better with the function inlined, I have no idea. I can comment out the call to it and still get the same benchmarking results for the "without function" case.

The conclusion however, is that with proper optimization, performance is most likely a non-issue.

Lundin
  • As far as I understand, calls to function pointers are never inlined, as the compiler doesn't know at compilation time what function will be called. – Jsl Aug 05 '14 at 07:31
  • This guarantees a full function call/return overhead per inner-loop iteration. For small arrays in a performance-irrelevant context or if the function call/return overhead is negligible compared to the operations performed (e.g. a very complex calculation), this would be fine. But if you are doing something small in the innermost loop (say, a handful of arithmetic ops) on a large array it's going to add enormous overhead. – Emmet Aug 05 '14 at 09:26
  • @Jsl Of course the compiler knows it at compile time, it is written right there in my main() code, with no conditions or anything. If you are paranoid, I suppose you could make the parameter a const function pointer, but that shouldn't be necessary. As usual, benchmark the code before you start to worry about things like that. – Lundin Aug 05 '14 at 10:06
  • @Emmet Benchmark it with all optimizations on and compare with a plain loop in main(). I would be quite surprised if you come up with some "enormous overhead". – Lundin Aug 05 '14 at 10:08

If these 3D arrays are “small”, you can ignore me. If your 3D arrays are large, but you don't much care about performance, you can ignore me. If you subscribe to the (common but false) doctrine that compilers are quasi-magical tools that can poop out optimal code almost irrespective of the input, you can ignore me.

You are probably aware of the general caveats regarding macros, how they can frustrate debugging, etc., but if your 3D arrays are “large” (whatever that means), and your algorithms are performance-oriented, there may be drawbacks of your strategy that you may not have considered.

First: if you are doing linear algebra, you almost certainly want to use dedicated linear algebra libraries, such as BLAS, LAPACK, etc., rather than “rolling your own”. OpenBLAS (from GotoBLAS) will totally smoke any equivalent you write, probably by at least an order of magnitude. This is doubly true if your matrices are sparse and triply true if your matrices are sparse and structured (such as tridiagonal).

Second: if your 3D arrays represent Cartesian grids for some kind of simulation (like a finite-difference method), and/or are intended to be fed to any numerical library, you absolutely do not want to represent them as C 3D arrays. You will want, instead, to use a 1D C array and use library functions where possible and perform index computations yourself (see this answer for details) where necessary.

Third: if you really do have to write your own triple-nested loops, the nesting order of the loops is a serious performance consideration. It might well be that the data-access pattern for ijk order (rather than ikj or kji) yields poor cache behavior for your algorithm, as is the case for dense matrix-matrix multiplication, for example. Your compiler might be able to do some limited loop exchange (last time I checked, icc would produce reasonably fast code for naive xGEMM, but gcc wouldn't). As you implement more and more triple-nested loops, and your proposed solution becomes more and more attractive, it becomes less and less likely that a “one loop-order fits all” strategy will give reasonable performance in all cases.

Fourth: any “one loop-order fits all” strategy that iterates over the full range of every dimension will not be tiled, and may exhibit poor performance.

Fifth (and with reference to another answer with which I disagree): I believe, in general, that the “best” data type for any object is the set with the smallest size and the least algebraic structure, but if you decide to indulge your inner pedant and use size_t or another unsigned integer type for matrix indices, you will regret it. I wrote my first naive linear algebra library in C++ in 1994. I've written maybe a half dozen in C over the last 8 years and, every time, I've started off trying to use unsigned integers and, every time, I've regretted it. I've finally decided that size_t is for sizes of things and a matrix index is not the size of anything.

Sixth (and with reference to another answer with which I disagree): a cardinal rule of HPC for deeply nested loops is to avoid function calls and branches in the innermost loop. This is particularly important where the op-count in the innermost loop is small. If you're doing a handful of operations, as is the case more often than not, you don't want to add a function call overhead in there. If you're doing hundreds or thousands of operations in there, you probably don't care about a handful of instructions for a function call/return and, therefore, they're OK.

Finally, if none of the above are considerations that jibe with what you're trying to implement, then there's nothing wrong with what you're proposing, but I would carefully consider what Jens said about braces.

Emmet
  • "If you subscribe to the (common but false) doctrine that compilers are quasi-magical tools that can poop out optimal code almost irrespective of the input, you can ignore me." But what if you subscribe to the doctrine of people who actually benchmark code before stating a lot of nonsense? See the edit of my post for benchmarking results. – Lundin Aug 05 '14 at 10:54
  • @Lundin: Which part, specifically, is “nonsense”? Do you contend that tiling is ineffective and compilers can always perform optimum loop exchange? That would be a very strange position. – Emmet Aug 05 '14 at 18:12
  • What's nonsense is all your talk about function calls in the inner-most loop being ineffective by default. That assumes that the compiler is always bad and/or old. As usual with programming, you have to determine whether or not something is efficient from case to case. To dogmatically say "this is always inefficient" will lead people astray, thinking they must write obscure macros for the sake of performance, even though it is nothing but pre-mature optimization, as my benchmarking showed. – Lundin Aug 06 '14 at 06:23
  • Also, the whole argument against using unsigned integers seems based on some particular personal experience of yours. If you are developing math libraries, I can see why unsigned numbers might be inconvenient. But in other areas of programming, such as embedded systems that use plenty of bit manipulations, unsigned numbers is a blessing which avoids numerous bugs and poorly-defined behavior. – Lundin Aug 06 '14 at 06:31