If these 3D arrays are “small”, you can ignore me. If your 3D arrays are large, but you don't much care about performance, you can ignore me. If you subscribe to the (common but false) doctrine that compilers are quasi-magical tools that can poop out optimal code almost irrespective of the input, you can ignore me.
You are probably aware of the general caveats regarding macros (how they can frustrate debugging, and so on), but if your 3D arrays are “large” (whatever that means) and your algorithms are performance-oriented, your strategy may have drawbacks you haven't considered.
First: if you are doing linear algebra, you almost certainly want to use dedicated linear algebra libraries, such as BLAS, LAPACK, etc., rather than “rolling your own”. OpenBLAS (from GotoBLAS) will totally smoke any equivalent you write, probably by at least an order of magnitude. This is doubly true if your matrices are sparse and triply true if your matrices are sparse and structured (such as tridiagonal).
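For a sense of what that buys you: the whole hand-rolled triple loop collapses into one library call (a sketch assuming OpenBLAS's C interface, cblas.h, and row-major storage; the wrapper name is mine):

```c
#include <cblas.h>

/* Sketch: C = A*B for row-major n-by-n doubles in a single call.
 * The wrapper name is mine; cblas_dgemm is the standard CBLAS entry
 * point that OpenBLAS provides. */
void matmul_blas(int n, const double *A, const double *B, double *C)
{
    cblas_dgemm(CblasRowMajor, CblasNoTrans, CblasNoTrans,
                n, n, n,
                1.0, A, n,     /* alpha, A, lda */
                     B, n,     /* B, ldb        */
                0.0, C, n);    /* beta, C, ldc  */
}
```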
Second: if your 3D arrays represent Cartesian grids for some kind of simulation (like a finite-difference method), and/or are intended to be fed to any numerical library, you absolutely do not want to represent them as C 3D arrays. You will want, instead, to use a 1D C array and use library functions where possible and perform index computations yourself (see this answer for details) where necessary.
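In practice that looks something like the following (an illustrative sketch; the macro and helper names are mine, not from any library):

```c
#include <stdlib.h>

/* One contiguous block for an nx-by-ny-by-nz grid, with the
 * (i,j,k) -> linear offset computed by hand; k varies fastest here. */
#define IDX3(i, j, k, ny, nz)  (((i) * (ny) + (j)) * (nz) + (k))

double *alloc_grid(int nx, int ny, int nz)
{
    return malloc((size_t)nx * ny * nz * sizeof(double));
}

/* Usage: grid[IDX3(i, j, k, ny, nz)] = ...; the same flat pointer can
 * be handed straight to numerical-library routines. */
```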
Third: if you really do have to write your own triple-nested loops, the nesting order of the loops is a serious performance consideration. It might well be that the data-access pattern for ijk order (rather than ikj or kji) yields poor cache behavior for your algorithm, as is the case for dense matrix-matrix multiplication, for example. Your compiler might be able to do some limited loop exchange (last time I checked, icc would produce reasonably fast code for naive xGEMM, but gcc wouldn't). As you implement more and more triple-nested loops, and your proposed solution becomes more and more attractive, it becomes less and less likely that a “one loop-order fits all” strategy will give reasonable performance in all cases.
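The matrix-product case makes this concrete: with row-major storage, the textbook ijk order strides down a column of B on every inner iteration, while ikj streams through B and C contiguously. A sketch:

```c
/* Naive C = A*B for row-major n-by-n matrices; C assumed zeroed. */
void matmul_ijk(int n, const double *A, const double *B, double *C)
{
    /* ijk: the inner loop reads B[k*n + j] with stride n, touching a
     * new cache line of B on every iteration. */
    for (int i = 0; i < n; ++i)
        for (int j = 0; j < n; ++j)
            for (int k = 0; k < n; ++k)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}

void matmul_ikj(int n, const double *A, const double *B, double *C)
{
    /* ikj: identical arithmetic, but the inner loop now walks B and C
     * along contiguous rows (stride 1), which caches and hardware
     * prefetchers handle far better. */
    for (int i = 0; i < n; ++i)
        for (int k = 0; k < n; ++k)
            for (int j = 0; j < n; ++j)
                C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```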
Fourth: any “one loop-order fits all” strategy that iterates over the full range of every dimension will not be tiled, and may exhibit poor performance.
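For contrast, here is roughly what a tiled version of the same kind of kernel looks like, iterating over blocks small enough to stay in cache (the block size is illustrative and would need tuning):

```c
#define TILE 64   /* illustrative block size; tune for your cache */

/* Tiled C += A*B for row-major n-by-n matrices; for brevity the
 * sketch assumes n is a multiple of TILE and C starts zeroed. */
void matmul_tiled(int n, const double *A, const double *B, double *C)
{
    for (int ii = 0; ii < n; ii += TILE)
        for (int kk = 0; kk < n; kk += TILE)
            for (int jj = 0; jj < n; jj += TILE)
                for (int i = ii; i < ii + TILE; ++i)
                    for (int k = kk; k < kk + TILE; ++k)
                        for (int j = jj; j < jj + TILE; ++j)
                            C[i*n + j] += A[i*n + k] * B[k*n + j];
}
```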
Fifth (and with reference to another answer with which I disagree): I believe, in general, that the “best” data type for any object is the set with the smallest size and the least algebraic structure, but if you decide to indulge your inner pedant and use size_t or another unsigned integer type for matrix indices, you will regret it. I wrote my first naive linear algebra library in C++ in 1994. I've written maybe a half dozen in C over the last 8 years and, every time, I've started off trying to use unsigned integers and, every time, I've regretted it. I've finally decided that size_t is for sizes of things and a matrix index is not the size of anything.
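The canonical traps, for what it's worth, are loops that count down and expressions that subtract indices (a sketch):

```c
#include <stddef.h>

/* A down-counting loop never terminates with an unsigned index:
 * when i reaches 0 the decrement wraps to SIZE_MAX, so i >= 0 is
 * always true (and a decent compiler will warn that it is). */
void clear_backwards(double *x, size_t n)
{
    for (size_t i = n - 1; i >= 0; --i)   /* BUG: infinite loop */
        x[i] = 0.0;
}

/* Index differences bite the same way: if j > i, the "negative"
 * offset silently becomes an enormous positive number. */
size_t offset(size_t i, size_t j) { return i - j; }
```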
Sixth (and with reference to another answer with which I disagree): a cardinal rule of HPC for deeply nested loops is to avoid function calls and branches in the innermost loop. This matters most when the op-count in the innermost loop is small. If you're doing a handful of operations, as is more often than not the case, you don't want to add function-call overhead there. If you're doing hundreds or thousands of operations, you probably won't notice the handful of instructions a call and return cost, and so they're OK.
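Concretely (a sketch; scale_add is a hypothetical stand-in for any out-of-line call): when the loop body is two or three flops, an un-inlined call and return can cost as much as the useful work, so keep the arithmetic inline and hoist branches such as boundary tests out of the innermost loop.

```c
/* Hypothetical per-element helper: harmless at one call site, costly
 * per element if the compiler does not inline it. */
double scale_add(double a, double b, double s) { return s * a + b; }

void axpy_call(int nz, double s, const double *a, const double *b, double *c)
{
    /* Two flops per iteration: call/return overhead is comparable
     * to the useful work. */
    for (int k = 0; k < nz; ++k)
        c[k] = scale_add(a[k], b[k], s);
}

void axpy_inline(int nz, double s, const double *a, const double *b, double *c)
{
    /* Same arithmetic, kept inline in the loop body. */
    for (int k = 0; k < nz; ++k)
        c[k] = s * a[k] + b[k];
}
```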
Finally, if none of the above are considerations that jibe with what you're trying to implement, then there's nothing wrong with what you're proposing, but I would carefully consider what Jens said about braces.