73

I'm trying to understand when and when not to use the restrict keyword in C and in what situations it provides a tangible benefit.

After reading, "Demystifying The Restrict Keyword", ( which provides some rules of thumb on usage ), I get the impression that when a function is passed pointers, it has to account for the possibility that the data pointed to might overlap (alias) with any other arguments being passed into the function. Given a function:

foo(int *a, int *b, int *c, int n) {
    for (int i = 0; i<n; ++i) {
        b[i] = b[i] + c[i];
        a[i] = a[i] + b[i] * c[i];
    } 
}

the compiler has to reload c in the second expression, because maybe b and c point to the same location. It also has to wait for b to be stored before it can load a for the same reason. It then has to wait for a to be stored and must reload b and c at the beginning of the next loop. If you call the function like this:

int a[N];
foo(a, a, a, N);

then you can see why the compiler has to do this. Using restrict effectively tells the compiler that you will never do this, so that it can drop the redundant load of c and load a before b is stored.

In a different SO post, Nils Pipenbrinck, provides a working example of this scenario demonstrating the performance benefit.

So far I've gathered that it's a good idea to use restrict on pointers you pass into functions which won't be inlined. Apparently if the code is inlined the compiler can figure out that the pointers don't overlap.

Now here's where things start getting fuzzy for me.

In Ulrich Drepper's paper, "What every programmer should know about memory" he makes the statement that, "unless restrict is used, all pointer accesses are potential sources of aliasing," and he gives a specific code example of a submatrix matrix multiply where he uses restrict.

However, when I compile his example code either with or without restrict I get identical binaries in both cases. I'm using gcc version 4.2.4 (Ubuntu 4.2.4-1ubuntu4)

The thing I can't figure out in the following code is whether it needs to be rewritten to make more extensive use of restrict, or if the alias analysis in GCC is just so good that it's able to figure out that none of the arguments alias each other. For purely educational purposes, how can I make using or not using restrict matter in this code - and why?

For restrict compiled with:

gcc -DCLS=$(getconf LEVEL1_DCACHE_LINESIZE) -DUSE_RESTRICT -Wextra -std=c99 -O3 matrixMul.c -o matrixMul

Just remove -DUSE_RESTRICT to not use restrict.

#include <stdlib.h>
#include <stdio.h>
#include <emmintrin.h>

#ifdef USE_RESTRICT
#else
#define restrict
#endif

#define N 1000
double _res[N][N] __attribute__ ((aligned (64)));
double _mul1[N][N] __attribute__ ((aligned (64)))
    = { [0 ... (N-1)] 
    = { [0 ... (N-1)] = 1.1f }};
double _mul2[N][N] __attribute__ ((aligned (64)))
    = { [0 ... (N-1)] 
    = { [0 ... (N-1)] = 2.2f }};

#define SM (CLS / sizeof (double))

void mm(double (* restrict res)[N], double (* restrict mul1)[N], 
        double (* restrict mul2)[N]) __attribute__ ((noinline));

void mm(double (* restrict res)[N], double (* restrict mul1)[N], 
        double (* restrict mul2)[N])
{
 int i, i2, j, j2, k, k2; 
    double *restrict rres; 
    double *restrict rmul1; 
    double *restrict rmul2; 

    for (i = 0; i < N; i += SM)
        for (j = 0; j < N; j += SM)
            for (k = 0; k < N; k += SM)
                for (i2 = 0, rres = &res[i][j],
                    rmul1 = &mul1[i][k]; i2 < SM;
                    ++i2, rres += N, rmul1 += N)
                    for (k2 = 0, rmul2 = &mul2[k][j];
                        k2 < SM; ++k2, rmul2 += N)
                        for (j2 = 0; j2 < SM; ++j2)
                          rres[j2] += rmul1[k2] * rmul2[j2];
}

int main (void)
{

    mm(_res, _mul1, _mul2);

 return 0;
}
Community
  • 1
  • 1
Robert S. Barnes
  • 39,711
  • 30
  • 131
  • 179
  • The quick answer is: **Don't**. Using yet another type qualifier makes the code less legible and increases the chance of hard-to-debug bugs. In most cases you should trust your compiler to figure this stuff out. – Georg Schölly Jan 05 '10 at 11:01
  • 17
    But if you're writing a library, the compiler *can't* figure that out because it cannot know all the callers. Also, using `restrict` for a function parameter serves as documentation for the API user. – JaakkoK Jan 05 '10 at 11:04
  • What happens if you change the call in `main` to `mm(_res, _mul1, _mul1)`? I know this is undefined when you have `restrict` in there but it might provide a difference. Also, you could split `mm` into its own file and compile it separately. – JaakkoK Jan 05 '10 at 11:12
  • if you remove any information on the possible callers, you still get the same output with and without restrict – shodanex Jan 05 '10 at 11:29
  • 18
    @gs: Considering that allot of well respected people who write highly optimized code for a living all suggest making use of `restrict` a "best practice" I think it's worthwhile trying to understand the issues involved. Otherwise I wouldn't have asked the question. – Robert S. Barnes Jan 05 '10 at 11:54
  • I don't know about gcc, but the Visual Studio C++ compiler has a switch "assume no aliasing", perhaps gcc has a similar one, and it's turned on. – erikkallen Jan 05 '10 at 13:12
  • 10
    Very good question. Recommend changing `for (int i = n; i – chux - Reinstate Monica Sep 11 '13 at 15:15
  • 5
    @chux First person to notice that typo in three years... – Robert S. Barnes Sep 15 '13 at 20:41
  • I think the code is identical because the arrays are declared at file scope, like described in your first link. The static analyzer can apparently determine in this case that they do not alias. – okovko Mar 01 '23 at 08:55

9 Answers9

16

It is a hint to the code optimizer. Using restrict ensures it that it can store a pointer variable in a CPU register and not have to flush an update of the pointer value to memory so that an alias is updated as well.

Whether or not it takes advantage of it depends heavily on implementation details of the optimizer and the CPU. Code optimizers already are heavily invested in detecting non-aliasing since it is such an important optimization. It should have no trouble detecting that in your code.

Hans Passant
  • 922,412
  • 146
  • 1,693
  • 2,536
  • So basically you're saying that the alias analysis in gcc is so good that it's already able to detect that there are no aliasing problems on it's own even without the hint from using the `restrict` keyword? – Robert S. Barnes Jan 05 '10 at 13:13
  • Right. You'd need to pass double** as arguments and update them to introduce a blatant alias that the optimizer can't rule out. – Hans Passant Jan 05 '10 at 13:25
  • 8
    But without any knowledge of the potentiall caller, the result is the same, so I doubt this is what happens here – shodanex Jan 05 '10 at 13:27
15

Also, GCC 4.0.0-4.4 has a regression bug that causes the restrict keyword to be ignored. This bug was reported as fixed in 4.5 (I lost the bug number though).

3

It's worth noting that recent versions of clang are capable of generating code with a run-time check for aliasing, and two code paths: one for cases where there is potential aliasing and the other for case where is is obvious there is no chance of it.

This clearly depends on the extents of data pointed to being conspicuous to the compiler - as they would be in the example above.

I believe the prime justification is for programs making heavy use of STL - and particularly <algorithm> , where is either difficult or impossible to introduce the __restrict qualifier.

Of course, this all comes at the expense of code-size, but removes a great deal of potential for obscure bugs that could result for pointers declared as __restrict not being quite as non-overlapping as the developer thought.

I would be surprised if GCC hadn't also got this optimisation.

marko
  • 9,029
  • 4
  • 30
  • 46
3

(I don't know if using this keyword gives you a significant advantage, actually. It's very easy for programmer to err with this qualifier as there is no enforcement, so an optimizer cannot be certain that the programmer doesn't "lie".)

When you know that a pointer A is the only pointer to some region of memory, that is, it doesn't have aliases (that is, any other pointer B will necessarily be unequal to A, B != A), you can tell this fact to the optimizer by qualifying the type of A with the "restrict" keyword.

I have written about this here: http://mathdev.org/node/23 and tried to show that some restricted pointers are in fact "linear" (as mentioned in that post).

Artyom Shalkhakov
  • 1,101
  • 7
  • 14
  • 4
    No, an optimizer is perfectly entitled to assume that the programmer does not "lie" - if they do, whatever happens as a result is the programmer's fault, not the compiler's. – Chris Stratton Sep 15 '13 at 22:15
1

May be the optimisation done here don't rely on pointers not being aliased ? Unless you preload multiple mul2 element before writing result in res2, I don't see any aliasing problem.

In the first piece of code you show, it is quite clear what kind of aliases problem can occur. Here it is not so clear.

Rereading Dreppers article, he does not specifically says restrict might solve anything. There is even this phrase :

{In theory the restrict keyword introduced into the C language in the 1999 revision should solve the problem. Compilers have not caught up yet, though. The reason is mainly that too much incorrect code exists which would mislead the compiler and cause it to generate incorrect object code.}

In this code, optimisations of memory access has already been done within the algorithm. The residual optimisation seems to be done in the vectorized code presented in appendice. So for the code presented here, I guess there is no difference, because no optimisation relying on restrict is done. Every pointer access is a source of aliasing, but not every optimisation relies on aliassing.

Premature optimization being the root of all evil, the use of the restrict keyword should be limited to the case your are actively studying and optimizing, not used wherever it could be used.

shodanex
  • 14,975
  • 11
  • 57
  • 91
  • @shodanex: This is exactly what I'm wondering. Drepper seems to indicate in his paper that specifically this code benefits from `restrict`. You can see a vectorized version of the same code using SSE2 instruction here: http://lwn.net/Articles/258188/ . Is it possible he just always uses `restrict` as his personal best practice and just didn't bother to check if it makes any difference with this code? – Robert S. Barnes Jan 05 '10 at 11:59
  • in the code you mention, there still is the presence of for (j2 = 0; j2 < SM; j2 += 2) which is where I could see aliasing occur. but this is not easy to see – shodanex Jan 05 '10 at 13:03
  • @shodanex: So why doesn't `rres[j2] += rmul1[k2] * rmul2[j2];` present an aliasing problem? Let's say I call `mm(_res, _res, _res)`. `rres[j2]` and `rmul2[j2]` have to be reloaded every time through the loop anyways. So even if `&rres[j2] == &rmul2[j2]` it doesn't matter. Now what if `&rres[j2] == &rmul1[k2]` at some point in the loops execution? If this is true, then `rmul1[k2]` has to be reloaded every time through the loop, otherwise it can go into a register. I guess the compiler unrolls the loop, sees this never occurs and wallah; potential aliasing doesn't matter. – Robert S. Barnes Jan 06 '10 at 10:41
  • @shodanex: So basically, Drepper always sticks in `restrict` as a best practice ( considering he put it in the vectorized version of the code ), and just didn't bother to check whether or not it made any difference with this particular code. That's what it looks like to me anyways. – Robert S. Barnes Jan 06 '10 at 10:43
1

If there is a difference at all, moving mm to a seperate DSO (such that gcc can no longer know everything about the calling code) will be the way to demonstrate it.

Logan Capaldo
  • 39,555
  • 5
  • 63
  • 78
0

Are you running on 32 or 64-bit Ubuntu? If 32-bit, then you need to add -march=core2 -mfpmath=sse (or whatever your processor architecture is), otherwise it doesn't use SSE. Secondly, in order to enable vectorization with GCC 4.2, you need to add the -ftree-vectorize option (as of 4.3 or 4.4 this is included as default in -O3). It might also be necessary to add -ffast-math (or another option providing relaxed floating point semantics) in order to allow the compiler to reorder floating point operations.

Also, add the -ftree-vectorizer-verbose=1 option to see whether it manages to vectorize the loop or not; that's an easy way to check the effect of adding the restrict keyword.

janneb
  • 36,249
  • 2
  • 81
  • 97
  • I'm using 32 bit Ubuntu on a Pentium M with -march=native. I added fast-math. The code I posted doesn't use SSE. Using the suggested options doesn't seem to help. I still get identical binaries in both cases. – Robert S. Barnes Jan 06 '10 at 09:57
  • Ah indeed, so it seems. I'm sure you also saw the reason for the failure to vectorize with the help of the -ftree-vectorizer-verbose= option. Which explains why Drepper had to resort to intrinsics for the vectorized version of his code. So you need to come up with another example. – janneb Jan 06 '10 at 13:42
0

The problem with your example code is that the compiler will just inline the call and see that there is no aliasing ever possible in your example. I suggest you remove the main() function and compile it using -c.

Kurt Roeckx
  • 173
  • 1
  • 5
0

The following C99 code can show you that the output of the program depends on restrict :

__attribute__((noinline))
int process(const int * restrict const a, int * const b) {
    *b /= (*a + 1) ;
    return *a + *b ;
}

int main(void) {
    int data[2] = {1, 2};
    return process(&data[0], &data[0]);
}

The software terminates with code 1 using restrict and 0 without restrict qualifier.
The compilation is done with gcc -std=c99 -Wall -pedantic -O3 main.c.
The flag -O1 do the job too.

It is useful to use restrict when, for example, you can tell the compiler that the loop condition remains unchanged, even if another pointer has been updated (necessarily, the loop condition couldn't change due to restrict).

And certainly so on.

Michel
  • 259
  • 2
  • 3