Understanding restrict qualifier by examples

Question

The restrict keyword's behavior is defined in C99 by 6.7.3.1:

Let D be a declaration of an ordinary identifier that provides a means of designating an object P as a restrict-qualified pointer to type T.

If D appears inside a block and does not have storage class extern, let B denote the block. If D appears in the list of parameter declarations of a function definition, let B denote the associated block. Otherwise, let B denote the block of main (or the block of whatever function is called at program startup in a freestanding environment).

In what follows, a pointer expression E is said to be based on object P if (at some sequence point in the execution of B prior to the evaluation of E) modifying P to point to a copy of the array object into which it formerly pointed would change the value of E.119) Note that ''based'' is defined only for expressions with pointer types.

During each execution of B, let L be any lvalue that has &L based on P. If L is used to access the value of the object X that it designates, and X is also modified (by any means), then the following requirements apply: T shall not be const-qualified. Every other lvalue used to access the value of X shall also have its address based on P. Every access that modifies X shall be considered also to modify P, for the purposes of this subclause. If P is assigned the value of a pointer expression E that is based on another restricted pointer object P2, associated with block B2, then either the execution of B2 shall begin before the execution of B, or the execution of B2 shall end prior to the assignment. If these requirements are not met, then the behavior is undefined.

Like just about everybody else, I have a hard time understanding all the intricacies of this definition. As an answer to this question, I'd like to see a set of good examples, for each requirement in the 4th paragraph, of usages that would violate the requirement. This article:

http://web.archive.org/web/20120225055041/http://developers.sun.com/solaris/articles/cc_restrict.html

does a good job of presenting the rules in terms of "a compiler may assume..."; expanding on that pattern and tying in the assumptions the compiler can make, and how they fail to hold, with each example would be great.

Seems to be a symptom of Oratroll dismantling their acquisitions. Fixed it with the wayback machine. — R.. GitHub STOP HELPING ICE, Apr 01 '14 at 16:43

ysap · Answer 1 · 2012-09-04T03:46:46.707

Below, I will refer to the usecases from the Sun paper linked to in the question.

The (relatively) obvious case would be the mem_copy() case, which falls under the 2nd usecase category in the Sun paper (the f1() function). Let's say we have the following two implementations:

void mem_copy_1(void * restrict s1, const void * restrict s2, size_t n);
void mem_copy_2(void *          s1, const void *          s2, size_t n);

Because we know there is no overlap between the two arrays pointed to by s1 and s2, the code for the 1st function would be straight forward:

void mem_copy_1(void * restrict s1, const void * restrict s2, size_t n)
{
     // naively copy array s2 to array s1.
     for (int i=0; i<n; i++)
         s1[i] = s2[i];
     return;
}

s2 = '....................1234567890abcde' <- s2 before the naive copy
s1 = '1234567890abcde....................' <- s1 after the naive copy
s2 = '....................1234567890abcde' <- s2 after the naive copy

OTOH, in the 2nd function, there may be an overlap. In this case, we need to check whether the source array is located before the destination or vice-versa, and choose the loop index boundaries accordingly.

For example, say s1 = 100 and s2 = 105. Then, if n=15, after the copy the newly copied s1 array will overrun the first 10 bytes of the source s2 array. We need to make sure we copied the lower bytes first.

s2 = '.....1234567890abcde' <- s2 before the naive copy
s1 = '1234567890abcde.....' <- s1 after the naive copy
s2 = '.....67890abcdeabcde' <- s2 after the naive copy

However, if, s1 = 105 and s2 = 100, then writing the lower bytes first will overrun the last 10 bytes of the source s2, and we end up with an erroneous copy.

s2 = '1234567890abcde.....' <- s2 before the naive copy
s1 = '.....123451234512345' <- s1 after the naive copy - not what we wanted
s2 = '123451234512345.....' <- s2 after the naive copy

In this case, we need to copy the last bytes of the array first, possibly stepping backwards. The code will look something like:

void mem_copy_2(void *s1, const void *s2, size_t n)
{
    if (((unsigned) s1) < ((unsigned) s2))
        for (int i=0; i<n; i++)
             s1[i] = s2[i];
    else
        for (int i=(n-1); i>=0; i--)
             s1[i] = s2[i];
    return;
}

It is easy to see how the restrict modifier gives a chance for better speed optimization, eliminating extra code, and an if-else decision.

At the same time, this situation is hazardous to the incautious programmer, who passes overlapping arrays to the restrict-ed function. In this case, no guards are there for ensuring the proper copying of the array. Depending on the optimization path chosen by the compiler, the result is undefined.

The 1st usecase (the init() function) can be seen as a variation on the 2nd one, described above. Here, two arrays are created with a single dynamic memory allocation call.

Designating the two pointers as restrict-ed enables optimization in which the instructions order would matter otherwise. For example, if we have the code:

a1[5] = 4;
a2[3] = 8;

then the optimizer can reorder these statements if it finds it useful.

OTOH, if the pointers are not restrict-ed, then it is important that the 1st assignment will be performed before the second one. This is because there is a possibility that a1[5] and a2[3] are actually the same memory location! It is easy to see that when this is the case, then the end value there should be 8. If we reorder the instructions, then the end value will be 4!

Again, if non-disjoint pointers are given to this restrict-ed assumed code, the result is undefined.

"As an answer to this question, I'd like to see a set of good examples, for each requirement in the 4th paragraph, of usages that would violate the requirement." I do not think this answer provides even a single such example. — Nemo, Sep 04 '12 at 04:13
@Nemo - really? I think I gave 2 examples, referring to the two usecases, where providing non-disjoint arrays to a code that assumes restrict qualified pointers may give a erroneous result. — ysap, Sep 04 '12 at 04:18
I am not ruling out the possibility that I did not understand the question, though. ;-) — ysap, Sep 04 '12 at 04:20
I don't object to this answer because it's informative, but I think what you've done is more explaining what intended use/purpose of restrict is. The intent of my question is to dig into what the rules about use of restrict pointers really mean and the possible (and ideally, subtle) ways code attempting to use restrict could end up breaking them. — R.. GitHub STOP HELPING ICE, Sep 04 '12 at 04:58
So, in case it is unclear (maybe I should have edited the answer), the example of the overlapping arrays in the the two cases show how using these pointers with `restrict` code breaks the rules. But again - late hour - maybe I just don't get the question right. — ysap, Sep 04 '12 at 05:03
I think he wants one example for each clause in the fourth paragraph... But you are right, my downvote was extreme. Make a trivial edit to this answer and I will un-downvote. — Nemo, Sep 05 '12 at 02:23
[Ars Technica](http://arstechnica.com/information-technology/2014/05/ask-ars-why-are-some-programming-languages-faster-than-others/) has a very nice discussion on aliasing and how it limits possible optimisations, when comparing Fortran and C. — Yann-Gaël Guéhéneuc, May 12 '14 at 05:30
Your "example" effectively discusses the difference between memcpy/memmove. The use of `restrict` is to allow the loop to read a bunch of data from the source and then write a bunch of data to the destination, rather than reading and writing bytes individually. Without `restrict`, the behavior of the `for` loop shown would be defined even in cases where `dest` pointed one byte beyond `src`, but an optimizer which tried to use read-multiple instructions to speed it up would change its behavior. — supercat, Jun 22 '15 at 22:40

Understanding restrict qualifier by examples

1 Answers1

Linked