10

Consider the following simple example:

struct __attribute__ ((__packed__)) {
 int code[1];
 int place_holder[100];
} s;

void test(int n)
{
 int i;

 for (i = 0; i < n; i++) {
  s.code[i] = 1;
 }
}

The for-loop is writing to the field code, which is of size 1. The next field after code is place_holder.
I would expect that in case of n > 1, the write to code array would overflow and 1 would be written to place_holder.

However, when compiling with -O2 (on gcc 4.9.4 but probably on other versions as well) something interesting happens.
The compiler identifies that the code might overflow array code, and limits loop unrolling to 1 iteration.

It's easy to see that when compiling with -fdump-tree-all and looking at the last tree pass ("t.optimized"):


;; Function test (test, funcdef_no=0, decl_uid=1366, symbol_order=1)

Removing basic block 5
test (int n)
{
  <bb 2>:
  # DEBUG i => 0
  # DEBUG i => 0
  if (n_4(D) > 0)
    goto <bb 3>;
  else
    goto <bb 4>;

  <bb 3>:
  s.code[0] = 1;
  # DEBUG i => 1
  # DEBUG i => 1

  <bb 4>:
  return;

}

So in this case the compiler completely unrolled the loop to a single iteration.

My questions are:

  1. From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?
    Let's assume I'm aware of the struct layout in memory and know what I'm doing when deliberately overflowing the code array.
  2. Is there a way to prevent gcc from unrolling the loop in such case? I know I can completely prevent loop unrolling, however I'm still interested in loop unrolling on other cases. I also suspect that the analysis the compiler is doing might affect passes other than loop unrolling.
    gcc is assuming I'm not going to overflow when accessing my array, so what I'm really looking for is way to tell the compiler not to take this assumption (by providing some compiler option).

I'm aware it's a bad practice to write such code that overflows from one field to another, and I'm not intending to write such code.
I'm also aware of the practice to put an array (possibly zero sized) as the last struct field to allow it to overflow, this is well supported by compilers, while in this case the array code is not the last field.
So this is not a question of "how to fix the code", but rather a question of understanding the compiler assumptions and affecting them.

These questions came up when I observed existing code that was already written in such way, and debugged it to find out why it's not behaving as the original developer expected it to behave.
The risk is that there are other places in the code where such problem exists. Static analysis tools can help to find out, but I would also like to know if there's a way to make the compiler tolerate such code and still generate the result we would expect.

Update

I got clear answer to question (1) above, but not for question (2).

  • Can gcc allow this as an extension, by some compile options?
  • Is there a way to at least get a warning when gcc identifies it? (and it clearly identifies it, by optimizing things out).
    That's important in order to identify such cases in a large existing code base.
Amir Gonnen
  • 3,525
  • 4
  • 32
  • 61
  • 3
    Undefined behavior. The compiler is free to assume that writing to `code` will not have any effect on `place_holder`. – Tom Karzes Jul 02 '20 at 09:00
  • @TomKarzes could you point me to the place in the C specifications that says that? – Amir Gonnen Jul 02 '20 at 09:01
  • 2
    Does this answer your question? [In a structure, is it legal to use one array field to access another one?](https://stackoverflow.com/questions/47094166/in-a-structure-is-it-legal-to-use-one-array-field-to-access-another-one) – Alex Lop. Jul 02 '20 at 09:33
  • 1
    @AmirGonnen check this: https://stackoverflow.com/a/47094204/5218277 – Alex Lop. Jul 02 '20 at 09:34
  • 1
    @AlexLop.Partly. It answers (1) but not (2). On (2) I hoped to find some compiler option to allow me to do it, or at least give a warning when it happens. The challenge is to locate such places in a large code base. (static analysis might help) – Amir Gonnen Jul 02 '20 at 09:38
  • Regarding update, you can't access an array out of bounds, period. And C compilers are unlikely to check if you do. You could however utilize flexible array members, or perhaps a union between `int code[1];` and `int place_holder[101];`. – Lundin Jul 02 '20 at 09:58
  • Why would you want to prevent the loop unrolling in particular? What is the reason behind this? Any kind of exploit? – RobertS supports Monica Cellio Jul 02 '20 at 12:40
  • @RobertSsupportsMonicaCellio It's very simple. We have a large codebase that had this bug - a loop unrolled to 2 although the programmer expected it to behave differently. We can fix this specific occurrence, but who knows where else this problem could appear on other parts of the code? So - either make this work as the programmers expect (prevent loop unrolling), or find a way to locate all the other occurrences of this problem (that's where a compiler warning could get handy, but there isn't any AFAIK. Still need to check a static code analysis tool to see if it can detect this). – Amir Gonnen Jul 02 '20 at 12:50
  • You can try `-fno-aggressive-loop-optimizations`, but I'd rather suggest detecting these invalid accesses, e.g., using address-sanitizer as suggested by @RobertS. – chtz Jul 02 '20 at 14:23
  • @chtz OP said that implementation is MIPS and there is no address-sanitizer available. Nonetheless s/he could use [Valgrind](https://valgrind.org/) as I mentioned after this information. – RobertS supports Monica Cellio Jul 02 '20 at 20:14

5 Answers5

6

From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?

It is undefined behavior. The arr[i] operator is syntactic sugar around *(arr + i). So array access boils down to the binary + operator for pointer arithmetic, C17 6.5.6 additive operators, from §7 and §8:

For the purposes of these operators, a pointer to an object that is not an element of an array behaves the same as a pointer to the first element of an array of length one with the type of the object as its element type.

When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. /--/
If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

As you noticed, optimizing compilers might exploit these rules to produce faster code.


Is there a way to prevent gcc from unrolling the loop in such case?

There is a a special exception rule that can be used, C17 6.3.2.3/7:

When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

Also, strict aliasing does not apply to character types, because of another special rule in C17 6.5 §7

An object shall have its stored value accessed only by an lvalue expression that has one of the following types: ... a character type.

These two special rules co-exist in harmony. So assuming we don't mess up alignment etc during the pointer conversion, this means that we are allowed to do this:

unsigned char* i;
for(i = (unsigned char*)&mystruct; i < (unsigned char*)(&mystruct + 1); i++)
{
  do_something(*i);
}

This may however read padding bytes etc so it's "implementation-defined". But in theory you can access the struct byte per byte, and as long as the struct offsets are calculated on byte-per-byte basis, you can iterate across multiple members of the struct (or any other object) in this manner.


As far as I can tell, this very questionable-looking code should be well-defined:

#include <stdint.h>
#include <string.h>
#include <stdio.h>

struct __attribute__ ((__packed__)) {
 int code[1];
 int place_holder[100];
} s;

void test(int val, int n)
{
  for (unsigned char* i = (unsigned char*)&s; 
       i < (unsigned char*)&s + n*sizeof(int); 
       i += _Alignof(int)) 
  {
    if((uintptr_t)i % _Alignof(int) == 0) // not really necessary, just defensive prog.
    {
      memcpy(i, &val, sizeof(int));
      printf("Writing %d to address %p\n", val, (void*)i);
    }
  }
}

int main (void)
{
  test(42, 3);
  printf("%d %d %d\n", s.code[0], s.place_holder[0], s.place_holder[1]);
}

This works fine on gcc and clang (x86). How efficient it is, well that's another story. Please don't write code like this, though.

Cody Gray - on strike
  • 239,200
  • 50
  • 490
  • 574
Lundin
  • 195,001
  • 40
  • 254
  • 396
1

From C specification viewpoint, is overflowing (deliberately) from one struct member to the next is illegal or undefined behavior?

It's undefined behavior to access an array out-of-bounds. From C11 J.2:

The behavior is undefined in the following circumstances:

[...]

An array subscript is out of range [...]

Is there a way to prevent gcc from unrolling the loop in such case?

Alias code with a volatile pointer. But even using an intermediary pointer seems to work. godbolt link

KamilCuk
  • 120,984
  • 8
  • 59
  • 111
  • To be picky, Annex J isn't normative but an informative summary, so you should find the actual part in the normative chapters when quoting the standard correctly. – Lundin Jul 02 '20 at 09:12
  • @Lundin It's quite curious why it isn't normative. Clear identified portability issues should be confirmed by the standard, not just noted. Even because they closely relate to the normative parts, if not even mirrow them. – RobertS supports Monica Cellio Jul 02 '20 at 12:49
  • @RobertSsupportsMonicaCellio That's just how ISO standards work (standards are standardized) - it's just a handy summary annex. It clearly says "Annex J (informative)" at the beginning of the chapter. It would be confusing to have the same normative text in two places, I suppose. (Also there are cases where Annex J is incorrect or contains incorrect references, such as UB "The value of an object with automatic storage duration is used while it is indeterminate (6.2.4, 6.7.9, 6.8)") – Lundin Jul 02 '20 at 12:53
1

1. Question:

"From C specification viewpoint, is overflowing (deliberately) from one struct member to the next illegal or undefined behavior?"

It is undefined behavior. The C standard states (emphasize mine):

"A postfix expression followed by an expression in square brackets [] is a subscripted designation of an element of an array object. The definition of the subscript operator [] is that E1[E2] is identical to (*((E1)+(E2))). Because of the conversion rules that apply to the binary + operator, if E1 is an array object (equivalently, a pointer to the initial element of an array object) and E2 is an integer, E1[E2] designates the E2-th element of E1 (counting from zero)."

Source: ISO/IEC 9899:2018 (C18), §6.5.2.1/2

"When an expression that has integer type is added to or subtracted from a pointer, the result has the type of the pointer operand. If the pointer operand points to an element of an array object, and the array is large enough, the result points to an element offset from the original element such that the difference of the subscripts of the resulting and original array elements equals the integer expression. In other words, if the expression P points to the i-th element of an array object, the expressions (P) + N (equivalently, N + (P)) and (P) - N (where N has the value n) point to, respectively, the i+n-th and i−n-th elements of the array object, provided they exist. Moreover, if the expression P points to the last element of an array object, the expression (P) + 1 points one past the last element of the array object, and if the expression Q points one past the last element of an array object, the expression (Q) - 1 points to the last element of the array object. If both the pointer operand and the result point to elements of the same array object, or one past the last element of the array object, the evaluation shall not produce an overflow; otherwise, the behavior is undefined. If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated."

Source: ISO/IEC 9899:2018 (C18), §6.5.6/8

Also non-normative Annex J states with regard to paragraph §6.5.6 in the normative standard:

J.2 Undefined behavior

1 The behavior is undefined in the following circumstances:

....

  • An array subscript is out of range, even if an object is apparently accessible with the given subscript (as in the lvalue expression a[1][7] given the declaration int a[4][5]) (6.5.6).

2. Question (plus update):

"Is there a way to prevent gcc from unrolling the loop in such case?"

"Can gcc allow this as an extension, by some compile options?"

"Is there a way to at least get a warning when gcc identifies it? That's important in order to identify such cases in a large existing code base."

You could try to place an empty assembly code function like asm(""); into the loop, as shown in this answer by Denilson Sá Maia, f.e.:

 for (i = 0; i < n; i++) {
    s.code[i] = 1;
    asm("");
 }

or #pragma's around the test function, as shown here, f.e.:

#pragma GCC push_options
#pragma GCC optimize ("O0")

void test(int n)
{
   int i;

   for (i = 0; i < n; i++) {
      s.code[i] = 1;
   }
}

#pragma GCC pop_options

to prevent the optimization for that specific program part in general and with that the loop unrolling.

Related:


It is not preventing the loop unrolling, but you can use AddressSanitizer, which also got LeakSanitizer integrated, and is built into GCC since version 4.8 to detect when the loop unrolling doesn't work/you access non-affiliated memory.

More information about this, you can find here.

Edit: As you said your target implementation is MIPS, you can still use Valgrind to detect memory leaks.

  • Nice idea. However, the code is built for an embedded application, therefore the additional instrumentation overhead would not be welcome. Also, target is MIPS and fsantize is not implemented for that architecture (at least with the toolchain we are using) – Amir Gonnen Jul 02 '20 at 12:33
  • @AmirGonnen OK. Well, just my two cents. I let the answer alive for anyone who encounters the same issue. I will continue the discussion [in the comments](https://stackoverflow.com/questions/62692609/allowing-struct-field-to-overflow-to-the-next-field/62695064?noredirect=1#comment110873301_62692609) to your question, so it is better to see for others. – RobertS supports Monica Cellio Jul 02 '20 at 12:40
  • @AmirGonnen Just as a side note: [valgrind is available for MIPS](https://stackoverflow.com/questions/18009140/how-to-run-valgrind-to-find-out-memory-leaks-on-my-embedded-mipsel-linux-box?rq=1). – RobertS supports Monica Cellio Jul 02 '20 at 13:03
  • That's good to know. But it still needs to run the specific flow in the code to find the problem, right? I would like to locate all these problems in the code regardless of test coverage. So I'm looking for something more like compiler warnings or static code analysis. – Amir Gonnen Jul 02 '20 at 13:08
  • @AmirGonnen You could also prevent the optimization by placing an assembly code instruction into the loop. – RobertS supports Monica Cellio Jul 02 '20 at 13:15
  • 1
    @AmirGonnen I decided to (at least try to) compete with Lundin's answer at my old days lately and make my answer cover also the first question so that my answer covers the whole *entire* question of yours. ;-) – RobertS supports Monica Cellio Jul 02 '20 at 15:16
1

Just _Static_assert the layout and do the pointer arithmetic in (char*), then cast to (int*) and do the access. No further tricks such as memcpy/_Alignof are required because ints are unpadded and you are accessing ints where there really are ints.

This alone makes gcc unroll the loop.

Why character-pointer based (char*, signed char*, unsigned char*) pointer arithmetic is required is because http://port70.net/~nsz/c/c11/n1570.html#J.2 (non-normatively, as it is just an appendix, but gcc seems to follow it) makes out-of bounds accesses UB, but http://port70.net/~nsz/c/c99/n1256.html#6.2.6.1p4 and http://port70.net/~nsz/c/c99/n1256.html#6.5p6 still allow inspecting any object via character pointers (more discussion on this at Is accessing an element of a multidimensional array out of bounds undefined behavior?).

Alternatively you could do the pointer arithmetic via uintptr_t (then it will be implementation defined) but gcc optimizes those worse in certain cases (gcc doesn't fold (uintptr_t)p < (uintptr_t)(p+10) into true, but it does so for (char*)p < (char*)(p+10). This could be considered a missed optimization).

struct  __attribute__ ((__packed__)) s {
    int code[1];
    int place_holder[100];
} s;


void test_s(int n) //original
{
    int i;
    for (i = 0; i < n; i++) {
        s.code[i] = 1;
    }
}

#include <stddef.h> //offsetof
void test_s2(int n) //unrolls the loop
{
    _Static_assert(offsetof(struct s,code)+sizeof(int)==offsetof(struct s,place_holder),"");
    //^will practically hold even without __attribute__((__packed__))

    int i; for (i = 0; i < n; i++)
        *(int*)((char*)&s.code + (size_t)i*sizeof(s.code[0])) = 1;
}

/////////////


//same code as test_s2
struct r {
    int code101[101];
} r;
void test_r(int n)
{
    int i;

    for (i = 0; i < n; i++) {
        r.code101[i] = 1;
    }
}
Petr Skocik
  • 58,047
  • 6
  • 95
  • 142
0

In the language Dennis Ritchie described in 1974, the behavior of struct member access operators and pointer arithmetic were defined in terms of machine addresses, and except for the use of object size to scale pointer arithmetic, were agnostic as to the types of objects the addresses represented. The C Standard allows implementations to behave in that fashion when their customers would find it useful, but would also allow them to do other things, such as trapping out-of-bounds array accesses, if customers would find those other behaviors more useful.

Although later C dialects effectively behaved as though struct member names are prefixed by the struct name, so as to give each structure type its own member namespace, in most other respects compilers can be configured, by disabling optimizations if nothing else, to behave in a fashion consistent with Ritchie's 1974 language. Unfortunately, there's no way to distinguish implementations that will consistently behave in that fashion from those that won't; some compilers, especially those which go back to a time before the Standard, don't explicitly document that they support the 1974 behaviors because they were written at a time when compilers were generally expected to do so unless they documented otherwise.

supercat
  • 77,689
  • 9
  • 166
  • 211