3

I am working on a piece of legacy code (no tests). I stumbled upon a section hidden inside several macros. It generates a warning if compiled with GCC's -Wvla.

The code in question is equivalent to what can be seen in this small program:

typedef struct entry {
    unsigned index;
    unsigned reserved;
    unsigned value;
} entry_t;

int main(int argc, char **argv) {
    long pa = 0;
    long res = pa + sizeof(entry_t[10 - argc]);
    return res;
}

When compiled, it gives out a warning:

$ gcc -g -Wvla repro-vla.c
repro-vla.c: In function ‘main’:
repro-vla.c:9:5: warning: ISO C90 forbids variable length array [-Wvla]
    9 |     long res = pa + sizeof(entry_t[10 - argc]);
      |     ^~~~

The culprit is of course this expression: sizeof(entry_t[10 - argc]). The syntax is a bit confusing here. I believe a temporary anonymous array for 10 - argc entries of type entry_t is created, then its size is taken, and the array is discarded.

My questions are:

  1. Is my understanding of the code as it is written now correct?
  2. How is that expression any different from sizeof(entry_t) * (10-argc)? Both calculate the same value, and neither one does anything to guard against the underflow (when argc >= 10). The second expression does not use variable length arrays and as such won't generate the warning, and it is also easier to comprehend in my opinion.
Grigory Rechistov
  • 2,104
  • 16
  • 25
  • Does this answer your question? [ISO C90 forbids variable length array](https://stackoverflow.com/questions/10234288/iso-c90-forbids-variable-length-array) – ti7 Feb 26 '23 at 07:52
  • 1
    "I believe a temporary anonymous array for 10 - argc entries of type entry_t is created, then its size is taken, and the array is discarded." --> no array is created. – chux - Reinstate Monica Feb 26 '23 at 07:53
  • 1
    @ti7 no, read his questions just dont link to something without reading the post – Fredrik Feb 26 '23 at 07:54
  • Possible difference is code analysis when `argv >= 10` as `sizeof(entry_t[0 or less])` is problematic and `sizeof(entry_t) * (10-argc)` is OK. – chux - Reinstate Monica Feb 26 '23 at 07:58
  • Difference might also be output assembly, where at the time this would use `lea` but the multiplication would not. – Emanuel P Feb 26 '23 at 08:01
  • The person writing the code didn't care to support 30 year old compilers . Your suggested change is exactly equivalent iff `argc < 10` – M.M Feb 26 '23 at 08:04
  • *"and it is also easier to comprehend in my opinion"* - Your own familiarity likely plays a bigger role here than any objective difficulty in mentally parsing either token soup. – StoryTeller - Unslander Monica Feb 26 '23 at 08:08
  • 2
    `ISO C90 forbids variable length array` I would suggest to forget about 30y standards – 0___________ Feb 26 '23 at 08:44
  • @0___________ that is what the Linux kernel puts into the default build flags; the `-Wvla` flag is explicitly present in the list of compiler options. – Grigory Rechistov Feb 27 '23 at 08:14
  • @GrigoryRechistov and .....? Is that the last instance? Is it one of God's commandments? – 0___________ Feb 27 '23 at 10:51
  • 1
    @0___________ "When in Rome, do as Romans do". I have to adapt code for a Linux kernel module. That code base uses its own set of compilation rules. Many of those rules are contested as being archaic or counterproductive. But that is a different topic for another discussion which is held elsewhere. Of course I agree that the way the GCC's `-Wvla` warning is formulated is of questionable value. – Grigory Rechistov Feb 27 '23 at 11:24
  • @GrigoryRechistov you did not mention linux kernek in your question. Put some more effort into your question next time.\ – 0___________ Feb 27 '23 at 11:29
  • @0___________ The fact that a code is ported to Linux really does not change the essence of the question, in my opinion. Surely it might give some weight to the fact that `-Wvla` is added to the list of compilation flags. I was more interested in whether or not I understand the code correctly, because `sizeof(VLA type)` is not something I observe every day. The answers below answer my concerns very nicely! – Grigory Rechistov Feb 27 '23 at 11:38
  • @n.m. Exactly. There are very interesting compilation rules and static analysis passes applied to code compiled as parts of the Linux kernel. It is a fascinating SW world. – Grigory Rechistov Feb 27 '23 at 11:41

3 Answers3

5

No temporary array object is created.

There is a common misconception that VLA is about stack allocated arrays of length defined in runtime, a form of a bit safer alloca().

No. VLAs are about typing not storage. The following line is the essence of "VLA-ness":

typedef int T[n];

not:

int A[n];

Note that VLA type declarations do not allocate any storage and the can be used without any concerns of running out of stack. The VLA types are very useful for handling multidimensional arrays and expressing access ranges in functions arguments i.e. void foo(int n, int arr[n]);. VLA types were optional in C11 but they will be mandatory in C23 due to their usefulness.

The expression sizeof(entry_t[10 - argc] is essentially the same as:

typedef entry_t _unnamed_type[10 - argc];
sizeof(_unnamed_type)

No VLA array object is created there. I think that the problem is -Wvla flag itself. The -Wvla warns about any declaration of VLA type (not VLA object) what is overzealous because it also catches the good usages of VLA types. There is a request to add -Wvla-stack-allocation warning to clang to catch dangerous usages of VLAs. Alternatively, one can use gcc's -Wvla-larger-than=0 but it does not work very well.

tstanisl
  • 13,520
  • 2
  • 25
  • 40
3

In this expression sizeof(entry_t[10 - argc]) no array is created. It evaluates the expression 10 - argc and according to the type of entry_t it calculates the size of such an array. entry_t[10 - argc] is a type specifier and not an expression. So for example you may not write

sizeof entry_t[10 - argc]

These expressions sizeof(entry_t[10 - argc]) and sizeof(entry_t) * (10-argc) yield the same value because the size of an array is equal to the number of elements in the array multiplied by the size of its element provided that 10 is greater than argc.

Regarding the number of elements in a VLA, there is a restriction in the C Standard that is "each time it is evaluated it shall have a value greater than zero."

Note that in the expression sizeof(entry_t) * (10-argc) you can get a very big value of the type size_t when argc is greater than 10 due to the usual arithmetic conversions.

Also this line

long res = pa + sizeof(entry_t[10 - argc]);

raises a question why a value of the unsigned integer type size_t is assigned to a variable of the signed integer type long.

anatolyg
  • 26,506
  • 9
  • 60
  • 134
Vlad from Moscow
  • 301,070
  • 26
  • 186
  • 335
3

Taking sizeof of variable-length array — is there any benefit for doing so?

Benefit with the alterative code

sizeof(entry_t) * (10-argc) has a defined product*1 when argc >= 10. sizeof(entry_t[10 - argc]) is undefined behavior (UB) in that case as it fails to meet "If the size is an expression that is not an integer constant expression: ... each time it is evaluated it shall have a value greater than zero." C17dr § 6.7.6.2 5 (Array declarators)

The next steps of code, converting to long and then to int, have their implementation defined behavior when sizeof(entry_t) * (10-argc) is more than LONG_MAX, INT_MAX.

Small advantage

Given UB of argc >= 10, an optimizing compiler can assume argc < 10 and the emit code taking advantage that the return value will only be in a narrow range (e.g.: [3*4 ... 3*4*9]) - thus only needing to use narrow integer math. Not much of an advantage - but there it is.


*1 Certainly type size_t.

chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • Good point. Replacing one with another is affecting the corner case. What has been an UB before becomes well-defined, although still incorrect from the point of the higher-level logic I am working on. In principle, the runtime could also catch the cases when this UB is reached. In practice, it is not done. So the transformation that gets rid of the VLA type should be beneficial, because it makes code easier to understand. – Grigory Rechistov Feb 27 '23 at 11:30
  • @GrigoryRechistov Your problem reminds me of [February 29, 1900 bug](https://en.wikipedia.org/wiki/Leap_year_problem#Occurrences). A mistake made early on is not fixed to maintain compatibility. – chux - Reinstate Monica Feb 27 '23 at 12:17