The 2011 standard explicitly states...

6.7.6.2 Array declarators

  5. If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. The size of each instance of a variable length array type does not change during its lifetime. Where a size expression is part of the operand of a sizeof operator and changing the value of the size expression would not affect the result of the operator, it is unspecified whether or not the size expression is evaluated.

It's contrived, but the following code seems reasonable.

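#include <stddef.h>

size_t vla(const size_t x) {
  size_t a[x];  /* undefined behavior when x == 0 (C11 6.7.6.2p5) */
  size_t y = 0;

  /* fill the array; every index stays below x */
  for (size_t i = 0; i < x; i++)
    a[i] = i;

  for (size_t i = 0; i < x; i++)
    y += a[i % 2];

  return y;
}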

Clang seems to generate reasonable x64 assembly for it (without optimizations). Obviously indexing a zero length VLA doesn't make sense, but accessing beyond bounds invokes undefined behavior.

Why are zero length arrays undefined?

Jonathan Leffler
Jason
  • C doesn't allow zero-length non-VLAs either; it is consistent to disallow them as VLAs. GCC (and hence clang too) have extensions that allow zero-length arrays. You can debate whether that's good or not. – Jonathan Leffler Oct 26 '15 at 18:05
  • "Obviously indexing a zero length VLA doesn't make sense, but accessing beyond bounds invokes undefined behavior." - neither of which is happening in the example. – Karoly Horvath Oct 26 '15 at 18:11
  • @KarolyHorvath My thoughts were that indexing something zero length is already prohibited. Similar to an empty list, or a zero length vector, having a zero length array makes sense to me as long as the value isn't indexed (which is already prohibited by the language). – Jason Oct 26 '15 at 18:17
  • @JonathanLeffler it is interesting to note that [std::array in C++ does special case](http://stackoverflow.com/q/26209190/1708801) for zero length. – Shafik Yaghmour Oct 26 '15 at 18:19
  • @Jason, the language doesn't _prohibit_ indexing a zero length array - the syntax allows it! Only the _result_ of accessing outside the bounds is UB. And that is for all arrays, independent of type or size. – Paul Ogilvie Oct 26 '15 at 18:40
  • I guess it would be OK for storing an indexed collection of types that are 0 bits long. You would never have to worry about exceeding its bounds. – Martin James Oct 26 '15 at 18:52
  • "*Why are zero length arrays undefined?*" Because they are of no use? – alk Oct 26 '15 at 19:31
  • @alk It's useful to avoid special casing code for 0. The main problem is that it's a false positive for UB where coding standards are stricter than average. – Jason Oct 26 '15 at 20:17
  • @PaulOgilvie it's normal to describe anything that causes UB as prohibited, illegal, etc. – M.M Oct 26 '15 at 21:14
  • It seems to me that defining behaviour for zero-length arrays would have to introduce a bunch of other crud too. For example, when you use `a` in your code, 6.3.2.1/3 specifies " an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object". But a zero-sized array does not have an initial element, so the decay rule would have to change. Then if you had two zero-sized arrays , could they decay to the same pointer? Etc. – M.M Oct 26 '15 at 21:18
  • @M.M I honestly haven't read the spec with an eye toward all the possible implications of an actual change. When I originally thought about it, it seemed like decaying to a pointer could reasonably resolve to the same address though (easy compiler math). I would guess a non-zero sized VLA could also have the same address as a zero sized VLA, but it seems like indexing the zero sized would already be forbidden. In general everything seems it could still be roughly valid mathematically. – Jason Oct 26 '15 at 21:48
  • @m-m, I can understand "describ[ing] anything that causes UB as prohibited, illegal, etc." In all those discussions I prefer to make a distinction between where the compiler's behavior is undefined and where the usage (in code) causes UB. If code is safe in not using an UB situation, the code is safe and not "prohibited, illegal". Here the compiler's behavior is undefined but the code shown is safe. – Paul Ogilvie Oct 26 '15 at 22:52
  • I understand this is an academical discussion (1+ for this). I however instead of `size_t a[x];` just did `size_t a[x+1];`. ;-P and continued with the "real" problem, whatever it were. – alk Oct 27 '15 at 05:56
  • @alk If `x` were a smaller data type, and/or signed, it could overflow one sooner which *could* be very bad. Special casing can lead to bugs though. – Jason Oct 27 '15 at 15:54

3 Answers

int i = 0;
int a[i], b[i];

Is a == b? It shouldn't be - they're different objects - but avoiding it is problematic. If you leave a gap between a and b unconditionally, you're wasting space in the i > 0 case. If you check whether i == 0 and only leave a gap then, you're wasting time in the i > 0 case.
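For contrast, here is a minimal sketch of the guarantee that holds today for positive lengths. The helper name `vla_addresses_differ` is hypothetical, and the early return for `n == 0` is an assumption added only to keep the sketch out of undefined behavior.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: reports whether two VLAs of the same positive
   length start at different addresses. */
bool vla_addresses_differ(size_t n) {
    if (n == 0)
        return false;        /* a zero-length VLA would be undefined behavior */

    int a[n], b[n];
    a[0] = 0;                /* touch both arrays so they are real, live objects */
    b[0] = 0;

    /* Two distinct objects of non-zero size cannot start at the same address. */
    return &a[0] != &b[0];
}
```

With `n == 0` there would be no element whose address could tell the two arrays apart, which is the problem described above.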

It gets worse with multidimensional arrays:

int i = 0;
int a[2][i];

You can pad between two variables, but where could you pad here? There's no way to do it without breaking the invariant that sizeof (int[2][i]) == 2 * i * sizeof (int). If you don't pad, then a[0] and a[1] have the same address, and you're breaking a different important invariant.
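The two invariants mentioned here can be checked together for positive `i`. This is only a sketch: the function name `rows_packed_and_distinct` is made up for illustration, and the `i == 0` case is rejected up front because declaring the array would be undefined behavior.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical helper: checks that a 2-row VLA has no padding and that
   its two rows start at different addresses. */
bool rows_packed_and_distinct(size_t i) {
    if (i == 0)
        return false;                        /* int a[2][0] is not allowed */

    int a[2][i];

    bool packed   = sizeof a == 2 * i * sizeof(int);  /* no padding between rows */
    bool distinct = a[0] != a[1];                     /* rows decay to different int* */

    return packed && distinct;
}
```

For `i == 0` both conditions cannot hold at once: with no padding the two rows would share one address, and with padding the size formula would break.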

It's a headache that isn't worth defining.

user2357112
  • "for positive i would assign the same address to a and b" - was that a brain fart or some typo? it makes no sense... – Karoly Horvath Oct 26 '15 at 18:13
  • @KarolyHorvath: "for positive `i`" attaches to the phrase before it, not the phrase after. Generated code that would be (reasonable and space-efficient for positive `i`) would assign the same address to `a` and `b` for `i == 0`. – user2357112 Oct 26 '15 at 18:15
  • Scratch the whole text and start from zero. It's still a huge mess. – Karoly Horvath Oct 26 '15 at 18:18
  • @KarolyHorvath: Doesn't seem that hard to parse to me, but I wrote it. How about now? – user2357112 Oct 26 '15 at 18:26
  • Now it makes sense ;) Not that it's a big thing, to me it seems like a "mild headache". – Karoly Horvath Oct 26 '15 at 18:34
  • I like your answer but "_If you leave a gap between a and b unconditionally, you're wasting space in the i > 0 case_" isn't necessarily true if "unconditional" would mean "at least one element" (if you get what I mean). Then there is no waste if i>0. – Paul Ogilvie Oct 26 '15 at 18:35
  • @PaulOgilvie: But that's case 2: checking whether `i == 0`. Assigning an object `max(i, 1)` ints' worth of space is going to be slower than assigning it `i` ints' worth of space. – user2357112 Oct 26 '15 at 18:38
  • "Is `a == b`?" looks like you are asking if array `a` compares to array `b` using code. The arrays are the same size and have the same content, so why shouldn't they be equal? – chux - Reinstate Monica Oct 26 '15 at 18:42
  • @chux, because, as user2357112 says, they are different objects so cannot be the same object, which `a==b` checks. – Paul Ogilvie Oct 26 '15 at 18:43
  • Btw, are there any runtime checks to check that x is zero and abort the program (or raise a signal/exception)? – Paul Ogilvie Oct 26 '15 at 18:44
  • @PaulOgilvie: Well, you could probably get your compiler to emit such checks under debug settings, but under normal conditions, it's unlikely there would be such checks. – user2357112 Oct 26 '15 at 19:07

We can see that gcc supports zero-length arrays as an extension, so clearly they are useful. From the standard's perspective, though, they would seem to create some issues, since as it stands now each object should have a unique address. We can see this from section 6.5.9 Equality operators of the draft C99 and C11 standards, which says:

Two pointers compare equal if and only if both are null pointers, both are pointers to the same object (including a pointer to an object and a subobject at its beginning) or function, both are pointers to one past the last element of the same array object, or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space.
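The last clause of that paragraph can be illustrated with the rows of an ordinary two-dimensional array, which are guaranteed to be adjacent. This is a small sketch with a hypothetical function name, not something taken from the standard.

```c
#include <stdbool.h>

/* Hypothetical helper: compares the one-past-the-end pointer of row 0
   with the start of row 1 in a fixed-size 2-D array. */
bool one_past_end_meets_next_row(void) {
    int grid[2][3];

    int *end_of_row0   = &grid[0][0] + 3;  /* one past the last element of grid[0] */
    int *start_of_row1 = &grid[1][0];      /* first element of grid[1] */

    /* grid[1] immediately follows grid[0], so these compare equal. */
    return end_of_row0 == start_of_row1;
}
```

Apart from this one-past-the-end case, pointers to two different objects never compare equal, and that is the rule a zero-length array would put under strain.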

So this would require a bit of special casing, and most of the usefulness, such as flexible arrays, can be provided using alternative methods.

It would likely require changes in other places as well. As M.M points out, array-to-pointer decay is specified in section 6.3.2.1 Lvalues, arrays, and function designators:

[...]an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object and is not an lvalue[...]

This seems like it would require several non-trivial changes for minimal added benefit.

Shafik Yaghmour
  • "_or one is a pointer to one past the end of one array object and the other is a pointer to the start of a different array object that happens to immediately follow the first array object in the address space_" implies that at least one element must be allocated, however, x can still be zero, functionally and practically, and the runtime will now allocate one element. UB will follow if we expect a and b to be contiguous in memory and want to compute the address of c which follows b as `c==a+sizeof(a)+sizeof(b)` as a and b are now not zero size (unless `sizeof` can cope). – Paul Ogilvie Oct 26 '15 at 18:54
  • How is it different than `malloc(0)`? – Jason Oct 26 '15 at 18:58
  • @Jason from `7.20.3` *If the size of the space requested is zero, the behavior is implementation defined either: a null pointer is returned, or the behavior is as if the size were some nonzero value, except that the returned pointer shall not be used to access an object* – Shafik Yaghmour Oct 26 '15 at 19:00
  • Thanks. Actually, implementation defined would seem like the more intuitive thing to me. I'm not a compiler author or a language maintainer though. – Jason Oct 26 '15 at 19:02
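
Since the result of `malloc(0)` is implementation-defined, code that wants a predictable outcome sometimes rounds the request up. The wrapper below is a sketch of that idea; `alloc_at_least_one` is a hypothetical name, not a standard function.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical wrapper: always requests at least one byte, so a null
   result can only mean that the allocation failed. */
void *alloc_at_least_one(size_t n) {
    return malloc(n != 0 ? n : 1);
}
```

The caller still must not access more than `n` bytes and still releases the block with `free`.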

Looking at C standard:

C11, 6.7.6.2 Array declarators (p1):

[...] If the expression is a constant expression, it shall have a value greater than zero. [...]

(p5):

If the size is an expression that is not an integer constant expression: if it occurs in a declaration at function prototype scope, it is treated as if it were replaced by *; otherwise, each time it is evaluated it shall have a value greater than zero. [...]

4. Conformance:

If a "shall" or "shall not" requirement that appears outside of a constraint or runtime-constraint is violated, the behavior is undefined. Undefined behavior is otherwise indicated in this International Standard by the words "undefined behavior" or by the omission of any explicit definition of behavior. There is no difference in emphasis among these three; they all describe "behavior that is undefined".

Therefore, declaring a zero size array leads to undefined behavior of the program.
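
One way to stay within the standard is to special-case zero before the declaration is reached. The sketch below adapts the function from the question under two assumptions: the name `vla_guarded` is invented here, and the first loop is taken to store into `a[i]`.

```c
#include <stddef.h>

/* Hypothetical variant of the question's function that never evaluates
   a zero size for the VLA. */
size_t vla_guarded(const size_t x) {
    if (x == 0)
        return 0;            /* skip the declaration: size_t a[0] would be UB */

    size_t a[x];             /* x >= 1 here, as 6.7.6.2p5 requires */
    size_t y = 0;

    for (size_t i = 0; i < x; i++)
        a[i] = i;            /* assumption: store at index i, which stays in bounds */

    for (size_t i = 0; i < x; i++)
        y += a[i % 2];       /* reads a[0]; a[1] is read only when x >= 2 */

    return y;
}
```

The cost is exactly the special case the question hoped to avoid: one extra branch per call.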

Community
haccks
  • "Therefore, C does not allow an array to be zero length." - why? There's a huge jump in the explanation there. – Karoly Horvath Oct 26 '15 at 18:19
  • @KarolyHorvath; Didn't I say *In a layman term*? now I am waiting for your well explained answer. – haccks Oct 26 '15 at 18:40
  • I merely pointed out that I don't understand your explanation. I hope you are well accustomed to infinite busy loops or infinitely blocking API calls because I have no explanation. Not that I need one to post a comment to your fuzzy explanation. – Karoly Horvath Oct 26 '15 at 18:46
  • Do we access `vla[0]` with something like `pointerToObject = &vla[0]`? – Jongware Oct 26 '15 at 19:41
  • @Jongware; No. We can't. – haccks Oct 26 '15 at 19:51
  • The question is asking why does the standard contain this text, instead of specifying that zero-length VLAs are legal – M.M Nov 05 '15 at 21:53