25

An argument in the comments section of this answer prompted me to ask this question.

In the following code, bar points to a variable length array, so the sizeof is determined at runtime instead of compile time.

int foo = 100;
double (*bar)[foo];

The argument was about whether or not using sizeof evaluates its operand when the operand is a variable length array, making sizeof(*bar) undefined behavior when bar is not initialized.

Is it undefined behavior to use sizeof(*bar) because I'm dereferencing an uninitialized pointer? Is the operand of sizeof actually evaluated when the type is a variable length array, or does it just determine its type (how sizeof usually works)?


Edit: Everyone seems to be quoting this passage from the C11 draft. Does anyone know if this is the wording in the official standard?

PC Luddite
  • Related post: [Behavior of sizeof on variable length arrays (C only)](http://stackoverflow.com/questions/14995870/behavior-of-sizeof-on-variable-length-arrays-c-only) – juanchopanza Oct 07 '15 at 06:57
  • [sample code](http://ideone.com/XVKPy2) – BLUEPIXY Oct 07 '15 at 08:00
  • Also see [Is dereferencing null pointer valid in sizeof operation](http://stackoverflow.com/q/19785518/1708801) and the C++ version [here](http://stackoverflow.com/q/28714018/1708801). – Shafik Yaghmour Oct 07 '15 at 09:32
  • 2
    @BLUEPIXY The problem though is that if it is undefined behavior, it still might behave as you'd expect (even across platforms and compilers). – PC Luddite Oct 07 '15 at 14:23
  • @ShafikYaghmour Both of those deal with the compile time evaluation of `sizeof`, not the runtime evaluation. – PC Luddite Oct 07 '15 at 14:24
  • It does not need to dereference. So it is not UB. – BLUEPIXY Oct 07 '15 at 19:05
  • 2
    @BLUEPIXY No, it doesn't need to, but the question is really about whether it's undefined behavior according to the standard. – PC Luddite Oct 07 '15 at 19:07
  • It makes no sense to actually dereference the pointer, because its type already carries the information about the pointed-to object. "The operand is evaluated" means evaluating `foo`, just as in `sizeof(double[foo])`. – BLUEPIXY Oct 07 '15 at 19:57
  • 1
    @BLUEPIXY It might not make sense to, but that's what the standard currently says. – PC Luddite Oct 07 '15 at 21:20
  • Note that the part to be evaluated is the focus. – BLUEPIXY Oct 07 '15 at 21:32
  • E.g. check the type of `*bar` => the type is `double[foo]` => evaluate `foo` (the operand is evaluated, but using the value `foo` had at the time of the declaration). – BLUEPIXY Oct 07 '15 at 21:39
  • 2
    @BLUEPIXY The argument is really about what the standard says, not what practically happens. – PC Luddite Oct 07 '15 at 23:05
  • Why bring up the standard? The compiler has no need to actually perform a dereference in order to determine the type. `sizeof(*bar)` and `sizeof(double[foo])` are the same; that is all I have to say. – BLUEPIXY Oct 07 '15 at 23:16
  • "The size is determined from the type of the operand." => the type is `double[foo]` => "If the type of the operand is a variable length array type, the operand is evaluated;" This is the same as `sizeof(double[foo])`. It is necessary to evaluate `foo`; there is no need to actually dereference in the first place. – BLUEPIXY Oct 07 '15 at 23:30
  • 2
    5.1.2.3/4: _"In the abstract machine, all expressions are evaluated as specified by the semantics. An actual implementation need not evaluate part of an expression if it can deduce that its value is not used and that no needed side effects are produced (including any caused by calling a function or accessing a volatile object)."_ – BLUEPIXY Apr 28 '16 at 06:27
  • 1
    @BLUEPIXY: Yeah, that's why it definitely works in practice on compilers that aren't evil and/or totally stupid. But apparently a DeathStation 9000 C implementation might be *allowed* to break code that depended on it, because of the way the C standard words it. – Peter Cordes Nov 21 '20 at 10:33
  • 1
    There is a proposal that addresses this issue. See https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2838.htm – tstanisl Jan 17 '22 at 11:19

3 Answers

14

Yes, this causes undefined behaviour.

In N1570 6.5.3.4/2 we have:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

Now we have the question: is the type of *bar a variable length array type?

Since bar is declared as pointer to VLA, dereferencing it should yield a VLA. (But I do not see concrete text specifying whether or not it does).

Note: Further discussion could be had here; perhaps it could be argued that *bar has type double[100], which is not a VLA.

Supposing we agree that the type of *bar is actually a VLA type, then in sizeof *bar, the expression *bar is evaluated.

bar is indeterminate at this point. Now looking at 6.3.2.1/1:

if an lvalue does not designate an object when it is evaluated, the behavior is undefined

Since bar does not point to an object (by virtue of being indeterminate), evaluating *bar causes undefined behaviour.
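
To stay clear of the question entirely, one option is to take the size from the type name, or to apply sizeof only once bar points at real storage. A minimal sketch, assuming foo is greater than zero; the function names are made up for illustration:

```c
#include <stddef.h>
#include <stdlib.h>

/* Size of one row of foo doubles, taken from the type name alone.
 * Only foo is evaluated; no pointer is involved. foo must be > 0. */
size_t row_bytes_from_type(size_t foo) {
    return sizeof(double[foo]);
}

/* Same size, but via sizeof *bar after bar points at real storage,
 * so evaluating *bar is harmless even under the literal reading.
 * Returns 0 if the allocation fails. */
size_t row_bytes_from_pointer(size_t foo) {
    double (*bar)[foo] = malloc(sizeof(double[foo]));
    if (bar == NULL) {
        return 0;
    }
    size_t bytes = sizeof *bar;
    free(bar);
    return bytes;
}
```

For foo == 100 both functions return 100 * sizeof(double). Note that the common idiom double (*bar)[foo] = malloc(sizeof *bar); would seem to fall under the literal reading discussed above, since bar is still indeterminate inside its own initializer.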

M.M
  • I am still confused, is the type of `*bar` VLA? Why would it be evaluated if it's type is not VLA? Can you clarify that? – Iharob Al Asimi Oct 07 '15 at 07:03
  • 1
    @iharob well, `double (*bar)[foo];` declares `bar` as pointer to VLA, therefore dereferencing it should give a VLA. It'd be good if we could find concrete text to confirm that – M.M Oct 07 '15 at 07:04
  • So this means that if one does this: `int vla[size]; int *pointer = vla; printf("%zu\n", sizeof(*pointer));` will not be undefined but `printf("%zu\n", sizeof(*vla));` will? – Iharob Al Asimi Oct 07 '15 at 07:04
  • I completely misunderstood this as you can see from the comment above ... Sorry. – Iharob Al Asimi Oct 07 '15 at 07:05
  • 2
    after `int *pointer`, the type of `pointer` is `int *` which is not a VLA type . `*vla` has type `int` which is not a VLA type, so that is not evaluated either. – M.M Oct 07 '15 at 07:05
  • @M.M In my simple case, the compiler may even have optimized `*bar` to be `double[100]`, but there are probably more complex cases where that wouldn't happen. – PC Luddite Oct 07 '15 at 07:12
  • I see your point now. However, I still see no reason to evaluate the operand. C does not store the length of an array in the array, so the compiler has to keep track of the size elsewhere anyway. For `int *i; sizeof(*i)`, there is also no need to evaluate `*i` and that is a well accepted construct. I rather suspect some kind of misphrasing: "**the** operand is evaluated" should be "**its** operand ..." or the like. (Note that gcc actually **does** behave as I would expect.) Just a simple thought: **Why would it not suffice only to evaluate the index of the operand, instead of the whole operand?** – too honest for this site Oct 07 '15 at 11:03
  • Note that by "evaluate" I mean dereferencing (which includes an access), i.e. runtime code. Of course the compiler has to parse the operand to detect the VLA, as much as for a struct pointer. But this would not be a problem; just the access due to dereferencing would be. – too honest for this site Oct 07 '15 at 11:16
  • 1
    Curious discussion, I haven't ever considered that this could be UB. Reading 6.7.6 declarators, there's this statement: `A full declarator is a declarator that is not part of another declarator. The end of a full declarator is a sequence point. If, in the nested sequence of declarators in a full declarator, there is a declarator specifying a variable length array type, the type specified by the full declarator is said to be variably modified. Furthermore, any type derived by declarator type derivation from a variably modified type is itself variably modified.` – Lundin Oct 07 '15 at 11:39
  • If I understand that correctly, the pointer in this case would be regarded as the type "pointer to a VLA". – Lundin Oct 07 '15 at 11:40
  • 1
    @Olaf "evaluate" and "dereference" are different; e.g. if `p` is a pointer then `p;` evaluates `p` but doesn't dereference it. The difference may be more apparent if you replace `p` with a function that returns a pointer. Then the function is called. – M.M Oct 08 '15 at 08:33
  • @Lundin Yeah. It still seems unclear to me exactly what the type of `*bar` is. Are `double[100] (vla)` and `double[100] (not vla)` different types? – M.M Oct 08 '15 at 08:34
  • @M.M According to the cited text, those two array declarations are definitely different types. For the type system to be consistent, I would then assume that an array pointer to a variably modified array is a type of its own. – Lundin Oct 08 '15 at 10:41
  • @M.M: They have to. It becomes more clear with a 2D VLA. Here, the inner dimension is required to multiply the index of the outer dim. For most architectures, there has to be different code generated and different metadata has to be stored. Actually, also the non-VLAs `int a[3], b[5]` are different types. If boundary checking is available (C11 optional) or for such the compiler also has to store the size for later `sizeof`. – too honest for this site Oct 08 '15 at 11:18
  • @M.M: If we consider "evaluate" does not imply dereferencing it, then where is the problem? It is just the dereferencing which invokes UB. Note that if you have something like `sizeof(f(4))`, `f` need not be called either. It is just the return-type which is given with the declaration of `f`. Note that the expression (i.e. F(4)) is **not** evaluated according to the cited paragraph. And 6.5.3.4p4 does not exclude VLAs either. – too honest for this site Oct 08 '15 at 11:30
  • @Olaf evaluating `f` does not dereference f, but evaluating `*f` does – M.M Oct 09 '15 at 06:09
  • @M.M: With the same explanation `int *f; sizeof(*f);` would have to. But that is perfectly legal according to the cited paragraph and not evaluated. – too honest for this site Oct 09 '15 at 12:09
  • @Olaf I don't know why you keep bringing `int *f; sizeof(*f);` up. `sizeof` does not evaluate the operand when the operand is not a VLA, so `*f` is not evaluated there. – M.M Oct 09 '15 at 12:12
  • 2
    @M.M: I know that very well! And exactly **that** is the question: Why would these two cases have to be treated differently? The VLA expression **has** to be evaluated, but not the (nested) expression yielding the VLA: `*f` need not be evaluated, but only parsed to get the resulting type (as for the `int *`), which is a VLA, which then has to be evaluated (i.e. the index). If `*f` were not evaluated, there would be no UB. – too honest for this site Oct 09 '15 at 12:16
  • 1
    `*f` yields a VLA. "parsed to get the resulting type" is not part of the standard. Either `*f` is evaluated or it isn't, there is no middle ground. The two cases are treated differently because the standard says that there are two different cases (VLA and not-VLA). – M.M Oct 09 '15 at 12:23
  • 1
    I think that this is the correct answer. If we go by the letter of the standard, this behavior is *technically* undefined. – PC Luddite Oct 10 '15 at 02:35
13

Two other answers have already quoted N1570 6.5.3.4p2:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

According to that paragraph from the standard, yes, the operand of sizeof is evaluated.

I'm going to argue that this is a defect in the standard; something is evaluated at run time, but the operand is not.

Let's consider a simpler example:

int len = 100;
double vla[len];
printf("sizeof vla = %zu\n", sizeof vla);

According to the standard, sizeof vla evaluates the expression vla. But what does that mean?

In most contexts, evaluating an array expression yields the address of the initial element -- but the sizeof operator is an explicit exception to that. We might assume that evaluating vla means accessing the values of its elements, which has undefined behavior since those elements have not been initialized. But there is no other context in which evaluation of an array expression accesses the values of its elements, and absolutely no need to do so in this case. (Correction: If a string literal is used to initialize an array object, the values of the elements are evaluated.)

When the declaration of vla is executed, the compiler will create some anonymous metadata to hold the length of the array (it has to, since assigning a new value to len after vla is defined and allocated doesn't change the length of vla). All that has to be done to determine sizeof vla is to multiply that stored value by sizeof (double) (or just to retrieve the stored value if it stores the size in bytes).
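
The point about the stored length can be checked directly. A small sketch (the function name is a hypothetical addition, not from the answer): the length is fixed when the declaration is executed, so a later assignment to len leaves sizeof vla unchanged.

```c
#include <stddef.h>

/* Declares a VLA of len doubles, then changes len afterwards.
 * The array keeps the length it had when its declaration was executed. */
size_t size_after_len_changes(void) {
    int len = 100;
    double vla[len];   /* length is captured here */
    len = 200;         /* does not resize vla */
    (void)len;         /* silence "set but not used" warnings */
    return sizeof vla; /* 100 * sizeof(double), not 200 * sizeof(double) */
}
```

Calling size_after_len_changes() yields 100 * sizeof (double), which matches the description of the length being remembered separately from len.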

sizeof can also be applied to a parenthesized type name:

int len = 100;
printf("sizeof (double[len]) = %zu\n", sizeof (double[len]));

According to the standard, the sizeof expression evaluates the type. What does that mean? Clearly it has to evaluate the current value of len. Another example:

size_t func(void);
printf("sizeof (double[func()]) = %zu\n", sizeof (double[func()]));

Here the type name includes a function call. Evaluating the sizeof expression must call the function.
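
A sketch that makes the call visible by counting it. The counter and the wrapper functions are hypothetical additions, and func is given a body that returns 4 purely for the demonstration:

```c
#include <stddef.h>

static int calls = 0;   /* how many times func() has run */

static size_t func(void) {
    ++calls;
    return 4;
}

int call_count(void) {
    return calls;
}

/* Operand is an expression of type size_t, not a VLA:
 * it is not evaluated, so func() is not called. */
size_t size_without_call(void) {
    return sizeof(func());
}

/* Operand is the VLA type double[func()]:
 * the size expression is evaluated, so func() is called once. */
size_t size_with_call(void) {
    return sizeof(double[func()]);
}
```

size_without_call() leaves the counter untouched and returns sizeof (size_t); size_with_call() increments the counter once and returns 4 * sizeof (double).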

But in all of these cases, there's no actual need to evaluate the elements of the array object (if there is one), and no point in doing so.

sizeof applied to anything other than a VLA can be evaluated at compile time. The difference when sizeof is applied to a VLA (either an object or a type) is that something has to be evaluated at run time. But the thing that has to be evaluated is not the operand of sizeof; it's just whatever is needed to determine the size of the operand, which is never the operand itself.

The standard says that the operand of sizeof is evaluated if that operand is of variable length array type. That's a defect in the standard.

Getting back to the example in the question:

int foo = 100;
double (*bar)[foo] = NULL;
printf("sizeof *bar = %zu\n", sizeof *bar);

I've added an initialization to NULL to make it even clearer that dereferencing bar has undefined behavior.

*bar is of type double[foo], which is a VLA type. In principle, *bar is evaluated, which would have undefined behavior since bar is a null pointer (or, in the question's version, uninitialized). But again, there is no need to dereference bar. The compiler will generate some code when it processes the type double[foo], including saving the value of foo (or foo * sizeof (double)) in an anonymous variable. All it has to do to evaluate sizeof *bar is to retrieve the value of that anonymous variable. And if the standard were updated to define the semantics of sizeof consistently, it would be clear that evaluating sizeof *bar is well defined and yields 100 * sizeof (double) without having to dereference bar.

Keith Thompson
  • This is an interesting point of view, but can you really argue that the standard is incorrect? I mean, isn't the standard what makes C, *C*? You could argue certain laws aren't justified or even really practiced, but it's still the law. – PC Luddite Oct 07 '15 at 17:06
  • 1
    @PCLuddite: My argument is that this is a defect in the standard. Taken literally, it imposes a requirement that is not necessary, that makes no sense, and that may be impossible in some cases (there's no definition of what it means to "evaluate" a type name), and that I'm reasonably sure does not reflect the intent of the authors. See [here](http://www.open-std.org/jtc1/sc22/wg14/www/docs/summary.htm) for a list of defect reports against the C11 standard. In any case, I'll re-phrase my answer slightly to say it's a defect rather than that it's "incorrect". – Keith Thompson Oct 07 '15 at 17:57
  • "changing the value of len **doesn't** change the length of vla"? You mean "**does**". Thanks for the text. It actually reflects what I think, but I could not have expressed it so well in English. The problem is that the standard should differentiate between run-time evaluation (i.e. accessing items) and compile-time evaluation (i.e. parsing to get the "meta-data", e.g. type and index expression). There is an interesting note in the gcc documentation: ["The length of an array is computed once ... remembered ... sizeof."](https://gcc.gnu.org/onlinedocs/gcc-4.9.3/gcc/Variable-Length.html#Variable-Length). – too honest for this site Oct 07 '15 at 19:34
  • @PCLuddite: That would be neither the first nor the last paragraph which has to be corrected by a corrigendum. Just read the WG14 documents released **after** release of the C11 standard. – too honest for this site Oct 07 '15 at 19:38
  • 2
    @Olaf: No, I mean that changing the value of `len` *doesn't* change the length of `vla`. I meant that given `int len = 100; int vla[len]; len = 200;`, changing the value of `len` to 200 doesn't affect the length of `vla`. I've updated my answer to clarify that. – Keith Thompson Oct 07 '15 at 20:55
  • I see now. Yes, that is exactly what gcc does according to the documentation (sorry, had to shorten quite a lot to fit into a comment), and it looks like the most reasonable approach. I wonder if there is a way to send a link to this to the C WG14 standardisation committee. I did not find any request for clarification for this (they are quite hidden anyway). – too honest for this site Oct 07 '15 at 21:11
  • @Olaf True, but until a correction is released, it's still standard. – PC Luddite Oct 07 '15 at 21:18
  • 2
    @Olaf: I started a discussion on this point on the comp.std.c newsgroup in 2012; the Google Groups archive is [here](https://groups.google.com/forum/#!topic/comp.std.c/hvdN9kWexkw/discussion). It didn't really go anywhere -- and the C standard committee doesn't have any official affiliation with comp.std.c anyway. – Keith Thompson Oct 07 '15 at 21:20
  • 2
    @PCLuddite: Honestly: If the standard contradicts common sense and I use a compiler which actually **does** follow, is widely used, well supported and does not lock me to a specific vendor, I do not wait until the committee gets things straight that they have had >16 years (or 12 years, counting time between the last two releases) to fix (that was the polite version of my actual thoughts). Panta rhei. – too honest for this site Oct 07 '15 at 21:50
  • It seems to me that the root of the defect is : *what is the type of `*bar`* ? (or any expression whose type is an array that was declared as a VLA, but does not actually name that array). – M.M Oct 08 '15 at 08:35
  • 1
    @M.M: I disagree. The type of `*bar` is a VLA. The type is not bound to the name, but the object. But in C, this binding is not stored with the object itself, but only known to the compiler, so there is never a need to access the object itself to get its type, but that can always be deduced by parsing the expression and using the stored meta-data. And this is where the defect is: "evaluate" here should mean "parsing" or "evaluate to get the type", not read the value. Only the (unnecessary) latter actually constitutes UB. – too honest for this site Oct 08 '15 at 11:11
  • 1
    I was about to post the very question of whether it is really UB to use, say `sizeof vla / sizeof vla[0]` when `vla[]` contains only indeterminate values, when I came across this answer. It seems that the committee still hasn't seen fit to address this in the C17 Standard. – ad absurdum Sep 04 '18 at 18:05
  • Why is `*bar` "of type `int[foo]`" if it is being declared as `double (*bar)[foo]`? Wouldn't the type be `double[foo]`? – Omar Darwish Sep 24 '19 at 20:18
  • @OmarDarwish: Yes. I've fixed it. Thanks for finding this error. – Keith Thompson Sep 24 '19 at 23:58
  • I guess the intention of the standard was to address expressions like `sizeof(int[fun()])`, not `typedef int X[foo()]; sizeof(X); X x; sizeof x;`. AFAIK there are no VLA compound literals. Thus it is not possible to form an expression within which a VLA type is defined, am I right? If so, then the evaluation is pointless whenever `sizeof expression` is used, because expressions that affect the size of a VLA are only evaluated when a VLA type is introduced. – tstanisl Apr 20 '21 at 21:46
  • @tstanisl `sizeof(int[fun()])` is an expression within which a VLA type is defined. You're right, there are no VLA compound literals. gcc, clang, and even tcc handle your example `typedef int X[foo()]; sizeof(X); X x; sizeof x;` correctly. I would argue they do so by doing what the standard *should* have said, not what it actually does say. (`foo()` is called when the `typedef` is reached.) – Keith Thompson Apr 20 '21 at 22:14
  • @KeithThompson, by the expression I meant the operand of `sizeof`. `int[fun()]` has no value, it's not even an expression, is it? After a bit of research, I found `*(int(*)[fun()])XXX`. A pointer XXX is cast to a newly introduced VM type and dereferenced. To make things even more confusing, GCC evaluates `*(int(*)[fun()])XXX` while *not* evaluating `(int(*)[fun()])XXX` and `&*(int(*)[fun()])XXX`. It makes 6.5.3.4p2 even greater nonsense. – tstanisl Apr 21 '21 at 07:40
  • 1
    There is a proposal that addresses this issue. It must be a defect in the standard. See https://www9.open-std.org/JTC1/SC22/WG14/www/docs/n2838.htm – tstanisl Jan 17 '22 at 11:19
4

Indeed, the Standard seems to imply that the behaviour is undefined.

Re-quoting N1570 6.5.3.4/2:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

I think the wording from the Standard is confusing: "the operand is evaluated" does not mean that *bar will be evaluated. Evaluating *bar does not in any way help compute its size. sizeof(*bar) does need to be computed at run time, but the code generated for this has no need to dereference bar; it will more likely retrieve the size information from a hidden variable holding the result of the size computation at the time of bar's instantiation.
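
The hidden-variable idea can be modelled by hand. This is only an illustrative sketch of what a compiler might do; the struct, its field names and both functions are hypothetical and do not describe any real compiler's layout:

```c
#include <stddef.h>

/* Hand-written model of the declaration `double (*bar)[foo];`.
 * A real compiler keeps the captured length in its own hidden storage;
 * here it is spelled out as an ordinary struct member. */
struct bar_decl {
    double *ptr;        /* stands in for bar itself; never read below */
    size_t  hidden_len; /* value of foo captured at the declaration */
};

/* Models executing the declaration: the length is recorded once. */
struct bar_decl declare_bar(size_t foo) {
    struct bar_decl d = { NULL, foo };
    return d;
}

/* Models `sizeof *bar`: uses only the recorded length, not the pointer. */
size_t sizeof_star_bar(const struct bar_decl *d) {
    return d->hidden_len * sizeof(double);
}
```

With declare_bar(100), sizeof_star_bar returns 100 * sizeof(double) without ever touching the pointer member, which is the behaviour the paragraph above expects from generated code.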

chqrlie