
There is a well-known pattern for computing the length of an array:

int arr[10]; 
size_t len = sizeof(arr) / sizeof(arr[0]); 
assert(len == 10); 

This pattern applies to static arrays and to automatic arrays of constant size. It also applies to variable-length arrays in C99.

I want to apply a similar idea to compute the size of a dynamic array in bytes:

size_t known_len = 10; 
int *ptr = malloc(known_len * sizeof(int)); 
size_t size = known_len * sizeof(ptr[0]); 
assert(size == known_len * sizeof(int)); 

This is nicer than known_len * sizeof(int) because sizeof(ptr[0]) doesn't name the actual array element type. Hence it doesn't require the reader of the code to know the type.

However, it is unclear to me whether the expression sizeof(ptr[0]) can lead to undefined behavior. It expands as follows:

sizeof(ptr[0]) -> sizeof(*((ptr) + (0))) -> sizeof(*ptr) 

The resulting expression is questionable if ptr is 0:

sizeof(*((int*) 0)) 

According to the C99 standard:

(C99, 6.3.2.3p3 ): "An integer constant expression with the value 0, or such an expression cast to type void *, is called a null pointer constant." Dereferencing a null pointer is undefined behavior.

(C99, 6.5.3.2.p4) "If an invalid value has been assigned to the pointer, the behavior of the unary * operator is undefined.87)"

87): "Among the invalid values for dereferencing a pointer by the unary * operator are a null pointer, an address inappropriately aligned for the type of object pointed to, and the address of an object after the end of its lifetime."

But the standard never says explicitly whether applying sizeof to such an expression can lead to undefined behavior. In fact, such a sizeof should be evaluated at compile time.

My questions are:

  • Can the expression sizeof(ptr[0]) be used in code when the type of ptr is known but the value of ptr is not?
  • Can such use be justified according to the C99 standard? According to the GCC documentation?
haccks
mrsmith
  • See [Is dereferencing null pointer valid in sizeof operation](http://stackoverflow.com/q/19785518/1708801) and [Does not evaluating the expression to which sizeof is applied make it legal to dereference a null or invalid pointer inside sizeof in C++?](http://stackoverflow.com/q/28714018/1708801) – Shafik Yaghmour Jun 10 '15 at 19:49
  • I have added an extra answer to the linked question. My concern here is about situations when arr pointer is declared as, for example, `int n = 10; int (*arr)[n] = NULL;`, i.e. it is a pointer to a VLA. In C language the argument of `sizeof` is evaluated at run-time if it has VLA type (!). The question in this case is whether `sizeof arr[0]` is supposed to produce undefined behavior. Most likely it is. – AnT stands with Russia Jun 10 '15 at 22:03
  • In fact, the linked question seems to be restricted to non-VLA contexts, while this question does not immediately have such a restriction. This makes it a non-duplicate. – AnT stands with Russia Jun 10 '15 at 22:09
  • As an aside, consider `malloc(known_len * sizeof *ptr)` instead of `malloc(known_len * sizeof(int))`. Should the type of `ptr` change or be obscured (buried deep in some `.h` file), this style is more often correct and easier to maintain. – chux - Reinstate Monica Jun 10 '15 at 22:52

5 Answers


The expression ptr[0] will not be evaluated in sizeof(ptr[0]). The size is determined at compile time from the type of ptr[0] alone.

C11: 6.5.3.4:

The sizeof operator yields the size (in bytes) of its operand, which may be an expression or the parenthesized name of a type. The size is determined from the type of the operand. The result is an integer. If the type of the operand is a variable length array type, the operand is evaluated; otherwise, the operand is not evaluated and the result is an integer constant.

That means there is no undefined behavior, since ptr[0] does not have a variable length array type.
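A minimal sketch of what the quoted paragraph implies for non-VLA operands. The function names `element_size_of_null_ptr` and `counter_after_sizeof` are hypothetical helpers made up for this illustration; they are not part of the question's code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: returns sizeof(ptr[0]) for a null int pointer.
   The operand has type int (not a VLA), so it is not evaluated and
   the null pointer is never dereferenced. */
size_t element_size_of_null_ptr(void)
{
    int *ptr = NULL;
    return sizeof(ptr[0]);
}

/* Hypothetical helper: shows that a side effect inside a non-VLA
   sizeof operand does not happen. */
int counter_after_sizeof(void)
{
    int i = 0;
    size_t s = sizeof(i++);   /* operand has type int: not evaluated */
    assert(s == sizeof(int));
    (void)s;                  /* silence unused warning under NDEBUG */
    return i;                 /* i was never incremented */
}
```

Calling `counter_after_sizeof()` returns 0, because only the type of `i++` is inspected. Some compilers warn about the discarded side effect, which is expected here.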

Community
haccks

This will not cause undefined behavior.

With the exception of taking the size of variable-length arrays, sizeof is a compile-time constant expression. The compiler processes the operand only to determine its type, without generating code to evaluate it at run time. Therefore, the value of ptr (which may be an uninitialized pointer) does not matter at all.

Moreover, if you would like to allocate ten integers, you should be calling malloc like this:

int *ptr = malloc(known_len * sizeof(ptr[0])); 

Otherwise, you allocate ten bytes, which is too small for storing ten integers. Note that in the expression above ptr is uninitialized at the time of the call, which is perfectly fine for sizeof.
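The allocation idiom above can be sketched as a small function. The name `alloc_ints` is a hypothetical helper introduced only for this example; it assumes the caller checks the result for NULL and frees it.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper: allocates room for `count` ints.
   sizeof(ptr[0]) only inspects the type of ptr, so it is fine that
   ptr has no value yet when its own initializer is evaluated.
   Returns NULL if the allocation fails. */
int *alloc_ints(size_t count)
{
    int *ptr = malloc(count * sizeof(ptr[0]));
    return ptr;
}
```

If the element type of `ptr` later changes, say to `long`, only the declaration needs to change; the size computation follows automatically.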

Sergey Kalinichenko
  • In what cases should `sizeof` have to actually *execute* its arguments at run-time? Even when the argument to sizeof involves a VLA, is there any reason `sizeof` should ever have to do anything more than return a hidden variable whose value will be a valid number unless the program has *already* engaged in UB? – supercat Jun 10 '15 at 22:34
  • @dasblinkenlight: My point is that even if the argument to sizeof includes something like a function call or pointer dereference, I see no basis for the function to be executed or the pointer to be dereferenced even in the VLA case. I suppose if one said `sizeof int[foo()]` that might require evaluating `foo`, but I don't see any usefulness to such a construct. – supercat Jun 11 '15 at 00:28
  • @dasblinkenlight: Fun fact: `typedef` can behave as an executable statement(!) A declaration `typedef int arr[i++];` will increment `i` and create a declaration which is only valid for its scope. – supercat Jun 11 '15 at 19:55
  • @supercat Your `typedef` gave me an idea for an example where `sizeof` would be evaluated in a traditional sense, complete with side effects! [Here it is](http://ideone.com/PZnlUY). Curiously, `i` does not get incremented unless I add a second dimension. – Sergey Kalinichenko Jun 11 '15 at 20:11
  • @dasblinkenlight: I find it bizarre that the Standard as written requires the evaluation of expressions which are *syntactically irrelevant* to the result of the operation. If the only legal use for VLAs were in declarations of variables holding variable-length arrays, even a `sizeof` that wouldn't yield a constant value (as in your example) would never need to do any run-time evaluation of its arguments. Only in very weird cases will the result of a computation in `sizeof` affect its value, and I don't think forbidding such cases would have weakened the expressiveness of the language. – supercat Jun 11 '15 at 20:18
  • @dasblinkenlight: In your example, there's no semantic reason the compiler should evaluate `i++`; even though the `sizeof` expression isn't constant, the *syntax* of the expression would render the expression `i++` irrelevant. For the value of an expression within `sizeof` to be relevant, it must be used in the *creation of a type* within the expression, as with `sizeof (int[i++])`, and I can't think of any cases where such behavior would actually be useful. – supercat Jun 11 '15 at 21:15
  • @supercat perhaps: `int (*ptr)[f()] = malloc( sizeof(int[n][f()]) );` The compiler can't just decide not to call `f()` (the second one) here because it needs to find the right amount of bytes. So the options seem to be: (a) what the standard says, (b) undefined behaviour, (c) constraint violation. Option (b) seems undesirable. (c) would be nice but perhaps it is not so easy to formulate a rule that implements that intent without also ruling out some valid code. – M.M Feb 13 '16 at 00:30
  • @M.M: In that example, the return value of `f()` is used in the creation of the *type* `int[n][f()]`. A more interesting case might be something like `sizeof (*(char[f()]))`, since the size of a pointer to an array of integers should be independent of the length of the array; I'm not sure whether there would be much value in specifying whether `f()` is evaluated in that case, however. – supercat Feb 13 '16 at 01:46
  • @supercat in your example it's unspecified whether `f()` is called. I think there is some value in having a better specification, as evinced by the original example this question is about (it's annoying that that is UB, as it means you can't use `p = malloc(n * sizeof *p);` for VLAs unless you're happy to pretend that nothing bad will happen for this particular UB). – M.M Feb 13 '16 at 01:49
  • @M.M: The prohibition on zero-sized types is IMHO counter-productive. It would have been better to say that just about the only thing specified about the address of a zero-sized array would be that it not be a trap representation. One couldn't dereference it, nor add or subtract a non-zero value from it, and the pointer might arbitrarily compare equal to any other (including null), but if code doesn't perform any forbidden actions with the pointer its existence shouldn't cause any sort of UB. – supercat Feb 13 '16 at 02:32

In the general case, unless I'm missing something, dereferencing a null pointer under sizeof can lead to undefined behavior. Since C99, sizeof is not a purely compile-time construct: its operand is evaluated at run time if the operand has a VLA type.

Consider the following example:

unsigned n = 10;
int (*a)[n] = NULL; // `a` is a pointer to a VLA 

unsigned i = 0;
sizeof a[i++];      // applying `sizeof` to a VLA

According to the C99 standard, the argument of sizeof is supposed to be evaluated (i.e. i is supposed to get incremented). Yet I'm not entirely sure that the null-pointer dereference in a[0] is supposed to produce undefined behavior here.
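To observe the run-time evaluation without touching the open question about null pointers, here is a sketch that uses a valid pointer instead of NULL. The struct `vla_sizeof_result` and the function `sizeof_vla_row` are hypothetical names made up for this illustration, and the sketch assumes a compiler with VLA support that follows the standard on this point.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical result type for this sketch. */
struct vla_sizeof_result {
    size_t size;   /* value yielded by sizeof a[i++] */
    unsigned i;    /* value of i after the sizeof expression */
};

/* Hypothetical helper: applies sizeof to an operand of VLA type.
   n must be nonzero. The pointer is valid, so evaluating a[i++] is
   well defined regardless of how the null-pointer case is settled. */
struct vla_sizeof_result sizeof_vla_row(unsigned n)
{
    struct vla_sizeof_result r = {0, 0};
    int (*a)[n] = malloc(sizeof(int[n]));   /* pointer to one int[n] row */
    if (a == NULL)
        return r;

    unsigned i = 0;
    r.size = sizeof a[i++];   /* operand type is int[n], a VLA: evaluated */
    r.i = i;

    free(a);
    return r;
}
```

With a conforming compiler, `sizeof_vla_row(10)` yields a size of `10 * sizeof(int)` and leaves `i` at 1, which is the increment the standard requires.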

AnT stands with Russia
  • Considering the answers for the non-VLA case, there seems to be no problem with VLAs either: `void foo(int len) { int vla[len]; printf("%zu\n", sizeof(vla[0])); int *p = vla; printf("%zu\n", sizeof(p[0])); }` In the first case `vla` is a variable-length array, and although `sizeof(vla)` is evaluated at run time, the size in question, `sizeof(vla[0])`, can still be computed at compile time. In any case, `vla` is guaranteed not to be null. In the second case `p` is just a pointer to `vla`: the expression in `sizeof(p)` must not be evaluated, and the same holds for `sizeof(p[0])`. – mrsmith Jun 11 '15 at 00:22
  • So `int (*a)[n]` is actually a pointer to the VLA type `int[n]`, whose size can't be known at compile time. There may be room for UB. – mrsmith Jun 11 '15 at 00:30
  • @mrsmith: Examples in your first comment are not demonstrative, since `sizeof` is not applied to an expression of VLA type. It takes a deliberately crafted expression to produce such a `sizeof` example and `int (*a)[n]` is one way to achieve that. – AnT stands with Russia Jun 11 '15 at 00:49
  • It took me a while to understand what `int (*a)[n]` means, but it is indeed problematic. Your example "works" in GCC 4.1.2 with `-std=c99` and `i` is not incremented. However, I have no idea how to explain that from the standard's standpoint. – mrsmith Jun 11 '15 at 02:44
  • In fact `i` is not incremented even in a much simpler case: `void baz(int len) { int b[len]; int i = 0; printf("%zu\n", sizeof(b[++i])); printf("%d\n", i); }` will print `4` and `0`. – mrsmith Jun 11 '15 at 02:47
  • @mrsmith: GCC used to contain a bug in this specific behavior, which was eventually fixed. However, I don't remember which versions of GCC had the bug and which version had it fixed. GCC version used by Coliru works properly: http://coliru.stacked-crooked.com/a/fa3f47394f9fa3a4. Note that `i` gets incremented. I suspect that your 4.1.2 is still buggy in that regard. – AnT stands with Russia Jun 11 '15 at 02:49
  • @mrsmith: Your "much simpler" example (with `int b[len]`) is not relevant to the issue. In that example `sizeof` is applied to an `int` value `b[i++]`. `int` is not a VLA type, so the expression is not supposed to be evaluated and `i` is not supposed to be incremented. It is not enough to just "have" a VLA "somewhere" under `sizeof`; it is important to make sure that `sizeof` is *applied directly to VLA type*. Which is why I said that it takes some elaborate crafting to produce a run-time evaluated `sizeof` example. – AnT stands with Russia Jun 11 '15 at 02:52
  • A test in GCC 4.8.3 shows that `i` is properly incremented in my example. – AnT stands with Russia Jun 11 '15 at 03:04
  • Thanks for the explanation! – mrsmith Jun 11 '15 at 04:44
  • @mrsmith: To understand what's going on, it might be helpful to consider the effect of `if (z) typedef int arr[i++];`. One wouldn't normally think a typedef should be an *executable* statement, but the people who added VLAs to the standard apparently thought it should be. Frankly, I think it would have been better to omit VLAs and have a "stalloc" which would behave like "malloc" except that "stalloc" calls would be required to be matched with "stfree" calls in LIFO order, with *defined* behaviors matching malloc/free; any compiler should be able to supply stalloc/stfree implementations... – supercat Jun 11 '15 at 19:14
  • ...by simply saying `#define stalloc(x) malloc(x)` `#define stfree(x) free(x)`, but systems that could use stack memory could achieve a speed boost with code that properly exploited stalloc/stfree. As it is, making `typedef` into an executable statement strikes me as being frankly bizarre. – supercat Jun 11 '15 at 19:16
  • @supercat: There is an `alloca()` call for stack memory allocation. It also doesn't require matching `free()` call. – mrsmith Jun 11 '15 at 19:36
  • @mrsmith: I'm aware of `alloca()`; the fact that it neither requires nor accommodates any sort of "free" causes difficulties in many implementations and also limits its usefulness in certain looping contexts. In any case, having `typedef` behave as an executable statement seems bizarre. I wonder how much had to be changed in gcc to make it accommodate that pattern? – supercat Jun 11 '15 at 20:08
  • @mrsmith the standard says that a subexpression may or may not be evaluated if the result of it does not affect the size – M.M Feb 13 '16 at 00:24
  • My view is that the standard does indeed say `int (*p)[n] = NULL; p = malloc(sizeof p[0]);` is undefined behaviour, however I think the standard should be regarded as defective in this area. The committee probably erred on the side of caution (or the side of "take less time to sort out this unimportant detail") and just said that all VLA arguments to sizeof may be evaluated, as opposed to taking the time to formulate and get feedback on a rule that would say this case must not be evaluated (but other cases obviously do require evaluation). – M.M Feb 13 '16 at 00:35
  • There are an absurd number of situations where the Standard could define behaviors for zero cost on most implementations beyond the fact that compilers would no longer be able to use them as an excuse for dead-code elimination, and where avoiding UB would require adding code that would serve no other purpose, and which an optimizer wouldn't be able to remove. – supercat Feb 13 '16 at 02:36

As an expression (not a declaration), ptr[0] is identical to *ptr.

As long as the type of ptr is known and it is not pointing to void, sizeof is guaranteed to work on sizeof(*ptr) as well as sizeof(ptr[0]).
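A short sketch of that equivalence, assuming a C11 compiler for `_Static_assert`. The function name `double_element_size` is a hypothetical helper chosen for this example.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: checks at compile time that both spellings
   give the same size, then returns that size. Neither operand is
   evaluated, so the null pointer is never dereferenced. */
size_t double_element_size(void)
{
    double *ptr = NULL;
    _Static_assert(sizeof(ptr[0]) == sizeof(*ptr),
                   "ptr[0] and *ptr have the same type");
    return sizeof(*ptr);
}
```

The same code with `void *ptr` would not be valid standard C, because `void` is an incomplete type and has no size.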

wallyk

The code is evil.

size_t known_len = 10; 
int *ptr = malloc(known_len); 
size_t size = known_len * sizeof(ptr[0]); 
assert(size == known_len * sizeof(int)); 

The size of the memory allocation, from your point of view, is 10 bytes. That is not room for 10 ints.

 known_len / sizeof(ptr[0]) 

will give you the number of integers that the allocation can hold.

As is, you are likely to go off the end of the allocation, leading to undefined behavior.

Consider malloc(known_len * sizeof ptr[0]);
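The difference between a byte count and an element count can be sketched as follows. The function `capacity_in_elements` is a hypothetical helper written for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: how many whole ints fit into an allocation
   of `byte_count` bytes. Only the type of ptr is used, so its value
   is irrelevant. */
size_t capacity_in_elements(size_t byte_count)
{
    int *ptr = NULL;
    return byte_count / sizeof(ptr[0]);
}
```

On a platform where `sizeof(int)` is 4, a 10-byte allocation holds only 2 ints, so writing 10 elements would run past the end of the block.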

EvilTeach