136

Going through some C interview questions, I've found a question stating "How to find the size of an array in C without using the sizeof operator?", with the following solution. It works, but I cannot understand why.

#include <stdio.h>

int main() {
    int a[] = {100, 200, 300, 400, 500};
    int size = 0;

    size = *(&a + 1) - a;
    printf("%d\n", size);

    return 0;
}

As expected, it returns 5.

Edit: people pointed out this answer, but its syntax differs a bit, i.e. it uses the indexing form:

size = (&arr)[1] - arr;

so I believe both questions are valid and have a slightly different approach to the problem. Thank you all for the immense help and thorough explanation!

janojlic
  • 15
    Well, can't find it, but looks like strictly speaking it is. [Annex J.2](http://port70.net/~nsz/c/c11/n1570.html#J.2) is explicitly stating: *The operand of the unary * operator has an invalid value* is an undefined behavior. Here `&a + 1` is not pointing to any valid object, so it is invalid. – Eugene Sh. May 15 '19 at 18:39
  • 6
    Related: [Is `*((*(&array + 1)) - 1)` safe to use to get the last element of an automatic array?](https://stackoverflow.com/q/32537471/3049655). tl;dr `*(&a + 1)` invokes Undefined Behavior – Spikatrix May 16 '19 at 06:30
  • 5
    Possible duplicate of [Find size of array without using sizeof in C](https://stackoverflow.com/questions/16019009/find-size-of-array-without-using-sizeof-in-c) – Alma Do May 17 '19 at 08:55
  • @AlmaDo well the syntax does differ a bit, i.e. the indexing part, so I believe that this question is still valid on its own, but I might be wrong. Thank you for pointing it out! – janojlic May 17 '19 at 20:06
  • 1
    @janojlicz They're essentially the same, because `(ptr)[x]` is the same as `*((ptr) + x)`. – S.S. Anne Jun 07 '19 at 19:39

3 Answers

141

When you add 1 to a pointer, the result is the location of the next object in a sequence of objects of the pointed-to type (i.e., an array). If p points to an int object, then p + 1 will point to the next int in a sequence. If p points to a 5-element array of int (in this case, the expression &a), then p + 1 will point to the next 5-element array of int in a sequence.

Subtracting two pointers (provided they both point into the same array object, or one is pointing one past the last element of the array) yields the number of objects (array elements) between those two pointers.

The expression &a yields the address of a, and has the type int (*)[5] (pointer to 5-element array of int). The expression &a + 1 yields the address of the next 5-element array of int following a, and also has the type int (*)[5]. The expression *(&a + 1) dereferences the result of &a + 1, such that it yields the address of the first int following the last element of a, and has type int [5], which in this context "decays" to an expression of type int *.

Similarly, the expression a "decays" to a pointer to the first element of the array and has type int *.

A picture may help:

int [5]  int (*)[5]     int      int *

+---+                   +---+
|   | <- &a             |   | <- a
| - |                   +---+
|   |                   |   | <- a + 1
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
+---+                   +---+
|   | <- &a + 1         |   | <- *(&a + 1)
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
| - |                   +---+
|   |                   |   |
+---+                   +---+

This is two views of the same storage - on the left, we're viewing it as a sequence of 5-element arrays of int, while on the right, we're viewing it as a sequence of int. I also show the various expressions and their types.

Be aware, the expression *(&a + 1) results in undefined behavior:

...
If the result points one past the last element of the array object, it shall not be used as the operand of a unary * operator that is evaluated.

C 2011 Online Draft, 6.5.6/8

John Bode
  • 13
    That “shall not be used” text is official: C 2018 6.5.6 8. – Eric Postpischil May 15 '19 at 20:25
  • @EricPostpischil: Do you have a link to the 2018 pre-pub draft (similar to N1570.pdf)? – John Bode May 15 '19 at 20:36
  • 1
    @JohnBode: [This answer](https://stackoverflow.com/a/83763/298225) has [a link to the Wayback Machine](https://web.archive.org/web/20181230041359if_/http://www.open-std.org/jtc1/sc22/wg14/www/abq/c17_updated_proposed_fdis.pdf). I checked the official standard in my purchased copy. – Eric Postpischil May 15 '19 at 20:46
  • 7
    So if one wrote `size = (int*)(&a + 1) - a;` this code would be completely valid? :o – Gizmo May 16 '19 at 11:18
  • @Gizmo they probably originally didn't write that because that way you have to specify the element type; the original was probably written as a macro for type-generic use on different element types. – Alex Celeste May 17 '19 at 10:08
  • @Leushenko yeah that much I did figure, as C doesn't have templates/different prototypes based on parameter types. – Gizmo May 17 '19 at 10:32
34

This line is of most importance:

size = *(&a + 1) - a;

As you can see, it first takes the address of a and adds one to it. Then, it dereferences that pointer and subtracts the original value of a from it.

Pointer arithmetic in C causes this to return the number of elements in the array, or 5. Adding one to &a yields a pointer to the next array of 5 ints after a. After that, this code dereferences the resulting pointer and subtracts a (an array type that has decayed to a pointer) from that, giving the number of elements in the array.

Details on how pointer arithmetic works:

Say you have a pointer xyz that points to an int type and contains the value (int *)160. When you subtract any number from xyz, C specifies that the actual amount subtracted from xyz is that number times the size of the type that it points to. For example, if you subtract 5 from xyz, the resulting address is what xyz - (sizeof(*xyz) * 5) would give if xyz were a plain integer, i.e. if pointer arithmetic didn't apply.

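As a is an array of 5 int types, the resulting value will be 5. However, this will not work with a pointer, only with an array. If you try this with a pointer p, &p + 1 only steps past the pointer object itself, so *(&p + 1) - p no longer has anything to do with the number of elements p points to.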

Here's a little example that shows the addresses and how this is undefined. The left-hand side shows the addresses:

a + 0 | [a[0]] | &a points to this
a + 1 | [a[1]]
a + 2 | [a[2]]
a + 3 | [a[3]]
a + 4 | [a[4]] | end of array
a + 5 | [a[5]] | &a+1 points to this; accessing past array when dereferenced

This means that the code is subtracting a from &a[5] (or a+5), giving 5.

Note that this is undefined behavior, and should not be used under any circumstances. Do not expect the behavior of this to be consistent across all platforms, and do not use it in production programs.

S.S. Anne
27

Hmm, I suspect this is something that would not have worked back in the early days of C. It is clever though.

Taking the steps one at a time:

  • &a gets a pointer to an object of type int[5]
  • +1 gets the next such object assuming there is an array of those
  • * effectively converts that address into type pointer to int
  • -a subtracts the two int pointers, returning the count of int instances between them.

I'm not sure it is completely legal (by this I mean language-lawyer legal, not whether it will work in practice), given some of the type operations going on. For example you are only "allowed" to subtract two pointers when they point to elements in the same array. *(&a+1) was synthesised by accessing another array, albeit a parent array, so is not actually a pointer into the same array as a. Also, while you are allowed to synthesise a pointer past the last element of an array, and you can treat any object as an array of 1 element, the operation of dereferencing (*) is not "allowed" on this synthesised pointer, even though it has no behaviour in this case!

I suspect that in the early days of C (K&R syntax, anyone?), an array decayed into a pointer much more quickly, so the *(&a+1) might only return the address of the next pointer of type int**. The more rigorous definitions of modern C++ definitely allow the pointer to array type to exist and know the array size, and probably the C standards have followed suit. All C function code only takes pointers as arguments, so the technical visible difference is minimal. But I am only guessing here.

This sort of detailed legality question usually applies to a C interpreter, or a lint type tool, rather than the compiled code. An interpreter might implement a 2D array as an array of pointers to arrays, because there is one less runtime feature to implement, in which case dereferencing the +1 would be fatal, and even if it worked would give the wrong answer.

Another possible weakness may be that the C compiler might align the outer array. Imagine if this was an array of 5 chars (char arr[5]), when the program performs &a+1 it is invoking "array of array" behaviour. The compiler might decide that an array of array of 5 chars (char arr[][5]) is actually generated as an array of array of 8 chars (char arr[][8]), so that the outer array aligns nicely. The code we are discussing would now report the array size as 8, not 5. I'm not saying a particular compiler would definitely do this, but it might.

Gem Taylor
  • Fair enough. However for reasons hard to explain, everyone uses sizeof()/sizeof() ? – Gem Taylor May 15 '19 at 17:26
  • 5
    Most people do. For example, `sizeof(array)/sizeof(array[0])` gives the number of elements in an array. – S.S. Anne May 15 '19 at 17:28
  • The C compiler is allowed to align the array, but I'm unconvinced it's allowed to change the type of the array after doing so. Alignment would be more realistically implemented by inserting padding bytes. – Kevin May 15 '19 at 18:15
  • @Kevin I'm not saying it would change the type, but by dereferencing the array, the programmer has invoked the "array of array" behaviour, and the alignment of the elements of that outer array is what I question. – Gem Taylor May 15 '19 at 18:19
  • Can you circumvent the dereferencing issue by using `((char *)(&a + 1) - (char *)a) / ((char *)(a + 1) - (char *)a)` or does this just invoke some other kind of undefined behavior? – wrtlprnft May 15 '19 at 19:07
  • @wrtlprnft whrrrrr?????? OK, I can see where you are coming from, but I think you have just written sizeof() :-) – Gem Taylor May 15 '19 at 19:12
  • 2
    Subtracting of pointers is not limited to just two pointers into the same array—the pointers are also allowed to be one past the end of the array. `&a+1` is defined. As John Bollinger notes, `*(&a+1)` is not, since it attempts to dereference an object that does not exist. – Eric Postpischil May 15 '19 at 20:10
  • 5
    A compiler cannot implement a `char [][5]` as `char arr[][8]`. An array is just the repeated objects in it; there is no padding. Additionally, this would break the (non-normative) example 2 in C 2018 6.5.3.4 7, which tells us we can compute the number of elements in an array with `sizeof array / sizeof array[0]`. – Eric Postpischil May 15 '19 at 20:15
  • @EricPostpischil, but does that tell us that `sizeof(array[n])` is the same beast as `array[][n]`? I do know that sizeof()/sizeof() is the traditional way to calculate an array size, and it is interesting that that is written into the standard. It is also interesting that I can create odd-sized structs in C (by setting alignment flags sometimes), and you are saying that arrays of them would not be realigned? – Gem Taylor May 16 '19 at 12:46