How are arrays "implemented" in C?

Question

An array is notably not a pointer. To be sure, both lvalues seem to contain the (1-dimensional) coordinate of some position in (1-dimensional) virtual memory. But consider this example.

#include <stdlib.h>
#include <stdio.h>
int main(void){
  char buffer0[4096];
  char* buffer1 = malloc(4096);
  // sizeof yields a size_t, which is printed with %zu (not %lu)
  printf("lvalue %16p  sizeof %zu\n", (void *) buffer0, sizeof(buffer0));
  printf("lvalue %16p  sizeof %zu\n", (void *) buffer1, sizeof(buffer1));
  free(buffer1);
// Example output:  lvalue   0x7ffcb70e8620  sizeof 4096
// Example output:  lvalue         0x7a4420  sizeof 8
}

Practical differences that come to mind are:

Arrays know how big (in bytes) they are (and, by extension, they know how many elements they have); pointers don't (but malloc() must record how big each allocation is, so that free() knows how much to release given just the pointer...!)
Arrays are "garbage collected" (no need to free() them); pointers must be freed manually (if they own a non-trivial amount of memory, i.e., through malloc())
Arrays "live" in the stack (high virtual memory addresses, at least on my platform); pointers "live" in the heap (low virtual memory addresses)
Arrays decay to pointers when passed to functions
Arrays cannot be resized; pointers can

Overall, arrays seem to be much smarter (but less versatile) than pointers (they know how big they are, how many elements they have, and they have automatic memory management).

Questions

How do arrays "know" how big they are? How is this implemented?
In general, how are arrays implemented in the C language? (Does the compiler do this, or does the kernel?)

Arrays don't "know" how big they are or how many elements they contain, but the compiler knows. — molbdnilo, May 21 '18 at 14:05
What is your example about? Yes, `sizeof(buffer0)` is the size of array as a whole, and `buffer1` is merely a pointer (that points to an allocated area of 4096 bytes). `buffer0` cannot be deallocated (but is such automatically as soon as control leaves scope). They are interface-indistinguishable otherwise. — bipll, May 21 '18 at 14:05
I have casted to `(void *)` to prevent [UB](https://stackoverflow.com/questions/2397984/undefined-unspecified-and-implementation-defined-behavior). — Soner from The Ottoman Empire, May 21 '18 at 14:09
*"Arrays cannot be resized; pointers can"* - Not really, they can just be set to point at something else. The size of a pointer, of any type really, is fixed. — StoryTeller - Unslander Monica, May 21 '18 at 14:14
*Arrays are "garbage collected" (no need to free() them); pointers must be freed manually* not true at all. — n. m. could be an AI, May 21 '18 at 14:17
*both lvalues* I really don't like calling an array an lvalue - especially in its bare, unadorned form - as that isn't consistent with the differences between an array and a pointer. A pointer is clearly an lvalue, but I like thinking an array is more like a reference to a *set* of lvalues. A pointer can be assigned to point to an array (or nothing), but an array is different - it doesn't *point* to anything as it's a chunk of memory with independent existence. For example, given `int array[100];` what does `array` contain? `array` itself can't be assigned to (although elements can). — Andrew Henle, May 21 '18 at 14:23

score 8 · Accepted Answer · answered May 21 '18 at 15:47

How do arrays "know" how big they are? How is this implemented?

Arrays don't know how big they are - there is no metadata associated with the array to indicate size (or type, or anything else). During translation, the compiler knows how big the array is, and anything that relies on that knowledge (pointer arithmetic, sizeof operations, etc.) is handled at that time. Once machine code is generated, arrays are just dumb chunks of memory - there's no way to determine at runtime how big an array is by looking at the array object itself (with the exception of variably modified types like variable-length arrays, sizeof operations are computed during translation, not runtime).

In general, how are arrays implemented in the C language? (Does the compiler do this, or does the kernel?)

Arrays are nothing more than a contiguous sequence of objects of the same type. For the declaration

T arr[N]; // for any type T

you get

     +---+
arr: |   | arr[0]
     +---+
     |   | arr[1]
     +---+
     |   | arr[2]
     +---+
      ...
     +---+ 
     |   | arr[N-1]
     +---+

There is no arr object independent of the array elements themselves, nor is any metadata set aside anywhere for size, starting address, type, or anything else.

The subscript operation arr[i] is defined as *(arr + i) - given the starting address of the array, offset i elements (not bytes!) from that address and dereference the result.

You are correct that arrays are not pointers - however, unless it is the operand of the sizeof or unary & operators, or is a string literal used to initialize a character array in a declaration, an expression of array type will be converted ("decay") to an expression of pointer type, and the value of the expression will be the address of the first element of the array (again, this is all done during translation, not at runtime).

Thus, when you write something like x = arr[i];, the compiler will convert the expression arr to a pointer value, so the subscript operation works.

By contrast, when you write ap = &arr;, the compiler does not convert arr to a pointer type. The result is still the same as the address of the first element, but the type is different - instead of T *, the type is T (*)[N], or "pointer to N-element array of T".

Do you have reference materials which would further dive into this? Thanks! — Jeel Shah, Aug 02 '18 at 18:45
@JeelShah: [C 2011 Language Definition, online draft](http://www.open-std.org/jtc1/sc22/wg14/www/docs/n1570.pdf). It's not a great *learning* resource for someone who doesn't know the language, but it's the basis for everything I wrote above. To be fair, the language standard doesn't explicitly *prohibit* storing any metadata as part of an array object, but at the same time it doesn't provide any means for accessing said metadata. It would definitely not be in the spirit of C to have such a metadata block. — John Bode, Aug 02 '18 at 18:56

score 5 · Answer 2 · Sourav Ghosh · 2018-05-21T14:18:40.937

How do arrays "know" how big they are? How is this implemented?

The compiler knows this.

In general, how are arrays implemented in the C language? (Does the compiler do this, or does the kernel?)

Compiler.

==========================================================================

The point to focus on here is that an array is a type; specifically, it is a derived type.

Quoting C11, chapter §6.2.5/P20,

An array type describes a contiguously allocated nonempty set of objects with a particular member object type, called the element type. The element type shall be complete whenever the array type is specified. Array types are characterized by their element type and by the number of elements in the array. [...]

So, just as the compiler knows the size of any other complete type, it inherently knows the size of an array type.

The total size is calculated based on the size of the element type multiplied by the number of elements in that array.

edited May 21 '18 at 14:18

answered May 21 '18 at 14:13


Of a dynamically allocated array (a contiguous set of objects) the compiler does _not_ know its size. The programmer must keep track of that. And all other arrays, the compiler does not "know" their size in the sense that it can flag out-of-bounds access. Actually the compiler knows hardly anything about the size, and nothing that is helpful to the programmer. – Paul Ogilvie May 21 '18 at 15:09

@PaulOgilvie Yes, the compiler knows the size of array variables. And no, programmers do not need to keep track of it - they can just use `sizeof` to get that information. You only need to pass around the size when the array decays into a pointer - not while it's still an array. As for out of bounds accesses: The compiler does know the size of array variables, so it can warn you about out-of-bounds access *if it also knows the value of the index* (and clang, at least, actually does this)... – sepp2k May 21 '18 at 15:54

... But if it doesn't know the index (i.e. in most cases), that'd require a runtime check of the index, which compilers don't insert because of its cost. – sepp2k May 21 '18 at 15:54

score 5 · Answer 3 · answered May 21 '18 at 14:15

The type of an array contains its size (as a compile-time constant) and its member type. Because the compiler knows the type of every variable, it can simply calculate sizeof(the_array) as sizeof(array_type.element_type) * array_type.element_count.
In terms of memory allocation etc. they're simply treated like any other variable:

If you declare an automatic variable of an array type, that typically adds sizeof(the_array_type) bytes to the size of the function's stack frame. So when the function is entered, the stack pointer is moved by enough to store the contents of the array (decreased, on the common platforms where the stack grows downward), and when the function is exited, it is moved back by the same amount.

If you declare a variable with static duration, sizeof(the_array_type) bytes will be reserved in the binary's data segment (or, for zero-initialized data, typically in the BSS section).

Again, that's the same way all variables of any type are treated. The important thing is simply that an array contains its elements, so its size is the size of its contents, whereas a pointer merely points to its elements and its size is completely independent of what it points to.

When used in an expression other than as the operand of sizeof or unary &, the name of an array is simply compiled to its address (and typed as a pointer to its first element).

Does the compiler do this, or does the kernel?

All of this is done by the compiler.
