136

This is what I found during my learning period:

#include <cstring>   // strlen
#include <iostream>
using namespace std;
int dis(char a[1])
{
    int length = strlen(a);
    char c = a[2];
    return length;
}
int main()
{
    char b[4] = "abc";
    int c = dis(b);
    cout << c;
    return 0;
}  

So in the declaration int dis(char a[1]), the [1] seems to do nothing at all, because I can still use a[2]. It behaves just like int a[] or char *a. I know the array name is a pointer and how to pass an array, so my puzzle is not about this part.

What I want to know is why compilers allow this declaration (int a[1]). Or does it have some other meaning that I don't know about?

DBedrenko
Fanl
  • 7
    That's because you can't actually pass arrays to functions. – Ed S. Mar 27 '14 at 03:03
  • 41
    I think the question here was why C allows you to declare a parameter to be of array type when it is just going to behave exactly like a pointer anyway. – Brian Bi Mar 27 '14 at 03:05
  • `int dis(char (&a)[1])` makes the compiler complain. – songyuanyao Mar 27 '14 at 03:09
  • 8
    @Brian: I'm not sure if this is an argument for or against the behavior, but it also applies if the argument type is a `typedef` with array type. So the "decay to pointer" in argument types isn't just syntactic sugar replacing `[]` with `*`, it's really going through the type system. This has real-world consequences for some standard types like `va_list` that may be defined with array or non-array type. – R.. GitHub STOP HELPING ICE Mar 27 '14 at 03:10
  • @R.. that seems like a good reason, actually – Brian Bi Mar 27 '14 at 03:12
  • @song: Yes, but that requires C++. C doesn't have references. – Ben Voigt Mar 27 '14 at 04:16
  • You can use std::array to enforce array bounds – AlexT Mar 27 '14 at 05:05
  • 4
    @songyuanyao You can accomplish something not entirely dissimilar in C (and C++) using a pointer: `int dis(char (*a)[1])`. Then, you pass a pointer to an array: `dis(&b)`. If you're willing to use C features that don't exist in C++, you can also say things like `void foo(int data[static 256])` and `int bar(double matrix[*][*])`, but that's a whole other can of worms. – Stuart Olsen Mar 27 '14 at 08:17
  • @StuartOlsen I never thought about it. Thanks for the comment. – songyuanyao Mar 27 '14 at 08:29
  • @R.. it could equally be that passing an array really passes the array. Eg `void dis(char a[4])` passes 4 chars, similar to `void dis(char a0, char a1, char a2, char a3)` and similar to passing a struct containing an array. – user253751 Mar 27 '14 at 08:54
  • @immibis From N1570 (C11) §6.7.6.3/7 (Function declarators): "A declaration of a parameter as 'array of _type_' shall be adjusted to 'qualified pointer to _type_…'" The array object is not passed, and in fact it may be that no array object exists; you can pass a pointer to a complete object and, because the parameter type is adjusted to a pointer type, the program will be just as well-formed as if you had passed an array (more precisely, the result of the implicit application of array-to-pointer conversion to an lvalue of array type, i.e. a pointer). – Stuart Olsen Mar 27 '14 at 09:33
  • @StuartOlsen yes, that's what the standard does say, but when they were writing the standard they could just as easily have made it say what I said before. – user253751 Mar 27 '14 at 09:49
  • @immibis If they had given arrays value semantics, it would have severely broken a whole lot of pre-standard code that depended on arrays having pseudo–reference-semantics. Now, as for why K&R originally made arrays work the way they did, your guess is as good as mine. – Stuart Olsen Mar 27 '14 at 10:05
  • 1
    @StuartOlsen The point isn't which standard defined what. The point is why *whoever defined it* defined it that way. – user253751 Mar 27 '14 at 10:06
  • One could make the argument (albeit a weak one) that specifying the array length serves as documentation for the interface. (It may well be the only documentation that many interfaces ever get.) – Hot Licks Mar 28 '14 at 01:38
  • This is actually asking two different questions. First: why can we call `dis` with a `char [4]`, even though the signature of `dis` is `char [1]`? Second question: why can we call `char c = a[2]` even though the type of `a` *appears* to be `char [1]`. (And a third bonus question: Why did I use the word 'appears' in the second question?) – Aaron McDaid Mar 28 '14 at 12:57

10 Answers

161

It is a quirk of the syntax for passing arrays to functions.

Actually it is not possible to pass an array in C. If you write syntax that looks like it should pass the array, what actually happens is that a pointer to the first element of the array is passed instead.

Since the pointer does not include any length information, the contents of your [] in the function formal parameter list are actually ignored.
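The adjustment described above can be checked at compile time. The following is only a sketch, assuming a C++11 compiler; the names dis_sized, dis_unsized, dis_pointer and third_char are made up for illustration and are not from the question.

```cpp
#include <cassert>
#include <type_traits>

// Three spellings of one function type: the compiler adjusts the array
// parameter to `char*` before it forms the signature.
int dis_sized(char a[1]);
int dis_unsized(char a[]);
int dis_pointer(char* a);

static_assert(std::is_same<decltype(dis_sized), decltype(dis_pointer)>::value,
              "char a[1] is adjusted to char*");
static_assert(std::is_same<decltype(dis_unsized), decltype(dis_pointer)>::value,
              "char a[] is adjusted to char*");

// Hypothetical helper: indexes past the "declared" size of 1.
char third_char(char a[1]) {
    static_assert(std::is_same<decltype(a), char*>::value,
                  "inside the function, a is a plain char*");
    assert(a != nullptr);
    return a[2];  // valid only if the caller's array has at least 3 elements
}
```

Calling third_char on the question's char b[4] = "abc" reads b[2], which is why the [1] in the parameter list never gets in the way.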

The decision to allow this syntax was made in the 1970s and has caused much confusion ever since...

M.M
  • 23
    As a non-C programmer, I find this answer very accessible. +1 – asteri Mar 27 '14 at 13:39
  • 23
    +1 for "The decision to allow this syntax was made in the 1970s and has caused much confusion ever since..." – NoSenseEtAl Mar 27 '14 at 15:56
  • 10
    this is true but it is also possible to pass an array of __just that size__ using `void foo(int (*somearray)[20])` syntax. in this case 20 is enforced on the caller sites. – v.oddou Mar 28 '14 at 03:11
  • 15
    -1 As a C programmer, I find this answer incorrect. `[]` are not ignored in multidimensional arrays as shown in pat's answer. So including array syntax was necessary. In addition, nothing stops compiler from issuing warnings even on single dimensional arrays. – user694733 Mar 28 '14 at 08:20
  • 7
    By "the contents of your []", I am talking specifically about the code in the Question. This syntax quirk was not necessary at all, the same thing can be achieved by using pointer syntax, i.e. if a pointer is passed then require the parameter to be a pointer declarator. E.g. in pat's example, `void foo(int (*args)[20]);` Also, strictly speaking C does not have multi-dimensional arrays; but it has arrays whose elements can be other arrays. This doesn't change anything. – M.M Mar 28 '14 at 12:18
  • 5
    @MattMcNabb Actually C standard does use the term (N1570.pdf 6.5.2.1.3): *"Successive subscript operators designate an element of a multidimensional array object."*. And while `[]` is not strictly necessary for 1D arrays, it documents the intent better: *"This function takes array of objects instead of single object"*. But since your answer has a word "your", I would remove the -1 if I could. – user694733 Mar 28 '14 at 12:38
  • A good point that it documents intent, although that intent can also be documented via the variable name, or comments, if it isn't clear already from the function usage. – M.M Mar 28 '14 at 12:41
  • Note that static checkers that use input data domain propagation to look for possible run-time issues can use this as an annotation for an intended constraint, even though the compiler ignores it and even if violating it would not cause any UB. I'm not sure if the tools I have in mind do use this information - but they really should :-). – Mark A. Apr 14 '16 at 08:20
146

The length of the first dimension is ignored, but the lengths of additional dimensions are necessary to allow the compiler to compute offsets correctly. In the following example, the foo function is passed a pointer to the first row of a two-dimensional array.

#include <stdio.h>

void foo(int args[10][20])
{
    printf("%zu\n", sizeof(args[0]));
}

int main(int argc, char **argv)
{
    int a[2][20];
    foo(a);
    return 0;
}

The size of the first dimension [10] is ignored; the compiler will not prevent you from indexing off the end (notice that the formal wants 10 elements, but the actual provides only 2). However, the size of the second dimension [20] is used to determine the stride of each row, and here, the formal must match the actual. Again, the compiler will not prevent you from indexing off the end of the second dimension either.

The byte offset from the base of the array to an element args[row][col] is determined by:

sizeof(int)*(col + 20*row)

Note that if col >= 20, then you will actually index into a subsequent row (or off the end of the entire array).
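To see the formula at work, here is a small sketch that compares the offset the compiler actually uses with the hand-computed one. The function names measured_offset and formula_offset are invented for this illustration, and the row length of 20 is taken from the example above.

```cpp
#include <cassert>
#include <cstddef>

// Byte offset of args[row][col], measured through the pointer that the
// function actually receives (its type is int (*)[20]).
std::ptrdiff_t measured_offset(int args[10][20], std::size_t row, std::size_t col) {
    assert(col < 20);  // stay inside one row
    const char* base = reinterpret_cast<const char*>(args);
    const char* element = reinterpret_cast<const char*>(&args[row][col]);
    return element - base;
}

// The same offset computed by hand from the formula above.
std::ptrdiff_t formula_offset(std::size_t row, std::size_t col) {
    return static_cast<std::ptrdiff_t>(sizeof(int) * (col + 20 * row));
}
```

For an int a[2][20], measured_offset(a, 1, 3) and formula_offset(1, 3) should agree: the [20] drives the stride, while the [10] plays no part in the computation.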

sizeof(args[0]) returns 80 on my machine, where sizeof(int) == 4. However, if I attempt to take sizeof(args), I get the following compiler warning:

foo.c:5:27: warning: sizeof on array function parameter will return size of 'int (*)[20]' instead of 'int [10][20]' [-Wsizeof-array-argument]
    printf("%zd\n", sizeof(args));
                          ^
foo.c:3:14: note: declared here
void foo(int args[10][20])
             ^
1 warning generated.

Here, the compiler is warning that it is only going to give the size of the pointer into which the array has decayed instead of the size of the array itself.

pat
  • Very useful - consistency with this is also plausible as the reason for the quirk in the 1-d case. – jwg Mar 27 '14 at 10:35
  • 1
    It is the same idea as the 1-D case. What looks like a 2-D array in C and C++ is actually a 1-D array, each element of which is another 1-D array. In this case we have an array with 10 elements, each element of which is "array of 20 ints". As described in my post, what actually gets passed to the function is the pointer to the first element of `args`. In this case, the first element of args is an "array of 20 ints". Pointers include type information; what gets passed is "pointer to an array of 20 ints". – M.M Mar 27 '14 at 21:40
  • 9
    Yup, that's what the `int (*)[20]` type is; "pointer to an array of 20 ints". – pat Mar 27 '14 at 21:54
  • @pat You said we can omit only first dimension but not other dimensions then why is this code running without any error or warning CODE link: https://ide.geeksforgeeks.org/WMoKbsYhB8 Please explain. Am I missing something? – Vinay Yadav Aug 27 '20 at 15:22
  • The type of `int (*p)[]` is a pointer to a 1-dimensional array of indeterminate length. The size of `*p` is undefined, so you cannot index `p` directly (even with an index of `0`!). The only thing you can do with `p` is to dereference it as `*p`, and then index it as `(*p)[i]`. This does not preserve the 2-dimensional structure of the original array. – pat Aug 30 '20 at 18:57
33

The problem and how to overcome it in C++

The problem has been explained extensively by pat and Matt. The compiler basically ignores the first dimension of the array, which effectively means it ignores the size of the passed argument.

In C++, on the other hand, you can easily overcome this limitation in two ways:

  • using references
  • using std::array (since C++11)

References

If your function is only trying to read or modify an existing array (not copying it) you can easily use references.

For example, let's assume you want to have a function that resets an array of ten ints setting every element to 0. You can easily do that by using the following function signature:

void reset(int (&array)[10]) { ... }

Not only will this work just fine, but it will also enforce the dimension of the array.

You can also make use of templates to make the above code generic:

template<class Type, std::size_t N>
void reset(Type (&array)[N]) { ... }

And finally you can take advantage of const correctness. Let's consider a function that prints an array of 10 elements:

void show(const int (&array)[10]) { ... }

By applying the const qualifier we are preventing possible modifications.
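With the bodies filled in, the reference-based signatures might look like the sketch below. The bodies are one possible choice, not the only one, and reset_any, sum and reference_example are names invented here to keep the overloads apart.

```cpp
#include <cassert>
#include <cstddef>

// Accepts only an lvalue of type int[10]; an int[9] or an int* is rejected
// at compile time.
void reset(int (&array)[10]) {
    for (int& element : array) element = 0;
}

// Generic version: N is deduced from the argument's array type.
template <class Type, std::size_t N>
void reset_any(Type (&array)[N]) {
    for (Type& element : array) element = Type();
}

// Read-only access through a const reference; returns the sum.
template <std::size_t N>
int sum(const int (&array)[N]) {
    int total = 0;
    for (int element : array) total += element;
    return total;
}

// Usage sketch.
void reference_example() {
    int ten[10] = {1, 2, 3};
    double four[4] = {1.5, 2.5, 3.5, 4.5};
    assert(sum(ten) == 6);
    reset(ten);       // OK: exactly 10 elements
    reset_any(four);  // N deduced as 4
    assert(sum(ten) == 0 && four[3] == 0.0);
    // int nine[9]; reset(nine);  // would not compile: size mismatch
}
```

The commented-out last line is the point of the exercise: unlike int array[10] as a parameter, int (&array)[10] makes the size part of the type that the caller must match.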


The standard library class for arrays

If you consider the above syntax both ugly and unnecessary, as I do, we can throw it in the can and use std::array instead (since C++11).

Here's the refactored code:

void reset(std::array<int, 10>& array) { ... }
void show(std::array<int, 10> const& array) { ... }

Isn't it wonderful? Not to mention that the generic code trick I showed you earlier still works:

template<class Type, std::size_t N>
void reset(std::array<Type, N>& array) { ... }

template<class Type, std::size_t N>
void show(const std::array<Type, N>& array) { ... }

Not only that, but you get copy and move semantics for free. :)

template<class Type, std::size_t N>
void copy(std::array<Type, N> array) {
    // a copy of the original passed array
    // is made and can be dealt with independently
    // from the original
}
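A short sketch of the by-value versus by-reference behaviour, assuming C++11. The names zeroed_copy and array_example are hypothetical, chosen only for this illustration.

```cpp
#include <array>
#include <cassert>
#include <cstddef>

// Takes the array by value, so the function works on its own copy.
template <class Type, std::size_t N>
std::array<Type, N> zeroed_copy(std::array<Type, N> array) {
    array.fill(Type());
    return array;
}

// Takes the array by reference, so the caller's object is modified.
template <class Type, std::size_t N>
void reset(std::array<Type, N>& array) {
    array.fill(Type());
}

// Usage sketch.
void array_example() {
    std::array<int, 3> original = {{1, 2, 3}};
    std::array<int, 3> copy = zeroed_copy(original);
    assert(original[0] == 1 && copy[0] == 0);  // the caller's array is untouched
    reset(original);
    assert(original[2] == 0);                  // now it has been cleared in place
}
```

Here the size travels with the type in both directions, so there is nothing for the compiler to silently drop.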

So, what are you waiting for? Go use std::array.

Community
Shoe
  • 2
    @kietz, I'm sorry your suggested edit got rejected, but we [automatically assume C++11 is being used](http://meta.stackexchange.com/a/112650/152998), unless specified otherwise. – Shoe Mar 27 '14 at 20:30
  • this is true, but we are also supposed to specify if any solution is C++11 only, based on the link you gave. – trlkly Mar 28 '14 at 02:26
  • @trlkly, I agree. I've edited the answer accordingly. Thanks for pointing it out. – Shoe Mar 28 '14 at 15:22
9

It's a fun feature of C that allows you to effectively shoot yourself in the foot if you're so inclined.

I think the reason is that C is just a step above assembly language. Size checking and similar safety features have been removed to allow for peak performance, which isn't a bad thing if the programmer is being very diligent.

Also, assigning a size to the function argument has the advantage that when the function is used by another programmer, there's a chance they'll notice a size restriction. Just using a pointer doesn't convey that information to the next programmer.

Logan Wayne
bill
  • 3
    Yes. C is designed to trust the programmer over the compiler. If you are so blatantly indexing off the end of an array, you must be doing something special and intentional. – John Mar 28 '14 at 02:54
  • 7
    I cut my teeth in programming on C 14 years ago. Of all my professor said, the one phrase that has stuck with me more than all others, "C was written by programmers, for programmers." The language is extremely powerful. (Prepare for cliche) As uncle Ben taught us, "With great power, comes great responsibility." – Andrew Falanga Mar 28 '14 at 16:26
8

First, C never checks array bounds. Doesn't matter if they are local, global, static, parameters, whatever. Checking array bounds means more processing, and C is supposed to be very efficient, so array bounds checking is done by the programmer when needed.

Second, there is a trick that makes it possible to pass-by-value an array to a function. It is also possible to return-by-value an array from a function. You just need to create a new data type using struct. For example:

typedef struct {
  int a[10];
} myarray_t;

myarray_t my_function(myarray_t foo) {

  myarray_t bar;

  ...

  return bar;

}

You have to access the elements like this: foo.a[1]. The extra ".a" might look weird, but this trick adds great functionality to the C language.
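Here is the same trick with the elided body filled in, as a sketch only. The function doubled and the helper struct_example are made up for illustration; the struct is the one from above. It is written so that a C++ compiler accepts it, and in plain C the range-based for loop would have to become an index loop.

```cpp
#include <cassert>

typedef struct {
    int a[10];
} myarray_t;

// Hypothetical example function: receives a copy, modifies it, returns it.
myarray_t doubled(myarray_t foo) {
    for (int& value : foo.a) value *= 2;
    return foo;
}

// Usage sketch.
void struct_example() {
    myarray_t original = {{1, 2, 3}};
    myarray_t result = doubled(original);
    assert(original.a[1] == 2);  // the caller's array is unchanged
    assert(result.a[1] == 4);    // the returned copy was modified
}
```

Because the whole struct is copied in and out, all ten ints are copied on every call, which is the price of the value semantics.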

user34814
5

To tell the compiler that myArray points to an array of at least 10 ints:

void bar(int myArray[static 10])

A good compiler should give you a warning if you access myArray [10]. Without the "static" keyword, the 10 would mean nothing at all.

gnasher729
  • 1
    Why should a compiler warn if you access the 11th element and the array contains *at least* 10 elements? – nwellnhof Mar 27 '14 at 11:52
  • Presumably this is because the compiler can only enforce that you have _at least_ 10 elements. If you try to access the 11th element, it cannot be **sure** that it exists (even though it may). – Dylan Watson Mar 27 '14 at 14:21
  • The compiler can rely on getting an array with 10 elements. It is legal to pass in an array with 11 or more elements. The function isn't allowed to use them beyond the first ten. – gnasher729 Mar 27 '14 at 14:24
  • 2
    I don't think that's a correct reading of the standard. `[static]` allows the compiler to warn if you *call* `bar` with an `int[5]`. It doesn't dictate what you may access *within* `bar`. The onus is entirely on the caller side. – tab Mar 27 '14 at 18:18
  • 3
    `error: expected primary-expression before 'static'` never seen this syntax. this is unlikely to be standard C or C++. – v.oddou Mar 28 '14 at 03:09
  • 3
    @v.oddou, it's specified in C99, in 6.7.5.2 and 6.7.5.3. – Samuel Edwin Ward Mar 28 '14 at 14:48
  • @SamuelEdwinWard: oh nice, so it means its C only. – v.oddou Mar 31 '14 at 00:22
5

This is a well-known "feature" of C, passed over to C++ because C++ is supposed to correctly compile C code.

The problem arises from several aspects:

  1. An array name is supposed to be completely equivalent to a pointer.
  2. C is supposed to be fast, originally developed to be a kind of "high-level Assembler" (especially designed to write the first "portable Operating System": Unix), so it is not supposed to insert "hidden" code; runtime range checking is thus "forbidden".
  3. Machine code generated to access a static array or a dynamic one (either in the stack or allocated) is actually different.
  4. Since the called function cannot know the "kind" of array passed as argument everything is supposed to be a pointer and treated as such.

You could say arrays are not really supported in C (this is not really true, as I was saying before, but it is a good approximation); an array is really treated as a pointer to a block of data and accessed using pointer arithmetic. Since C does NOT have any form of RTTI, you have to declare the size of the array element in the function prototype (to support pointer arithmetic). This is even "more true" for multidimensional arrays.

Anyway all above is not really true anymore :p

Most modern C/C++ compilers do support some bounds checking, but it is off by default (the standards do not require it). Reasonably recent versions of gcc, for example, do some compile-time range checking with "-O2 -Wall" (which enables -Warray-bounds) and offer run-time checking through the sanitizers, e.g. "-fsanitize=address".

ZioByte
  • Maybe C++ *was* supposed to compile C code 20 years ago, but it certainly *is* not, and hasn't for a long time (C++98? C99 at least, which has not been "fixed" by any newer C++ standard). – hyde Apr 02 '14 at 04:13
  • @hyde That sounds a bit too harsh to me. To quote Stroustrup "With minor exceptions, C is a subset of C++." (The C++ PL 4th ed., sec. 1.2.1). While both C++ and C evolve further, and features from the latest C version exist which are not in the latest C++ version, overall I think that Stroustrup quote is still valid. – mvw Apr 02 '14 at 13:10
  • @mvw Most C code written in this millennium, which is not intentionally kept C++ compatible by avoiding incompatible features, will use the C99 *designated initializers* syntax (`struct MyStruct s = { .field1 = 1, .field2 = 2 };`) for initializing structs, because it is simply a much clearer way to initialize a struct. As a result, most current C code will be rejected by standard C++ compilers, because most C code will be initializing structs. – hyde Apr 02 '14 at 13:29
  • @mvw It could perhaps be said, that C++ is supposed to be compatible with C so, that it is possible to write code which will compile with both C and C++ compilers, if certain compromises are made. But that requires using a subset of *both* C and C++, not just subset of C++. – hyde Apr 02 '14 at 13:32
  • @hyde You would be surprised how much C code is C++ compilable. A few years ago the whole Linux kernel was C++ compilable (I do not know if it still holds true). I routinely compile C code with a C++ compiler to get superior warning checking; only "production" is compiled in C mode to squeeze out the most optimization. – ZioByte Apr 10 '14 at 13:39
  • @ZioByte A lot of C code is still C90 compilable... Anyway, about using C++ compilers for better warnings, some have extensions which allow certain C features (such as VLAs) to be used in C++, so that can be useful even if code uses features missing from standard C++. – hyde Apr 10 '14 at 18:37
3

C will not only transform a parameter of type int[5] into int*; given the declaration typedef int intArray5[5];, it will transform a parameter of type intArray5 to int* as well. There are some situations where this behavior, although odd, is useful (especially with things like the va_list defined in stdarg.h, which some implementations define as an array). It would be illogical to allow as a parameter a type defined as int[5] (ignoring the dimension) but not allow int[5] to be specified directly.
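The typedef case can be verified with a few static assertions. This is a sketch assuming a C++11 compiler (the same adjustment happens in C); first_of is a hypothetical function used only to have a parameter to inspect.

```cpp
#include <cassert>
#include <type_traits>

typedef int intArray5[5];

// The parameter is spelled with the typedef, yet its type is int*.
int first_of(intArray5 values) {
    static_assert(std::is_same<decltype(values), int*>::value,
                  "the typedef'd array parameter is adjusted to int*");
    assert(values != nullptr);
    return values[0];
}

// The whole function type is therefore int(int*).
static_assert(std::is_same<decltype(first_of), int(int*)>::value,
              "first_of takes a plain int*");

// Outside a parameter list the typedef is still a real array type.
static_assert(sizeof(intArray5) == 5 * sizeof(int),
              "intArray5 itself is an array of 5 ints");
```

Since the parameter is just an int*, first_of can be called with an int[3] as well, with no diagnostic required.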

I find C's handling of parameters of array type to be absurd, but it's a consequence of efforts to take an ad-hoc language, large parts of which weren't particularly well-defined or thought-out, and try to come up with behavioral specifications that are consistent with what existing implementations did for existing programs. Many of the quirks of C make sense when viewed in that light, particularly if one considers that when many of them were invented, large parts of the language we know today didn't exist yet. From what I understand, in C's predecessors, B and BCPL, compilers didn't really keep track of variable types very well. A declaration equivalent to int arr[5]; behaved like int anonymousAllocation[5],*arr = anonymousAllocation;; once the allocation was set aside, the compiler neither knew nor cared whether arr was a pointer or an array. When accessed as either arr[x] or *arr, it would be regarded as a pointer regardless of how it was declared.

supercat
1

One thing that hasn't been answered yet is the actual question.

The answers already given explain that arrays cannot be passed by value to a function in either C or C++. They also explain that a parameter declared as int[] is treated as if it had type int *, and that a variable of type int[] can be passed to such a function.

But they don't explain why it has never been made an error to explicitly provide an array length.

void f(int *); // makes perfect sense
void f(int []); // sort of makes sense
void f(int [10]); // makes no sense

Why isn't the last of these an error?

A reason for that is that it causes problems with typedefs.

typedef int myarray[10];
void f(myarray array);

If it were an error to specify the array length in function parameters, you would not be able to use the myarray name in the function parameter. And since some implementations use array types for standard library types such as va_list, and all implementations are required to make jmp_buf an array type, it would be very problematic if there were no standard way of declaring function parameters using those names: without that ability, there could not be a portable implementation of functions such as vprintf.
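As an illustration of the va_list case, here is a minimal vprintf-style pair. It is a sketch only: format_with_list and format_message are hypothetical names, while va_start, va_end and std::vsnprintf are the standard facilities. The first function names va_list as a parameter type, which has to work whether or not the implementation defines va_list as an array.

```cpp
#include <cassert>
#include <cstdarg>
#include <cstddef>
#include <cstdio>
#include <cstring>

// The v-variant: takes an already started va_list as an ordinary parameter.
int format_with_list(char* out, std::size_t size, const char* fmt, va_list args) {
    assert(out != nullptr && fmt != nullptr);
    return std::vsnprintf(out, size, fmt, args);
}

// The variadic front end: collects its arguments and forwards them.
int format_message(char* out, std::size_t size, const char* fmt, ...) {
    va_list args;
    va_start(args, fmt);
    int written = format_with_list(out, size, fmt, args);
    va_end(args);
    return written;
}
```

For example, format_message(buf, sizeof buf, "%d-%s", 42, "ok") writes "42-ok" into buf and returns 5.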

0

It's allowed so that compilers are able to check whether the size of the array passed is the same as what is expected. Compilers may issue a warning if that is not the case.

hamidi