14

I recently embarrassed myself while explaining to a colleague why

char a[100];
scanf("%s", &a); // notice a & in front of 'a'

is very bad and that the slightly better way to do it is:

char a[100];
scanf("%s", a); // notice no & in front of 'a'  

Ok. For everybody getting ready to tell me why scanf should not be used anyway for security reasons: ease up. This question is actually about the meaning of "&a" vs "a".

The thing is, after I explained why it shouldn't work, we tried it (with gcc) and it works =)). I ran a quick

printf("%p %p", a, &a);

and it prints the same address twice.

Can anybody explain to me what's going on?

user51568
  • This question can also be used as an experiment in social studies: currently, the top-two answers contain factual errors; let's see if the stackoverflow-magic will fix this or not ;) – Christoph Jan 30 '09 at 18:49
  • Can you explain the factual errors in litb's answer? – user51568 Jan 30 '09 at 18:58
  • @stefan: therefore 'currently' - there was an answer from Adam Rosenfield with 8 upvotes which has already been removed – Christoph Jan 30 '09 at 19:03
  • Could you explain the error in mine? If there is an error, I'd like to fix it. (or just remove the answer entirely) But the stackoverflow-magic won't work if people keep their knowledge secret. ;) – jalf Jan 30 '09 at 20:06
  • @jalf: `&a` is of type `char (*) [100]`, which might have a different bit-representation than `char *`; therefore, an explicit cast must be done by the programmer: for varargs, the compiler can't know to what it must convert! As litb pointed out, no cast to `void *` will be done... – Christoph Jan 30 '09 at 20:44
  • I think I should actually add this to my answer ;) – Christoph Jan 30 '09 at 20:44
  • Whoops, you're right of course. Let's see if the stackoverflow-magic works then... ;) – jalf Jan 30 '09 at 20:53
  • +1'ed yours and litb's answers and added a note to my own. – jalf Jan 30 '09 at 20:54

6 Answers

18

Well, the &a case should be obvious. You take the address of the array, exactly as expected. a is a bit more subtle, but the answer is that a is the array. And as any C programmer knows, arrays have a tendency to degenerate into a pointer at the slightest provocation, for example when passed as a function argument.

scanf("%s", a) expects a pointer, not an array, so the array degenerates into a pointer to its first element.

Of course scanf("%s", &a) works too, because that's explicitly the address of the array.

Edit: Oops, looks like I totally failed to consider what argument types scanf actually expects. Both cases yield a pointer to the same address, but of different types. (pointer to char, versus pointer to array of chars).

And I'll gladly admit I don't know enough about the semantics of the ellipsis (...), which I've always avoided like the plague, so it looks like the conversion to whichever type scanf ends up using may be undefined behavior. Read the comments, and litb's answer. You can usually trust him to get this stuff right. ;)

jalf
  • Because a itself is not the pointer to the array. It *is* the array. That's also why people often assume that arrays and pointers are the same thing. They aren't. The confusion arises because arrays easily degenerate into pointers, so you can almost always treat them as such. – jalf Jan 30 '09 at 17:20
  • It doesn't? I only have access to a draft of the C++ standard, but from what I could see (5.3.1:2), the & operator is defined for any lvalue, and returns the address of its operand. So why shouldn't it work for the array? Please elaborate. I might be missing something. – jalf Jan 30 '09 at 20:00
11

Well, scanf expects a char* as the next argument when it sees "%s". But what you give it is a pointer to a char[100], that is, a char(*)[100]. That is not guaranteed to work at all, because the compiler may use a different representation for pointers to arrays. If you turn on warnings in gcc, you will also see the appropriate warning.

When you pass an array as an argument that has no corresponding declared parameter (as with scanf, which takes variadic "..." arguments after the format string), the array degenerates to a pointer to its first element. That is, the compiler creates a char* and passes that to scanf.

So, never pass &a to scanf with "%s". Good compilers, such as Comeau, will warn you correctly:

warning: argument is incompatible with corresponding format string conversion

Of course, &a and (char*)a hold the same address. But that does not mean you can use &a and (char*)a interchangeably.


Here are some quotes from the Standard showing that pointer arguments are not converted to void* auto-magically, and that the whole thing is undefined behavior.

Except when it is the operand of the sizeof operator or the unary & operator, or is a string literal used to initialize an array, an expression that has type ‘‘array of type’’ is converted to an expression with type ‘‘pointer to type’’ that points to the initial element of the array object. (6.3.2.1/3)

So, that is always done. It isn't mentioned explicitly again below when listing the valid cases where types may differ.

The ellipsis notation in a function prototype declarator causes argument type conversion to stop after the last declared parameter. The default argument promotions are performed on trailing arguments. (6.5.2.2/7)

Here is how va_arg behaves when extracting the arguments passed to a vararg function such as printf or scanf (7.15.1.1/2):

Each invocation of the va_arg macro modifies ap so that the values of successive arguments are returned in turn. The parameter type shall be a type name specified such that the type of a pointer to an object that has the specified type can be obtained simply by postfixing a * to type. If there is no actual next argument, or if type is not compatible with the type of the actual next argument (as promoted according to the default argument promotions), the behavior is undefined, except for the following cases:

  • one type is a signed integer type, the other type is the corresponding unsigned integer type, and the value is representable in both types;
  • one type is pointer to void and the other is a pointer to a character type.

Well, here is what that default argument promotion is:

If the expression that denotes the called function has a type that does not include a prototype, the integer promotions are performed on each argument, and arguments that have type float are promoted to double. These are called the default argument promotions. (6.5.2.2/6)

Johannes Schaub - litb
  • Can you point me to a reference that clearly states this is undefined behaviour? – user51568 Jan 30 '09 at 17:23
  • @litb: Seems to me you've done some reading as well ;) Could you check if my own conclusions are correct? – Christoph Jan 30 '09 at 18:26
  • i think your answer is fine, christoph. i like how your answer is shorter, but has more information than my bloated one. but i felt the need to quote some more standard stuff since so many ppl seem to agree that passing &a is right :) (and because stefan wanted quote where it says it is undefined :) – Johannes Schaub - litb Jan 30 '09 at 18:34
  • i'm probably poor at doing concise answers :) – Johannes Schaub - litb Jan 30 '09 at 18:36
  • @litb: nothing wrong with these quotes - bringing the standard to the masses is an honourable mission ;) – Christoph Jan 30 '09 at 18:41
  • Nice answer, and looks like I only considered half the question. That'll learn me... ;) And yes, I agree, quoting the standard is definitely a good thing. It's hard to completely trust an answer which doesn't do it. ;) – jalf Jan 30 '09 at 20:51
6

It's been a while since I programmed in C but here's my 2c:

char a[100] doesn't allocate a separate variable for the address of the array, so the memory allocation looks like this:

 ---+-----+---
 ...|0..99|...
 ---+-----+---
    ^
    a == &a

For comparison, if the array were malloc'd, there would be a separate variable for the pointer, and a != &a.

char *a;
a = malloc(100);

In this case the memory looks like this:

 ---+---+---+-----+---
 ...| a |...|0..99|...
 ---+---+---+-----+---
    ^       ^
    &a  !=  a

K&R 2nd Ed. p.99 describes it fairly well:

The correspondence between indexing and pointer arithmetic is very close. By definition, the value of a variable or expression of type array is the address of element zero of the array. Thus after the assignment pa=&a[0]; pa and a have identical values. Since the name of the array is a synonym for the location of the initial element, the assignment pa=&a[0] can also be written as pa=a;

WileCau
  • The memory a is pointing to in the second case being on the heap. – user51568 Jan 30 '09 at 17:38
  • @stefan, you're right about the malloc. In the first case I assume a[] is local so it would be on the stack, but it could be declared globally. A global array has static storage duration, so it typically ends up in the data or BSS segment rather than on the heap. – WileCau Jan 30 '09 at 18:45
5

A C array can be implicitly converted to a pointer to its first element (C99:TC3 6.3.2.1 §3), ie there are a lot of cases where a (which has type char [100]) will behave the same way as &a[0] (which has type char *). This explains why passing a as argument will work.

But don't start thinking this will always be the case: There are important differences between arrays and pointers, eg regarding assignment, sizeof and whatever else I can't think of right now...

&a is actually one of these pitfalls: This will create a pointer to the array, ie it has type char (*) [100] (and not char **). This means &a and &a[0] will point to the same memory location, but will have different types.

As far as I know, there is no implicit conversion between these types and they are not guaranteed to have a compatible representation either. All I could find is C99:TC3 6.2.5 §27, which doesn't say much about pointers to arrays:

[...] Pointers to other types need not have the same representation or alignment requirements.

But there's also 6.3.2.3 §7:

[...] When a pointer to an object is converted to a pointer to a character type, the result points to the lowest addressed byte of the object. Successive increments of the result, up to the size of the object, yield pointers to the remaining bytes of the object.

So the cast (char *)&a should work as expected. Actually, I'm assuming here that the lowest addressed byte of an array will be the lowest addressed byte of its first element - not sure if this is guaranteed, or if a compiler is free to add arbitrary padding in front of an array, but if so, that would be seriously weird...

Anyway for this to work, &a still has to be cast to char * (or void * - the standard guarantees that these types have compatible representations). The problem is that there won't be any conversions applied to variable arguments aside from the default argument promotion, ie you have to do the cast explicitly yourself.


To summarize:

&a is of type char (*) [100], which might have a different bit-representation than char *. Therefore, an explicit cast must be done by the programmer, because for variable arguments, the compiler can't know to what it should convert the value. This means only the default argument promotion will be done, which, as litb pointed out, does not include a conversion to void *. It follows that:

  • scanf("%s", a); - good
  • scanf("%s", &a); - bad
  • scanf("%s", (char *)&a); - should be ok
Christoph
4

Sorry, a tiny bit off topic:

This reminded me of an article I read about 8 years ago when I was coding C full time. I can't find the article but I think it was titled "arrays are not pointers" or something like that. Anyway, I did come across this C arrays and pointers FAQ which is interesting reading.

ng5000
0

char [100] is a compound type made of 100 adjacent chars, whose sizeof equals 100.

When cast to a pointer ((void*) a), this variable yields the address of the first char.

Taking the address of a variable of this type (&a) yields the address of the whole variable, which, in turn, also happens to be the address of the first char.

Quassnoi