7

I really need help on this. It has shaken my very foundation in C. Long and detailed answers will be very much appreciated. I have divided my question into two parts.

A: Why does printf("%s",(char[]){'H','i','\0'}); work and print Hi just as the conventional printf("%s","Hi"); does? Can we use (char[]){'H','i','\0'} as a substitute for "Hi" anywhere in our C code? Do they mean the same thing? I mean, when we write "Hi" in C, it generally means Hi is stored somewhere in memory and a pointer to it is passed. Can the same be said of the seemingly ugly (char[]){'H','i','\0'}? Are they exactly the same?

B: If printf("%s",(char[]){'H','i','\0'}) works, the same as printf("%s","Hi"), why does printf("%s",(char*){'A','B','\0'}) fail big time and seg-fault if I run it despite the warnings? It amazes me because, in C, isn't char[] supposed to decompose into char*, like when we pass it as a function argument? Why is it not doing so here, and why does char* fail? I mean, isn't passing char demo[] as an argument to a function the same as char *demo? Why then are the results not the same here?

Please help me out on this. I feel like I haven't yet understood even the very basics of C. I am very disappointed. Thank you!!

Thokchom

5 Answers

8

Your third example:

printf("%s",(char *){'H','i','\0'});

isn't even legal (strictly speaking it's a constraint violation), and you should have gotten at least one warning when compiling it. When I compiled it with gcc with default options, I got 6 warnings:

c.c:3:5: warning: initialization makes pointer from integer without a cast [enabled by default]
c.c:3:5: warning: (near initialization for ‘(anonymous)’) [enabled by default]
c.c:3:5: warning: excess elements in scalar initializer [enabled by default]
c.c:3:5: warning: (near initialization for ‘(anonymous)’) [enabled by default]
c.c:3:5: warning: excess elements in scalar initializer [enabled by default]
c.c:3:5: warning: (near initialization for ‘(anonymous)’) [enabled by default]

The second argument to printf is a compound literal. It's legal (but odd) to have a compound literal of type char*, but in this case the initializer-list portion of the compound literal is invalid.

After printing the warnings, what gcc seems to be doing is (a) converting the expression 'H', which is of type int, to char*, yielding a garbage pointer value, and (b) ignoring the remainder of the initializer elements, 'i' and '\0'. The result is a char* pointer value that points to the (probably virtual) address 0x48 -- assuming an ASCII-based character set.

Ignoring excess initializers is valid (but worthy of a warning), but there is no implicit conversion from int to char* (apart from the special case of a null pointer constant, which doesn't apply here). gcc has done its job by issuing a warning, but it could (and IMHO should) have rejected it with a fatal error message. It will do so with the -pedantic-errors option.

If your compiler warned you about those lines, you should have included those warnings in your question. If it didn't, either crank up the warning level or get a better compiler.

Going into more detail about what happens in each of the three cases:

printf("%s","Hi");

A C string literal like "%s" or "Hi" creates an anonymous statically allocated array of char. (This object is not const, but attempting to modify it has undefined behavior; this isn't ideal, but there are historical reasons for it.) A terminating '\0' null character is added to make it a valid string.

An expression of array type, in most contexts (the exceptions are when it's the operand of the unary sizeof or & operator, or when it's a string literal in an initializer used to initialize an array object) is implicitly converted to ("decays to") a pointer to the array's first element. So the two arguments passed to printf are of type char*; printf uses those pointers to traverse the respective arrays.

printf("%s",(char[]){'H','i','\0'});

This uses a feature that was added to the language by C99 (the 1999 edition of the ISO C standard), called a compound literal. It's similar to a string literal, in that it creates an anonymous object and refers to the value of that object. A compound literal has the form:

( type-name ) { initializer-list }

and the object has the specified type and is initialized to the value given by the initializer list.

The above is nearly equivalent to:

char anon[] = {'H', 'i', '\0'};
printf("%s", anon);

Again, the second argument to printf refers to an array object, and it "decays" to a pointer to the array's first element; printf uses that pointer to traverse the array.

Finally, this:

printf("%s",(char*){'A','B','\0'});

as you say, fails big time. The type of a compound literal is usually an array or structure (or union); it actually hadn't occurred to me that it could be a scalar type such as a pointer. The above is nearly equivalent to:

char *anon = {'A', 'B', '\0'};
printf("%s", anon);

Obviously anon is of type char*, which is what printf expects for a "%s" format. But what's the initial value?

The standard requires the initializer for a scalar object to be a single expression, optionally enclosed in curly braces. But for some reason, that requirement is under "Semantics", so violating it is not a constraint violation; it's merely undefined behavior. That means the compiler can do anything it likes, and may or may not issue a diagnostic. The authors of gcc apparently decided to issue a warning and ignore all but the first initializer in the list.

After that, it becomes equivalent to:

char *anon = 'A';
printf("%s", anon);

The constant 'A' is of type int (for historical reasons, it's int rather than char, but the same argument would apply either way). There is no implicit conversion from int to char*, and in fact the above initializer is a constraint violation. That means a compiler must issue a diagnostic (gcc does), and may reject the program (gcc doesn't unless you use -pedantic-errors). Once the diagnostic is issued, the compiler can do whatever it likes; the behavior is undefined (there's some language-lawyerly disagreement on that point, but it doesn't really matter). gcc chooses to convert the value of 'A' from int to char* (probably for historical reasons, going back to when C was even less strongly typed than it is today), resulting in a garbage pointer with a representation that probably looks like 0x00000041 or 0x0000000000000041.

That garbage pointer is then passed to printf, which tries to use it to access a string at that location in memory. Hilarity ensues.

There are two important things to keep in mind:

  1. If your compiler prints warnings, pay close attention to them. gcc in particular issues warnings for many things that IMHO should be fatal errors. Never ignore warnings unless you understand what the warning means, thoroughly enough for your knowledge to override that of the authors of the compiler.

  2. Arrays and pointers are very different things. Several rules of the C language seemingly conspire to make it look like they're the same. You can temporarily get away with assuming that arrays are nothing more than pointers in disguise, but that assumption will eventually come back to bite you. Read section 6 of the comp.lang.c FAQ; it explains the relationship between arrays and pointers better than I can.

Keith Thompson
  • I got the **exact** warnings as you mentioned in your answer!! – Thokchom May 17 '13 at 19:12
  • It's really unfortunate I was away when you answered. I intended to ask this: Why does `(char[]){'H','i','\0'}` work? `%s` expects an argument of type `char*`. So shall I conclude `(char[]){'H','i','\0'}` decomposes to `char*` eventually? And finally, is `(char[]){'H','i','\0'}` the **exact** substitute for `"Hi"`? – Thokchom May 17 '13 at 19:16
  • I must stress what I intend to ask in the second part of my comment above. Does `(char[]){'H','i','\0'}` mean the string `"Hi"` in all contexts in C code? Like strlen(), sizeof, strcpy()? – Thokchom May 17 '13 at 19:18
  • @Thokchom nope, it does not. Usually, `"Hi"` will be a pointer to a string allocated in static data (thus, a `const` string), whilst the compound literal will always be allocated on the stack and it will be mutable. – Richard J. Ross III May 17 '13 at 19:21
  • @RichardJ.RossIII So do you mean `(char[]){'H','i','\0'}` translates to type `char*` while `"Hi"` translates to type `const char*`? – Thokchom May 17 '13 at 19:28
  • @RichardJ.RossIII `(char[]){'H','i','\0'}` does translate to type `char*`, right? How else would it be eligible as an argument for `%s`? – Thokchom May 17 '13 at 19:29
  • @Thokchom no, the compound literal (e.g. `(char[]) { ... }`) is an *array*, which then decays to a pointer when passed to printf. `"..."` is just a raw pointer, and only becomes an array if assigned to one. – Richard J. Ross III May 17 '13 at 19:34
  • @RichardJ.RossIII By raw pointer you mean a raw pointer of type `char*`, eh? Or `const char*`, as you said before about `"Hi"`? – Thokchom May 17 '13 at 19:36
  • @Thokchom the raw pointer in that case was of type `char *`, but modifying it is undefined behavior. It should almost always be assigned to a variable of type `const char *`, but the literal itself is not const by default. C's rules regarding strings are some of the most complex ones out there, and as such it becomes hard to follow at some point. When in doubt, `char *` is const, while `char []` isn't. – Richard J. Ross III May 17 '13 at 19:38
  • @RichardJ.RossIII Please come to chat for a minute. – Thokchom May 17 '13 at 19:40
  • let us [continue this discussion in chat](http://chat.stackoverflow.com/rooms/30152/discussion-between-thokchom-and-richard-j-ross-iii) – Thokchom May 17 '13 at 19:40
  • I like to think of a pointer as the address of something in memory. It's a concept inherent to the hardware of the machine itself, and each machine may handle it slightly differently, but it's all similar. Then you have to map what C is doing against that, because C is a "compiler" and generates the raw instructions for the machine. – Lee Meador May 17 '13 at 19:41
  • I don't expect an explanation for the downvote, but I'd appreciate it. – Keith Thompson May 17 '13 at 22:17
7

Regarding snippet #2:

The code works because of a new feature in C99, called compound literals. You can read about them in several places, including GCC's documentation, Mike Ash's article, and a bit of google searching.

Essentially, the compiler creates a temporary array on the stack and fills it with 3 bytes: 0x48, 0x69, and 0x00. That temporary array, once created, decays to a pointer and is passed to the printf function. A very important thing to note about compound literals is that they are modifiable by default, whereas string literals must not be modified.

Regarding snippet #3:

You're actually not creating an array at all. You are converting the first element of the scalar initializer, which in this case is 'H', or 0x48, into a pointer. You can see that by changing the %s in your printf statement into a %p, which gives this output for me:

0x48

As such, you must be very careful with what you do with compound literals - they're a powerful tool, but it's easy to shoot yourself in the foot with them.

Richard J. Ross III
  • +1 for the answer. Please don't mind the rollback. I could barely recall what I intended to ask, as the question was in my own words and there were some nuances. – Thokchom May 17 '13 at 19:07
  • Please look into what I asked **Keith Thompson** under his answer. It'll be nice of you if you can clarify that as well. – Thokchom May 17 '13 at 19:19
3

(Ok ... someone reworked the question completely. Reworking the answer.)

The #3 array contains the hex bytes (we don't know about the 4th one):

41 42 00 xx

When it passes the contents of that array, in the char* case only, it takes those bytes as the address of the string to print. It depends on how those 4 bytes convert to a pointer in your actual CPU hardware, but let's say it says "414200FF" is the address (since we'll guess the 4th byte is 0xFF; we are making this all up anyway). We are also assuming a pointer is 4 bytes long, a particular endian order, and so on. It doesn't matter to the answer, but others are free to expound.

Note: One of the other answers seems to think it takes the 0x48 and extends it to an (int) 0x00000048 and calls that a pointer. Could be. But even if GCC did that (and @KeithThompson didn't say he checked the generated code), it doesn't mean some other C compiler would do the same thing. The result is the same either way.

That gets passed to the printf() function and it tries to go to that address to get some characters to print. (Seg fault happens because that address maybe isn't present on the machine and isn't assigned to your process for reading anyway.)

In case #2 it knows it's an array and not a pointer, so it passes the address of the memory where the bytes are stored, and printf() can follow that address.

See other answers for more formal language.

One thing to think about is that at least some C compiler probably doesn't know a call to printf from a call to any other function. So it takes the "format string" and stores away a pointer for the call (which happens to be to a string) and then takes the 2nd parameter and stores away whatever it gets according to the declaration of the function, whether an int or a char or a pointer for the call. The function then pulls these out of wherever the caller puts them according to that same declaration. The declaration for the 2nd and greater parameters has to be something really generic to be able to accept pointer, int, double and all the different types that could be there. (What I'm saying is the compiler probably doesn't look at the format string when deciding what to do with the 2nd and following parameters.)

It might be interesting to see what happens for:

printf("%s",{'H','i','\0'});
printf("%s",(char *)(char[]){'H','i','\0'}); // This works according to @DanielFischer

Predictions?

Lee Meador
  • `printf("%s",(char *)(char[]){'H','i','\0'});` will work. You're casting a `char[]` (the compound literal) to a `char*` [which conversion would automatically be done anyway], completely valid, no problem. – Daniel Fischer May 17 '13 at 18:04
  • @DanielFischer I need some further clarifications that I failed to mention clearly in my question. I have mentioned those as comments under Keith's answer. Can you take a minute to post your own answer for those? – Thokchom May 17 '13 at 19:20
  • @DanielFischer To put it clearly: **1)** Since `%s` expects a `char*` argument, does it mean `(char[]){'H','i','\0'}` translates to type `char*` eventually? **2)** Is `(char[]){'H','i','\0'}` **exactly** the same as `"Hi"`, in all aspects? Can we use it whenever we want to use the string `"Hi"`, such as in arguments to library functions like strlen() or when assigning to pointers? Is it guaranteed to be of type `char*` due to translation/decomposition from type `char[]` to `char*`? – Thokchom May 17 '13 at 19:24
2

In each case, the compiler creates an initialized object of type char[3]. In the first case, it treats the object as an array, so it passes a pointer to its first element to the function. In the second case, it treats the object as a pointer, so it passes the value of the object. printf is expecting a pointer, and the value of the object is invalid when treated as a pointer, so the program crashes at runtime.

William Pursell
  • What does "it treats the object as a pointer" mean? Is it the exact byte contents of the array as Lee Meador suspects? – Micha Wiedenmann May 17 '13 at 17:24
  • Pointers are passed by value. Arrays are passed by passing a pointer to the first element. The cast tells the compiler to treat the object as a pointer, so it passes it by value, because pointers are passed by value. – William Pursell May 17 '13 at 17:25
  • @WilliamPursell there is no object in C, but in C++. What exactly does the word `object` refer to? – RAM May 17 '13 at 17:36
  • @NeerajT C does have objects, but they are different than objects in C++. An "object" is a chunk of memory. From the comp.lang.c faq: "Any piece of data that can be manipulated by a C program: a simple variable, an array, a structure, a piece of malloc'ed memory, etc." – William Pursell May 17 '13 at 20:31
-1

The third version should not even compile. 'H' is not a valid initializer for a pointer type. GCC gives you a warning but not an error by default.

R.. GitHub STOP HELPING ICE
  • It does compile, because it is in fact a valid C program. According to the standard, the scalar initializer can have excess elements, which should be ignored. – Richard J. Ross III May 17 '13 at 17:40
  • `'H'` is still not a valid initializer for `char*`. There is no implicit conversion from `int` (the type of `'H'`) to `char*`, other than the special case of a null pointer constant. – Keith Thompson May 17 '13 at 17:43
  • No, @RichardJ.RossIII, "The initializer for a scalar shall be a single expression, optionally enclosed in braces." Excess initialisers invoke undefined behaviour. The compiler need not accept it. – Daniel Fischer May 17 '13 at 17:48
  • Indeed, the excess initializers are also a problem, but not the one I mentioned. – R.. GitHub STOP HELPING ICE May 17 '13 at 18:17