3

I've been studying C for a few months at university now, but I missed a lecture about pointers, so I tried to make up for it by studying it online and I thought I got it - but something I just stumbled upon is extremely irritating for me.

I know that pointers hold nothing more than the address they are pointing to - for example, if I understood everything correctly so far, I have:

int *pointer;
int number = 30;
pointer = &number;
printf("Number at location: %d", *pointer);

And this works fine, as it should. I assign the address of the variable number to pointer and then print it in the end by dereferencing pointer and getting the actual value from the address. What irritates me though are char pointers.

I've read up on string arrays/pointers, so I tried a few things, when I noticed that something strange (for my eyes at least) happened, with int pointers too:

char* pointer;
char array[] = "Dingleberry";
pointer = array;
printf("%s\n", pointer);
return 0;

I know that I am not directly assigning the address, but if I remember correctly, with arrays, that's not necessary in conjunction with pointers - anyway - this code here works as expected, it prints out "Dingleberry". My problem now is... why? Shouldn't the pointer, without dereferencing, only hold the address of the value? If I dereference it here, the program crashes; it does show the address if I use the & though.

I'm not getting any warnings whatsoever when compiling. Also, shouldn't it work if I were to use:

printf("%c", pointer); 

to only get one letter? (I mean, trying this does show a warning - but I'm interested in getting better and ruling out most likely stupid misconceptions on my part.)

Sourav Ghosh
Pumamori

4 Answers

8

It has nothing to do with the pointer type or what it stores. It's the "%s" specifier for the printf() functions that expects a pointer to a C string, i.e., a nul-terminated sequence of bytes.

If you want to print the address of the pointer itself, use the "%p" specifier:

printf("%p\n", (void *) &pointer);

and if you want the address of the object it points to, in this case the array, just use

printf("%p\n", (void *) pointer);

Note: For a generic pointer use void * as it's convertible without a cast to any pointer type.

Iharob Al Asimi
  • Thanks for the quick answer! To follow up on this, so I don't have to make another kind of redundant post - if I was to assign the values of two char pointers of different lengths, let's say char *pointer1 = "Monday" and char *pointer2 = "Sun", with the code bit *pointer1 = *pointer2. Would that replace the entirety of "Monday" or make it "Sunday"? Thanks again! – Pumamori Jan 12 '16 at 20:52
  • Just a nitpick: `convertible without a cast to any pointer type.` right in general, but here, the cast is required as `%p` expects a `void *` and for pointers there's no default promotion. :) – Sourav Ghosh Jan 12 '16 at 20:55
  • @SouravGhosh Yes I know, I meant because the OP's pointer is `int *`. – Iharob Al Asimi Jan 12 '16 at 20:56
  • 2
    @Pumamori: neither - it will most likely result in a runtime error. First of all, `*pointer1 = *pointer2` will only attempt to overwrite the first character of `"Monday"` with the first character of `"Sun"`. Secondly, attempting to update the contents of a *string literal* results in *undefined behavior*; the code may (and most likely will) segfault, or it may work as intended, or it may do something else and leave your system in a bad state. String processing in C is not straightforward or intuitive. – John Bode Jan 12 '16 at 21:04
  • 1
    @Pumamori nope, it won't. Notice, `char *pointer1 = "Monday" `, `pointer1` points to a string literal, and by saying `*pointer1 = *pointer2`, you're trying to modify the string literal, which invokes [undefined behavior](https://en.wikipedia.org/wiki/Undefined_behavior). – Sourav Ghosh Jan 12 '16 at 21:08
  • @JohnBode Thanks for the answer! I just realised, that I was even dumber than I thought. Because I didn't "understand" the %s modifier correctly, I always interpreted `*pointer1` as the entirety of `"Monday"` in this case. This makes so much more sense now. Thank you very much.(Edit: Thanks to both of you!) – Pumamori Jan 12 '16 at 21:09
4

Hang on to your socks, this is going to get a bit bumpy.

First of all, a string in C is simply a sequence of character values followed by a zero-valued terminator. These character values may be single byte characters (represented with a char type, common encodings being ASCII and EBCDIC) or multi-byte characters (each represented by a sequence of one or more char type values, for encodings such as UTF-8). The terminator for single- and multibyte character strings is a single 0-valued byte. C also supports a "wide" character type wchar_t for encodings like (I think) UTF-16.

Strings are stored as arrays of char or wchar_t. The array must be large enough to store all the characters in the string plus the zero terminator. Thus, the string "Hello" is an array of six character values - {'H', 'e', 'l', 'l', 'o', 0}. All strings are arrays of char (or wchar_t), but not all arrays of char (or wchar_t) are strings - that zero terminator must be present for the array to represent a string.

String literals like "Hello" and "Monday" and "Sun" are stored as arrays of char such that they are visible over the entire body of the program, and their lifetime extends from program startup until the program exits. Attempting to modify the contents of a string literal invokes undefined behavior; your code may segfault, or it may do exactly what you intend, or it may do something else and leave your system in a bad state. Most common platforms store string literals in a read-only memory segment, so attempting to update them causes a segfault.

When you declare a pointer like

char *foo = "Hello";

all foo contains is the address of the first character of the string. When you pass this pointer to printf with the %s conversion specifier, printf will start at that address and "walk" down the string, printing each character until it sees the 0 terminator. Most of the library functions that deal with strings work in the same manner; they take the address of the first element of the string and "walk" down it until they see the terminator.

You can also declare an array of char and store a string to it like so:

char foo[] = "Hello";

This time, foo is a 6-element array of char that contains the string "Hello". Unlike the string literal "Hello", you can modify the contents of the foo array to your heart's content (although you will only be able to store strings of 5 characters or less to it - arrays don't automatically grow or shrink as you add or remove data).

Note that the = operator only works when initializing an array in a declaration; outside of a declaration, you can't use the = operator to copy the contents of one array to another. For example

char foo[10];
...
foo = "Hello"; // bzzzt - no good

won't work. Under most circumstances, expressions of array type (like the string literal "Hello") are implicitly converted ("decay") to pointer types, and the value of the expression will be the address of the first element of the array. So in the line

foo = "Hello";

you're trying to assign the address of the string literal "Hello" to the array foo, which will cause the compiler to yak. Instead, you must use library functions like strcpy, strcat, sprintf, etc., to write or update arrays that store strings.

However,

char *foo;
...
foo = "Hello";

works just fine, since in this case foo is simply a pointer to char, not an array of char.

John Bode
  • Woo, that last vote put me over 50K! Thanks, whoever you are. – John Bode Jan 12 '16 at 22:45
  • "Note that the = operator only works when initializing an array in a declaration; " - in the declaration, `=` is not an operator. It's a symbol with the same 'spelling' as the assignment operator, but it is not the assignment operator (nor any operator at all). It's a symbol used to introduce the initializer for an object. – M.M Jan 12 '16 at 23:19
  • Thanks for taking your time to write all of this. It really helped - I think I understand it all now. I hope you have a wonderful day~! – Pumamori Jan 13 '16 at 12:24
3

You're missing the properties of the %s format specifier here.

Quoting C11 standard, chapter §7.21.6.1, fprintf()

s    If no l length modifier is present, the argument shall be a pointer to the initial element of an array of character type. Characters from the array are written up to (but not including) the terminating null character. [...]

So, by definition, %s expects a pointer to a null-terminated array, and it prints out the contents of the array up to the null terminator. Thus, you don't need to dereference the pointer, as you do in the case of the %d format specifier.

Sourav Ghosh
-1

Also, shouldn't it work if I were to use:

printf("%c", pointer); 

to only get one letter?

No, that won't work to print one letter, because of default argument promotion. Well, it might print a letter, but probably not the first one in the string your pointer presumably points to.

In short: for a variadic function - a C function that takes a variable number of arguments, such as printf() - each argument is promoted to a fixed size. So there's no way the called function can directly tell what each argument really is. That's why printf() has format specifiers in the format string - they tell the called function what the argument actually is. That's also why using an incorrect format specifier for an argument is considered undefined behavior - if you use the %s format specifier to tell the called function an int is a pointer to a string, the function will dereference the promoted value of the int and try to treat the memory it points to as a string, which it probably isn't, if it's even memory.

So, even though the result is undefined behavior, what probably will be printed is the value as a char of most likely the lowest order byte contained in the pointer itself. That might even be the letter that matches the first character in your string.

Andrew Henle
  • Default argument promotion does not affect this code, pointers do not undergo any promotion. – M.M Jan 12 '16 at 23:16
  • "the function will dereference the promoted value of the int" - ints are not promoted; and the function might not "dereference the int" (it is UB) – M.M Jan 12 '16 at 23:17
  • @M.M *ints are not promoted* If `long long int` is not the same size as `int` then `int` has to be promoted. And you missed the point - it's default argument promotion that makes all arguments to variadic functions and functions with no prototype the same size. Whether any one type is promoted or not is irrelevant because the called function can't tell what each argument is without additional information. – Andrew Henle Jan 13 '16 at 12:25
  • all arguments to variadic functions are not necessarily the same size. For example `int` may be 16-bit and `long` 32-bit. The default argument promotions do not change the size of `int` or pointers. – M.M Jan 13 '16 at 21:32
  • `long long int` has nothing to do with passing an `int` to printf – M.M Jan 13 '16 at 21:34