37

Still learning more C and am a little confused. In my references I find cautions about assigning a pointer that has not been initialized. They go on to give examples. Great answers yesterday by the way from folks helping me with pointers, here:

Precedence, Parentheses, Pointers with iterative array functions

On follow up I briefly asked about the last iteration of the loop and potentially pointing the pointer to a non-existent place (i.e. because of my references cautioning against it). So I went back and looked more and find this:

If you have a pointer

int *pt;

then use it without initializing it (i.e. I take this to mean without a statement like *pt= &myVariable):

*pt = 606;

you could end up with a real bad day depending on where in memory this pointer has been assigned to. The part I'm having trouble with is when working with a string of characters something like this would be ok:

char *str = "Sometimes I feel like I'm going crazy.";

Where the reference says, "Don't worry about where in the memory the string is allocated; it's handled automatically by the compiler". So no need to say initialize *str = &str[0]; or *str = str;. Meaning, the compiler is automatically char str[n]; in the background?

Why is it that this is handled differently? Or, am I completely misunderstanding?

dbush
Dan
    A string in C is a [null-terminated] array of characters. When a literal array is used in an assignment expression context it decays to a pointer to its first element. So, `"Sometimes..."` is equivalent to `&"Sometimes..."[0]`. – DYZ Jan 03 '19 at 16:22
  • Relevant earlier Q&A: https://stackoverflow.com/questions/27484168/how-to-explain-c-pointers-declaration-vs-unary-operators-to-a-beginner – Ilmari Karonen Jan 04 '19 at 00:43
  • See the dups: 1. [In C, why can't an integer value be assigned to an int* the same way a string value can be assigned to a char*?](https://stackoverflow.com/q/31548263/4389800) 2. [Why it is possible to assign string to character pointer in C but not an integer value to an integer pointer](https://stackoverflow.com/q/46748996/4389800) 3. [Assigning strings to pointer in C Language](https://stackoverflow.com/q/24690475/4389800) 4. [Why must int pointer be tied to variable but not char pointer?](https://stackoverflow.com/q/8371968/4389800) – P.P Feb 16 '19 at 10:04

9 Answers

22

In this case:

char *str = "Sometimes I feel like I'm going crazy.";

You're initializing str to contain the address of the given string literal. You're not actually dereferencing anything at this point.

This is also fine:

char *str;
str = "Sometimes I feel like I'm going crazy.";

Because you're assigning to str and not actually dereferencing it.

This is a problem:

int *pt;
*pt = 606;

Because pt is not initialized and then it is dereferenced.

You also can't do this for the same reason (plus the types don't match):

*pt= &myVariable;

But you can do this:

pt= &myVariable;

After which you can freely use *pt.

dbush
    I take what you've written to mean: `char *str = "Sometimes I feel like I'm going crazy.";` **does not =** `char *str;` `*str = "Sometimes I feel like I'm going crazy.";` – Dan Jan 03 '19 at 16:23
    @Dan Correct. In the former case you initialize the pointer, in the latter case you dereference it. – dbush Jan 03 '19 at 16:28
  • Starting to get it a little better! Appreciate all the help! – Dan Jan 03 '19 at 20:03
16

When you write sometype *p = something;, it's equivalent to sometype *p; p = something;, not sometype *p; *p = something;. That means when you use a string literal like that, the compiler figures out where to put it and then puts its address there.

The statement

char *str = "Sometimes I feel like I'm going crazy.";

is equivalent to

char *str;
str = "Sometimes I feel like I'm going crazy.";
Richard Chambers
13

Simplifying, the string literal can be thought of as:

const char literal[] = "Sometimes I feel like I'm going crazy.";

so the expression

char *str = "Sometimes I feel like I'm going crazy.";

is logically equivalent to:

const char literal[] = "Sometimes I feel like I'm going crazy.";
const char *str = literal;

Of course, literals do not have names.

But you can't dereference a char pointer that does not point to memory allocated for an actual object.

/* Wrong */
char *c;
*c = 'a';
/* Wrong  - you assign the pointer with the integer value */ 
char *d = 'a';

/* Correct  */
char *d = malloc(1);
*d = 'a';

/* Correct */
char x;
char *e = &x;
*e = 'b';

The same applies to int:

/* Wrong - you assign the pointer with the integer value */
int *p = 666;

/* Wrong - you dereference a pointer that refers to unallocated space */
int *r;
*r = 666;

/* Correct */
int *s = malloc(sizeof(*s));
*s = 666;

/* Correct */
int t;
int *u = &t;
*u = 666;

And the last one, something similar to string literals: compound literals.

/* Correct */
int *z = (int[]){666,567,234};
z[2] = 0;
*z = 5;

/* Correct - a read-only compound literal needs a pointer to const */
const int *z = (const int[]){666,567,234};
chqrlie
0___________
  • Thank you @P_J_ this helps me also! I appreciate it! – Dan Jan 03 '19 at 18:33
  • This actually helped me the most. I wasn't seeing the logical equivalency. Is it standard practice to leave out the explicit declaration of `const char literal[] = "Sometimes I feel like I'm going crazy."` ??? Or would it be better to use something like that? To me it is completely explicit and less confusing. – Dan Jan 07 '19 at 15:24
  • I would expect c to be less contractual language. i.e `/* Correct */ char *d = malloc(1); *d = 'a';` ? implicit assignment of `*d = &'a'` – Siyon DP Feb 14 '19 at 09:28
    Saying that `char *str = literal` is "logically equivalent" is a mistake... the string literal (`"foo"`) is often placed in read-only memory. i.e. `char * s = "foo"` **can't** be edited. On the other hand, `char s[] = "foo"` initializes the string data **on the stack**, making it editable. Two very different things. – Myst Feb 17 '19 at 15:46
  • @Myst your comment is a mistake. Read again what I wrote and show me the place in that part where the `char []` data is. There is something before `char`. There was a missing const before the pointer. – 0___________ Feb 17 '19 at 17:22
  • @P__J__ - you're right, your wording, taken as is, is perfectly fine. My reaction was due to the fact that the **question** isn't about `char str[] = "foo"`... rather, it's about `char *str = "foo"` **and the memory**. These two are distinctly different, especially as they relate to memory allocation / placement. Your answer seems to ignore this difference. – Myst Feb 17 '19 at 22:21
5

Good job on coming up with that example. It does a good job of showing the difference between declaring a pointer (like char *text;) and assigning to a pointer (like text = "Hello, World!";).

When you write:

char *text = "Hello!";

it is essentially the same as saying:

char *text;        /* Note the '*' before text */
text = "Hello!";   /* Note that there's no '*' on this line */

(Just so you know, the first line can also be written as char* text;.)

So why is there no * on the second line? Because text is of type char*, and "Hello!" (an array of char) converts to a char* pointing at its first character when used like this. There is no disagreement here.

Also, the following three lines are identical, as far as the compiler is concerned:

char *text = "Hello!";
char* text = "Hello!";
char * text = "Hello!";

The placement of the space before or after the * makes no difference. The second line is arguably easier to read, as it drives the point home that text is a char*. (But be careful! This style can burn you if you declare more than one variable on a line!)

As for:

int *pt;
*pt = 606;   /* Unsafe! */

you might say that *pt is an int, and so is 606, but it's more accurate to say that pt (without a *) is a pointer to memory that should contain an int. Whereas *pt (with a *) refers to the int inside the memory that pt (without the *) is pointing to.

And since pt was never initialized, using *pt (either to assign to it or to read from it) is unsafe.

Now, the interesting part about the lines:

int *pt;
*pt = 606;   /* Unsafe! */

is that they'll compile (although possibly with a warning). That's because the compiler sees *pt as an int, and 606 as an int as well, so there's no disagreement. However, as written, the pointer pt doesn't point to any valid memory, so assigning to *pt will likely cause a crash, or corrupt data, or usher in the end of the world, etc.

It's important to realize that *pt is not a variable (even though it is often used like one). *pt just refers to the value in the memory whose address is contained in pt. Therefore, whether *pt is safe to use depends on whether pt contains a valid memory address. If pt isn't set to valid memory, then the use of *pt is unsafe.

So now you might be wondering: What's the point of declaring pt as an int* instead of just an int?

It depends on the case, but in many cases, there isn't any point.

When programming in C and C++, I use the advice: If you can get away with declaring a variable without making it a pointer, then you probably shouldn't declare it as a pointer.

Very often programmers use pointers when they don't need to. At the time, they aren't thinking of any other way. In my experience, when it's brought to their attention to not use a pointer, they will often say that it's impossible not to use a pointer. And when I prove them otherwise, they will usually backtrack and say that their code (which uses pointers) is more efficient than the code that doesn't use pointers.

(That's not true for all programmers, though. Some will recognize the appeal and simplicity of replacing a pointer with a non-pointer, and gladly change their code.)

I can't speak for all cases, of course, but C compilers these days are usually smart enough to compile both pointer code and non-pointer code to be practically identical in terms of efficiency. Not only that, but depending on the case, non-pointer code is often more efficient than code that uses pointers.

J-L
5

There are 4 concepts which you have mixed up in your example:

  1. declaring a pointer. int *p; or char *str; are declarations of the pointers
  2. initializing a pointer at declaration. char *str = "some string"; declares the pointer and initializes it.
  3. assigning a value to the pointer. str = "other string"; assigns a value to the pointer. Similarly p = (int*)606; would assign the value of 606 to the pointer. Though, in the first case the value is legal and points to the location of the string in static memory. In the second case you assign an arbitrary address to p. It might or might not be a legal address. So, p = &myint; or p = malloc(sizeof(int)); are better choices.
  4. assigning a value to what the pointer points to. *p = 606; assigns the value to the 'pointee'. Now it depends, if the value of the pointer 'p' is legal or not. If you did not initialize the pointer, it is illegal (unless you are lucky :-)).
Serge
3

Many good explanations over here. The OP has asked

Why is it that this is handled differently?

It is a fair question, he means why, not how.

Short answer

It is a design decision.

Long answer

When you use a literal in an assignment, the compiler has two options: either it places the literal in the generated assembly instruction (maybe allowing variable-length assembly instructions to accommodate different literal byte lengths) or it places the literal somewhere the CPU can reach it (memory, registers...). For ints, it seems a good choice to place them in the assembly instruction, but for strings... almost all strings used in programs (?) are too long to be placed in the assembly instruction. Given that arbitrarily long assembly instructions are bad for general-purpose CPUs, the C designers decided to optimize this use case for strings and save the programmer one step by allocating the memory for them. This way, the behaviour is consistent across machines.

Counterexample: just to see that this is not necessarily the case in other languages, check this. There (it is Python), int constants are actually placed in memory and given an id, always. So, if you try to get the address of two different variables that were assigned the same literal, it will return the same id (since they refer to the same literal, already placed in memory by the Python loader). It is useful to stress that in Python, the id is equivalent to an address in Python's abstract machine.

Community
Fusho
    All of these answers have helped me understand this so much more. I appreciate all of the help everyone has given me in understanding this. – Dan Feb 21 '19 at 12:16
2

Each byte of memory is stored in its own numbered pigeon-hole. That number is the "address" of that byte.

When your program compiles, it builds up a data-table of constants. At run-time these are copied into memory somewhere. So upon execution, in memory is the string (here at the 100,000th byte):

@100000 Sometimes I feel like I'm going crazy.\0

The compiler has generated code, such that when the variable str is created, it is automatically initialised with the address of where that string came to be stored. So in this example's case, str -> 100000. This is where the name pointer comes from, str does not actually contain that string-data, it holds the address of it (i.e. a number), "pointing" to it, saying "that piece of data at this address".

So if str was treated like an integer, it would contain the value 100000.

When you dereference a pointer, like *str = '\0', it's saying: The memory str points at, put this '\0' there.

So when the code defines a pointer, but without any initialisation, it could be pointing anywhere, perhaps even to memory the executable doesn't own (or owns, but can't write to).

For example:

int *pt;  // Uninitialised: what does 'pt' point at?

It does not hold a valid address. So if the code tries to dereference it, it's just pointing off anywhere in memory, and the behaviour is undefined.

But the case of:

int number = 605;
int *pt    = &number;

*pt = 606;

Is perfectly valid, because the compiler has generated some space for the storage of number, and now pt contains the address of that space.

So when we use the address-of operator & on a variable, it gives us the number in memory where the variable's content is stored. So if the variable number happened to be stored at byte 100040:

int number = 605;
printf( "Number is stored at %p\n", (void *)&number );

We would get the output:

Number is stored at 100040

Similarly with character arrays: an array is not itself a pointer, but in most expressions its name is converted to a pointer to its first element. That address is the memory-number of the first element.

// words, words_ptr1, words_ptr2 all end up being the same address
char words[] = "Sometimes I feel like I'm going crazy.";
char *words_ptr1 = &(words[0]);
char *words_ptr2 = words;
Kingsley
0

There are answers here with very good and detailed information. I will post another answer, perhaps aimed more directly at the OP's question. Rephrasing it a bit:

Why is

int *pt;
*pt = 606;

not ok (non working case), and

char *str = "Sometimes I feel like I'm going crazy.";

is ok (working case)?

Consider that:

  1. char *str = "Sometimes I feel like I'm going crazy.";
    

    is equivalent to

    char *str;
    str = "Sometimes I feel like I'm going crazy.";
    
  2. The closest "analogous", working case for int is (using a compound literal instead of a string literal)

    int *pt = (int[]){ 686, 687 };
    

    or

    int *pt;
    pt = (int[]){ 686, 687 };
    

So, the differences with your non-working case are three-fold:

  1. Use pt = ... instead of *pt = ...

  2. Use a compound literal, not a value (by the same token, str = 'a' wouldn't work).

  3. Compound literals are not always guaranteed to work, since the lifetime of their storage depends on the standard/implementation. In fact, their use as above may give the compilation error "taking address of temporary array", for instance when the code is compiled as C++ rather than C.

-1

A string variable can be declared either as an array of characters char txt[] or using a character pointer char* txt. The following illustrates the declaration and initialization of a string:

char* txt = "Hello";

[Figure: C string literal]

In fact, as illustrated above, txt is a pointer to the first character of the string literal.

Whether we are able to modify (read/write) a string variable or not, depends on how we declared it.

6.4.5 String literals (ISO)
6. It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

Actually, if we declare a string txt like we previously did, the compiler will typically place the string literal in a read-only data section such as .rodata (platform dependent), even if txt is not declared as const char*. So we cannot modify it. Actually, we should not even try to modify it. In this case gcc can emit warnings (-Wwrite-strings) or even fail the build when combined with -Werror. So it is better to declare such string variables as pointers to const:

const char* txt = "Hello";

On the other hand, we can declare a string variable as an array of characters:

char txt[] = "Hello";

In that case, the compiler will arrange for the array to get initialized from the string literal, so you can modify it.

Note: An array of characters can be used as if it was a pointer to its first character. That's why we can use txt[0] or *txt syntax to access the first character. And we can even explicitly convert an array of characters to a pointer:

char txt[] = "Hello";
char* ptxt = (char*) txt;
aminosbh
  • txt is not a pointer and your graph is 100% wrong. char txt[] is not a pointer!! – 0___________ Feb 20 '19 at 22:57
  • This is just a model to facilitate the understanding of strings declaration in C. This will not be the real output of the compiler. Could you please explain to me your point of view about my graph ? (I edit my answer to be more explicit) – aminosbh Feb 21 '19 at 10:14
  • The leading byte containing `0x11` in your diagram is simply not the way any C compiler represents a string, for multiple reasons (but one is that a pointer won't fit into a single byte; another is that `txt` is the pointer and it points to the `H` in `"Hello"`). – Jonathan Leffler Feb 21 '19 at 18:01
  • My schema is just a simplified (really simplified) representation to show how a pointer to character points to an array of characters, but for sure it is not the real output of the compiler: a pointer doesn't fit into a single byte, and string literals are not allocated in the data segment `.data` but in the read-only data segment `.rodata`, so they will not be next to the pointer's address in memory. – aminosbh Feb 21 '19 at 18:18