Why does using an uninitialised pointer variable as a string cause an error?

Question

#include <stdio.h>

int main()
{     
     char *p;         
     p[0]='a';
     p[1]='b';
     p[2]='c';
     p[3]='d';
     p[4]='\0'; 
}

How is the program above different from the program below?

#include <stdio.h>

int main()
{
      char *p = "abcd";
}

You dereference a pointer to nowhere and you are confused what the problem is?? Always be able to identify the *valid storage* that a pointer points to (i.e. that the pointer variable holds the address to as its value) *before* you attempt to dereference the pointer. `[..]` operates as a dereference just as `'*'` does... — David C. Rankin, May 12 '20 at 20:41
first program is UB because memory for `p` is not allocated, second program allocates it — Iłya Bursov, May 12 '20 at 20:41
Q: How is program above different from program below? A: The first case allocates memory *ONLY* for the pointer (e.g. 4 bytes for a 32-bit CPU, 8 bytes for a 64-bit CPU). It doesn't allocate *ANY* memory for any of the elements, nor does it initialize the pointer to any meaningful address. — FoggyDay, May 12 '20 at 20:43
The second program points to the base address, which is 'a'. But why doesn't the first program? I have explicitly written p[0] = 'a', which looks the same as the second program. If I am wrong, correct me: why does the second program work the way it does, and why not the first? — Pranav Habib, May 12 '20 at 20:50
A few links that provide basic discussions of pointers may help. [Difference between char *pp and (char*) p?](https://stackoverflow.com/a/60519053/3422102) and [Pointer to pointer of structs indexing out of bounds(?)...](https://stackoverflow.com/a/60639540/3422102) — David C. Rankin, May 12 '20 at 20:51

score 4 · Answer 1 · answered May 12 '20 at 20:48

In C, there's a distinction between the pointer itself and what's being pointed at, and the code samples you've shown above hit at that difference.

Let's begin with the second code sample:

char* p = "abcd";

Here, this does the following. First, it places, in memory, an array of five characters containing the string "abcd". That looks something like this:

+---+---+---+---+---+
| a | b | c | d | \0|
+---+---+---+---+---+

Next, it creates the pointer p, and tells p to point at the first character in that array:

+-----+
|     | p
+-----+
   |
   v
+---+---+---+---+---+
| a | b | c | d | \0|
+---+---+---+---+---+

At this point, all is right and well in the universe.

Now, contrast that with the other piece of code you wrote. The first line, char* p, creates a pointer p, but doesn't tell it where to point. As a result, it's pointing somewhere random, and likely to memory you don't own:

  +-----+
p |     | -----> ???
  +-----+

Now, if you write p[0] = 'a', you're saying "please go where p is pointing and write an 'a' there." Unfortunately, p isn't pointing to anything you own, and so this is the equivalent of saying something like "I know I haven't actually aimed this firearm in any sensible direction, but let's go pull the trigger and see what happens anyway." You're liable to hurt someone, possibly yourself!

To fix this, you'll need to both create the pointer p and tell it where you want it to point. You could use malloc to get yourself a block of memory, or strdup to get a copy of a string to point at, or use a string literal if you'd like.

Going forward, drawing pictures like these can be really helpful in understanding what the code is doing.

score 2 · Answer 2 · answered May 12 '20 at 20:47

The second program is correct, but the first is not. The reason is that a pointer must be made to point at some memory before you use it. For example:

char *p; // p does not point to any valid memory yet

So when you write p[0]='X', the write goes to whatever address p happens to hold. That is undefined behaviour, and it may well crash with a segmentation fault (core dump). In this case you should do something like this:

char *p = malloc(255 * sizeof(char)); // allocate memory
memset(p, 0, 255);                    // initialise it with zeros

Then your first code will work. (Do not forget to include stdlib.h for malloc and string.h for memset.)

Hope this helps you.

I forgot to explain the second code snippet. char *p = "abcd"; // p points to the string "abcd", which the compiler has already placed in memory. No allocation is needed, so it works — Bojan, May 12 '20 at 20:48

score 1 · Answer 3 · answered May 12 '20 at 21:02

The difference is that in the first example you are trying to use an uninitialised pointer (pointing nowhere) to change values at offsets from wherever it happens to point, which is why your program crashes.

In the second example, your string "abcd" is stored as a literal in a data segment of your program at compile time, typically one marked read-only (constant). Of course this literal has an address, and your pointer p receives that address at run time.

fnisi · Accepted Answer · 2020-05-14T21:33:19.617

In C, char *p; is a declaration of a pointer to a char, and the compiler does not allocate any memory for p to point to when you do not initialise the variable.

The code below just declares a pointer; it is no different from int *i;. You are telling your compiler that p holds the address of a memory location where a char is stored.

char *p; /* pointer to a char, that's it */

In the code snippet below, by contrast, the pointer is initialised with the address of a string literal. As in the example above, p is still a pointer to a char, but the memory at p and the locations following it (i.e. p+1, p+2, ...) now hold the characters of the literal. The compiler sets this storage up for you, but it must be treated as read-only.

char *p = "Hello"; /* pointer to a char but the memory address */
                   /* it is pointing to has been initialised   */

Looking at the code snippet you have in the OP

#include <stdio.h>

int main()
{     
     char *p;         
     p[0]='a';
     p[1]='b';
     p[2]='c';
     p[3]='d';
     p[4]='\0'; 
}

Although intuition may suggest otherwise, this

     char *p;         
     p[0]='a'; /* error here */

is not the same as

     char *p, c='a';         
     p=&c;

The former writes through p, which requires p to already point at initialised (allocated) memory. The latter only changes where p points. A string literal also gives the pointer something valid to point at, i.e.

     char *p="a";  /* valid operation */

If a pointer to a char is not initialised when it is declared, you must give it a valid address afterwards, before you dereference it: the address of a char variable, a string literal, or memory obtained from malloc(). Writing p[0]='a' does not do that; it tries to store 'a' at whatever address p happens to hold.

Let's look at how your compiler would interpret two operations and conclude from there. Take this toy program

int main() {
    char c, *p, *str = "Hello";
    c = 'a';
    p = &c;
    p[0] = 'a';
}

Without any optimizations, clang emits the code below

main:                                   # @main
        push    rbp                        ; function prologue
        mov     rbp, rsp                   ; function prologue
        xor     eax, eax
        movabs  rcx, offset .L.str         ; *str = "Hello"
        mov     qword ptr [rbp - 24], rcx
        mov     byte ptr [rbp - 1], 97     ; c = 'a'
        lea     rcx, [rbp - 1]             ; p = &c
        mov     qword ptr [rbp - 16], rcx  ; p = &c
        mov     rcx, qword ptr [rbp - 16]  ; p[0]='a'
        mov     byte ptr [rcx], 97         ; p[0]='a'
        pop     rbp
        ret
.L.str:
        .asciz  "Hello"

The LEA instruction loads the effective address of the second operand into the first operand. In the code above, the memory location rbp-1 is reserved for the variable c, hence it holds the value 97, which is the ASCII code for a. The location rbp-16, in turn, holds the pointer p.

Take this code below

#include <stdlib.h>   /* declares malloc() */

int main() {
    char c, *p, *str = "Hello";
    c = 'a';
    p = &c;
    p=malloc(1); /* added a malloc() here */
    p[0] = 'a';
}

and its assembly code is below

main:                                   # @main
        push    rbp                        ; function prologue
        mov     rbp, rsp                   ; function prologue
        sub     rsp, 32                    ; function prologue
        movabs  rax, offset .L.str         ; *str = "Hello"
        mov     qword ptr [rbp - 24], rax
        mov     byte ptr [rbp - 1], 97     ; c = 'a'
        lea     rax, [rbp - 1]             ; p = &c
        mov     qword ptr [rbp - 16], rax  ; p = &c
        mov     edi, 1                     ; malloc(1)
        call    malloc                     ; malloc(1)
        xor     ecx, ecx
        mov     qword ptr [rbp - 16], rax  ; malloc(1)
        mov     rax, qword ptr [rbp - 16]  ; p[0]='a'
        mov     byte ptr [rax], 97         ; p[0]='a'
        mov     eax, ecx
        add     rsp, 32
        pop     rbp
        ret
.L.str:
        .asciz  "Hello"

When p[0]='a' is performed, the two code snippets execute the same instructions, apart from using different registers to hold the address p points to. However, the malloc() call preceding the p[0]='a' line makes a considerable difference: the memory location that p points to has now been allocated for your program (malloc() obtains it from the operating system on your behalf), so assignments to that memory address are valid operations.

Without a malloc() call (or some other assignment that makes p point at valid storage), you are just trying to write to memory that you have not requested from your operating system. Although the instructions look the same, the way your operating system sees them is different.

I get that char *p is not initialised to any variable, and that is the reason why it causes the error. But later I am explicitly initialising char *p to 'a' and so on. My point is: even if we later on initialise char *p to 'a', why does it give an error? — Pranav Habib, May 14 '20 at 14:04
See my edit; TL;DR - writing p[0]='a' does not initialise the pointer; p must first be given a valid address (a string literal, the address of a char variable, or memory from malloc()) before you dereference it. — fnisi, May 14 '20 at 21:34
