In C, char *p; declares a pointer to a char. The compiler reserves space for the pointer itself, but it allocates nothing for the pointer to point at, and without an initialiser the pointer's value is indeterminate.
The code below just declares a pointer; in that respect it is no different from int *i;. You are telling your compiler that p holds the address of a memory location where a char is stored.
char *p; /* pointer to a char, that's it */
Whereas in the code snippet below, the pointer is initialised with the address of a string literal. Similar to the example above, p is still a pointer to a char, but the memory locations following p (i.e. p+1, p+2, ...) now hold chars as well. Your compiler sets this storage up for you, but you must treat it as read-only: modifying a string literal is undefined behaviour, and compilers typically place it in read-only memory.
char *p = "Hello"; /* pointer to a char but the memory address */
/* it is pointing to has been initialised */
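To make the read-only point concrete, here is a minimal sketch contrasting a pointer to a string literal with an array initialised from the same literal. The function name jello and its out parameter are made up for illustration; the array form gives you your own writable copy, while the pointer form only refers to the literal.

```c
#include <string.h>

/* Hypothetical demo helper: writes a modified copy of "Hello" into out
   (which must hold at least 6 bytes) and returns the literal's first char. */
char jello(char out[6])
{
    const char *p = "Hello";  /* points at the literal: read it, never write it */
    char a[] = "Hello";       /* array of 6 chars: a writable copy of the literal */

    a[0] = 'J';               /* fine: a is our own storage */
    /* p[0] = 'J'; is rejected by the compiler because of const; without
       const it would compile, but the write is undefined behaviour */

    memcpy(out, a, sizeof a); /* copies "Jello" including the '\0' */
    return p[0];              /* still 'H': the literal was never touched */
}
```

Declaring such pointers as const char * is a cheap way to let the compiler catch accidental writes.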
Looking at the code snippet you have in the OP:
#include <stdio.h>
int main()
{
char *p;
p[0]='a';
p[1]='b';
p[2]='c';
p[3]='d';
p[4]='\0';
}
Although intuition may suggest otherwise, this
char *p;
p[0]='a'; /* undefined behaviour: p does not point to valid memory */
is not the same as
char *p, c='a';
p=&c;
The former requires p to already point to valid (allocated) memory. Initialising a character pointer with a string literal when it is declared is one way to give it a valid target (for reading only), i.e.
char *p="a"; /* valid operation */
If a pointer to a char is not initialised when it is declared, you must assign it a valid address before you dereference it: the address of a char variable or array, a string literal, or memory returned by malloc(). Until then, writing through it is undefined behaviour.
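For reference, a pointer declared without an initialiser can be given a valid target later in several ways. The sketch below is illustrative only (the function name assign_later is hypothetical): it assigns the address of a char, then a string literal, then heap memory from malloc(), and writes only through the targets that are writable.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical demo: returns the character last written through p,
   or '\0' if malloc() fails. */
char assign_later(void)
{
    char c = 'a';
    char *p;              /* declared only: no valid target yet */
    const char *s;        /* likewise */
    char result;

    p = &c;               /* 1. address of an existing char */
    p[0] = 'b';           /* writes to c */
    assert(c == 'b');

    s = "Hello";          /* 2. string literal, assigned after the declaration */
    assert(s[0] == 'H');  /* reading is fine; writing would be undefined */

    p = malloc(2);        /* 3. heap memory: room for one char plus '\0' */
    if (p == NULL)
        return '\0';
    p[0] = 'z';
    p[1] = '\0';
    result = p[0];
    free(p);              /* give the memory back when done */
    return result;
}
```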
Let's look at how your compiler translates these operations and conclude from there. Take this toy program:
int main() {
char c, *p, *str = "Hello";
c = 'a';
p = &c;
p[0] = 'a';
}
Without any optimizations, clang emits the code below
main: # @main
push rbp ; function prologue
mov rbp, rsp ; function prologue
xor eax, eax
movabs rcx, offset .L.str ; *str = "Hello"
mov qword ptr [rbp - 24], rcx
mov byte ptr [rbp - 1], 97 ; c = 'a'
lea rcx, [rbp - 1] ; p = &c
mov qword ptr [rbp - 16], rcx ; p = &c
mov rcx, qword ptr [rbp - 16] ; p[0]='a'
mov byte ptr [rcx], 97 ; p[0]='a'
pop rbp
ret
.L.str:
.asciz "Hello"
The LEA instruction loads the effective address of its second operand into its first operand. In the code above, the memory location rbp-1 is reserved for the variable c, hence it holds the value 97, which is the ASCII code for a. The location rbp-16, on the other hand, holds the pointer p.
Now take the code below
#include <stdlib.h> /* declares malloc() */
int main() {
char c, *p, *str = "Hello";
c = 'a';
p = &c;
p=malloc(1); /* added a malloc() here */
p[0] = 'a';
}
and its assembly code is below
main: # @main
push rbp ; function prologue
mov rbp, rsp ; function prologue
sub rsp, 32 ; function prologue
movabs rax, offset .L.str ; *str = "Hello"
mov qword ptr [rbp - 24], rax
mov byte ptr [rbp - 1], 97 ; c = 'a'
lea rax, [rbp - 1] ; p = &c
mov qword ptr [rbp - 16], rax ; p = &c
mov edi, 1 ; malloc(1)
call malloc ; malloc(1)
xor ecx, ecx
mov qword ptr [rbp - 16], rax ; malloc(1)
mov rax, qword ptr [rbp - 16] ; p[0]='a'
mov byte ptr [rax], 97 ; p[0]='a'
mov eax, ecx
add rsp, 32
pop rbp
ret
.L.str:
.asciz "Hello"
When p[0]='a' is performed, the two code snippets execute the same instructions, differing only in the register used to hold the address p points to. However, the malloc() call preceding the p[0]='a' line makes a considerable difference: the memory location that p points to has now been allocated on the heap (malloc() obtains it from the operating system on your behalf), and assignments to that memory address are valid operations.
Without a malloc() call (or some other valid target for p), you are writing through an uninitialised pointer, into memory you have never requested from your operating system. Although the instructions look the same, the way your operating system sees them is different.
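Putting this together, one way to repair the program from the OP is to request the memory before writing to it. The sketch below wraps the five assignments in a hypothetical helper, make_abcd, which allocates 5 bytes (four characters plus the terminating '\0') and leaves it to the caller to free them; a sufficiently large local array such as char p[5]; would work just as well.

```c
#include <stdlib.h>

/* Hypothetical helper: builds the string "abcd" in heap memory.
   Returns NULL if the allocation fails; the caller must free() the result. */
char *make_abcd(void)
{
    char *p = malloc(5);  /* 4 characters + terminating '\0' */

    if (p == NULL)
        return NULL;      /* allocation failed, nothing to write to */

    p[0] = 'a';           /* valid now: p points to memory we own */
    p[1] = 'b';
    p[2] = 'c';
    p[3] = 'd';
    p[4] = '\0';
    return p;
}
```

A caller would use it as char *s = make_abcd(); check s against NULL, use the string, and then call free(s);.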