Let's look at your examples and make sure you understand why each one behaves the way it does. But first, a quick review of pointers to make sure we are on the same page:
A Pointer & Pointer Arithmetic
A pointer is simply a normal variable that holds the address of something else as its value. In other words, a pointer points to the address where something else can be found. Where you normally think of a variable holding an immediate value, such as int a = 5;, a pointer simply holds the address where 5 is stored in memory, e.g. int *b = &a;. It works the same way regardless of what type of object the pointer points to. It is able to work that way because the type of the pointer controls the pointer arithmetic, e.g. with a char * pointer, pointer+1 points to the next byte, while for an int * pointer (with a typical 4-byte int), pointer+1 points to an offset 4 bytes after pointer. (So a pointer is just a pointer, where the arithmetic is automatically scaled by the type.)
What am I doing in Example A?
Your initialization is key to why Example A works and why Example B crashes. Example A uses a compound literal to initialize s1 so s1 points to the first character 'a' in "abcd" in modifiable memory. The compound literal was introduced in C99, but gcc provides it as an extension in C89 mode as well. In Example A you use:
s1 = (char[]){'a','b','c','d','\0'};
which is equivalent to
s1 = (char[]){ "abcd" };
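The compound literal has the form (type){ ..initializer.. }, the key part being the (type). It looks like a cast, but rather than converting a value it creates an unnamed object of that type, initialized from what is in the braces. In your Example A the result is an unnamed char[] (a character array) holding "abcd", which you can freely modify. Inside a function that array has automatic storage duration, so it remains valid only until the enclosing block ends.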
Why does Example B Crash?
On the other hand:
s1 = "abcd";
makes s1 point to a string literal. Modifying a string literal is Undefined Behavior in C, and most implementations place string literals in read-only memory (generally in the .rodata section of the executable). See: Why are C string literals read-only? for a historical view. You cannot modify values in read-only memory and attempting to do so generally results in a SEGFAULT (as you have probably found).
You were right in your comment on Example D!
char s1[4];
Creates a character array with space for 4 characters (ASCII). When you call strcpy (s1, "abcd"); you are attempting to copy 1 more character than will fit:
'a','b','c','d','\0'
 1   2   3   4    5
This results in Undefined Behavior and can result in an exploitable buffer overflow. From man 3 strcpy:
If the destination string of a strcpy() is not large enough, then anything
might happen. Overflowing fixed-length string buffers is a favorite cracker
technique for taking complete control of the machine. Any time a program
reads or copies data into a buffer, the program first needs to check that
there's enough space. This may be unnecessary if you can show that overflow
is impossible, but be careful: programs can get changed over time, in ways
that may make the impossible possible.
So just as you allocated (4+1) chars/bytes in Example C, you need at least (4+1) chars/bytes of storage in s1 in Example D.
Remember, every C-library str... function requires a nul-terminated string. When you create a character array, it is your responsibility to ensure that it is nul-terminated to make it a string in C. If it's not nul-terminated, then it is simply an array of characters, and any time you fail to pass a nul-terminated string to a function expecting one, the function will not know when to stop reading and will happily stray off reading out-of-bounds until it happens to stumble upon a zero byte or SEGFAULTs, whichever occurs first.
Look things over and digest them and let me know if you have further questions. (Also add a '\n' to your printf format string (e.g. "%s\n") so that a newline is output, at the very least on your last call, to make your program's output POSIX compliant.)