3

I was reading a book on C today, and it mentioned that the following was true. I was curious as to why, so I wrote this program to verify it, and I am posting it here so that someone smarter than me can teach me why these two cases differ at runtime.

Specifically, the question is about how a (char *) is handled at runtime depending on whether it points to a string created as a literal or to one created with malloc and populated manually.

Why is the memory used by the literal more protected like this? Also, does the answer explain the meaning of "bus error"?

Here is a program I wrote that asks the user whether they would like to crash. It illustrates that the program compiles fine, and it highlights that, in my head, the code in both branches is conceptually identical. That's why I'm here: to understand why they are not.

// Demonstrate the difference between initializing a (char *)
// with a string literal vs. malloc,
// and the mutability of the contents thereafter.
#include <stdio.h>
#include <stdlib.h>

int main(void) {
    char cause_crash;
    char *myString;

    printf("Cause crash? ");
    if (scanf("%c", &cause_crash) != 1) {
        return 1; // no input available
    }

    if (cause_crash == 'y') {
        myString = "ab";
        printf("%s\n", myString); // ab
        *myString = 'x'; // CRASH! (writing to a string literal is undefined behavior)
        printf("%s\n", myString);
    } else {
        myString = malloc(3 * sizeof(char));
        if (myString == NULL) {
            return 1; // allocation failed
        }
        myString[0] = 'a';
        myString[1] = 'b';
        myString[2] = '\0';
        printf("%s\n", myString); // ab
        *myString = 'x';
        printf("%s\n", myString); // xb
        free(myString);
    }
    return 0;
}

edit: conclusions

There are several good answers below, but I want to summarize what I have come to understand succinctly here.

The basic answer seems to be this:

When a compiler sees a "string literal" being assigned to a (char *) variable, the pointer will point to memory which is static (perhaps actually part of the binary), and which is usually enforced as read-only by a lower-level system than your runtime. In other words, the memory is probably not dynamically allocated at that part of the program; instead, the pointer is simply set to point to an area of static memory which houses the contents of your literal.

There are a few things I want to call out about this resolution:

1. Optimization may be a possible motive: With my compiler, two different (char *) variables initialized with the same string literal actually point to the same address:

char *myString = "hello";
char *mySecond = "hello"; // the pointers are identical! This is a cool optimization.

2. Interestingly, if the variable is actually an array of chars (instead of a (char *)), this (#1) is not true. This was interesting to me because I was under the impression that (post-compilation) arrays were identical to pointers-to-chars.

char myArString[] = "hello";
char myArSecond[] = "hello"; // the two arrays are NOT at the same address

3. To summarize what several answers hinted at: char *myString = "Hello, World!" does not allocate new memory; it just sets myString to point to memory which already existed, perhaps in the binary, perhaps in a special read-only block of memory, etc.

4. I found through testing that char myString[] = "Hello, World!" does allocate new memory; I think... what I know is that the string is mutable when created this way.

Chris Trahey

5 Answers

2

You really should have declared myString as a const char *. String literals are stored in read-only memory and cannot be modified. Use a char[] if you need to modify the contents.

Ed S.
  • Thanks for your comment; however, I completely understand the difference. I am here to learn *why* literals are in protected memory like this, especially when they are NOT given to lvars declared with `const`. – Chris Trahey Jul 07 '12 at 23:22
  • @ctrahey: Because certain optimizations may be performed when the data is stored in the binary and cannot be modified. For example, *all* pointers which refer to `"some_read_only_string"` can refer to the same address. – Ed S. Jul 07 '12 at 23:26
  • @ctrahey: That said, it is specified in the standard, so it doesn't really matter. No optimization has to be performed; it is simply specified that string literals are stored in static memory and any attempt to modify one results in undefined behavior. – Ed S. Jul 07 '12 at 23:30
  • The C standard says they may be assigned to read-only memory and may not be modified without invoking undefined behaviour (§6.4.5 ¶7 in the C2011 standard). – Jonathan Leffler Jul 07 '12 at 23:32
  • +1 for mention of a possible motive for immutability (sharing that constant). The comments here are really what I was after in my answer. Would you edit your answer to include them for future discoverers of this thread? – Chris Trahey Jul 07 '12 at 23:38
  • You demonstrated a great knowledge of the language and what is going on, but as an answer this is too terse to "accept". Thanks for your input. – Chris Trahey Jul 08 '12 at 01:07
2

What

myString = "ab";

does is assign the address of the constant string literal which lives in readonly memory to the char pointer myString.

If you write to this memory now, you get a crash.

OTOH, you can, of course, happily write on malloc()ed memory, so that works.

glglgl
1

C standards specify that literal strings are static and that attempts to modify them result in undefined behavior. In other words they should be considered read-only.

The memory that you've allocated with malloc belongs to you and you can modify it in any way you like.

The actual differences can be implementation-dependent, but typically the two kinds of string live in different types/areas of memory:

  • the heap in the case of data obtained using malloc, and
  • the (read-only) data section in the case of string literals.
pb2q
1

When you set a pointer variable to a string literal, you are setting it to the address of a value stored in the read only data section of the assembly program. These data items are constant, and attempts to modify them will most likely crash.

When you use malloc to get the memory, you are getting a pointer to read/write heap memory that you can do anything to.

This happens for a couple of reasons. For one thing, the actual type of "Hello, world" is char[13], or constant pointer to 13 characters. You cannot assign a value to a constant character. But what you are doing is casting away the constness. That means that the compiler won't prevent you from changing the memory, but the C standard calls it undefined behavior. Undefined behavior can be anything, but it is usually a crash.

If you want to assign a literal value to char* memory, do this:

char* data = malloc (42);
memcpy(data, "Hi!", 4);
Linuxios
  • In C, the type of `"Hello, world"` is `char[13]`, no `const`. Historical accident, probably, there was no `const` initially. – Daniel Fischer Jul 07 '12 at 23:27
  • I suspect your mention of *the read only data section of the assembly program* is what I was looking for. It makes sense that if the compiler sees a literal, it can include that in the actual program data, and not need to dynamically allocate its memory at runtime. – Chris Trahey Jul 07 '12 at 23:28
  • @DanielFischer: But arrays are constant pointers (the *pointer* is constant, not the data; that would be `const char[13]`). – Linuxios Jul 07 '12 at 23:29
  • The standard says nothing of a 'read only data segment' (though this is common), only that attempting to modify a literal is UB. Not all platforms even support read-only segments. – Ed S. Jul 07 '12 at 23:30
  • Huzzah, this conversation is wonderful, and just what I wanted. Thanks, y'all. – Chris Trahey Jul 07 '12 at 23:31
  • @ctrahey: Glad that this answer and conversation could be of help. – Linuxios Jul 07 '12 at 23:41
  • @Linuxios Arrays are not pointers. They are converted to (non-const) pointers to the first element in most contexts, but they are distinct types. Arrays are not assignable, in that respect they are similar to const pointers, but e.g. `sizeof` tells the difference. – Daniel Fischer Jul 07 '12 at 23:41
  • @DanielFischer; In respect to the C(++) type system, I have no doubt. But I'm talking about the internal representation, where `T arr[n]` is really a constant pointer variable of type `T`. – Linuxios Jul 07 '12 at 23:48
  • @Linuxios It is not a pointer variable. The data in the array live directly in memory, at the address mentioned by the said pointer, with the appropriate size. But the pointer itself is just a value, and in particular not a variable. – glglgl Jul 08 '12 at 06:13
  • @glglgl: I know. I'm trying to simplify here. – Linuxios Jul 08 '12 at 13:11
0

What if you wrote this:

&mystring = &"ab";

What would that mean to you?

Would you think that you could then modify "ab" somehow? Where is &"ab"?

ANS: &"ab" is in read-only memory. When the compiler sees that QUOTE, it puts that string in immutable memory. Why? Probably faster somehow if the runtime doesn't have to bounds check and check for segfaults, etc. on string data that really should never change.

glglgl
Andyz Smith
  • I think I get what you're saying conceptually, however assigning to &mystring won't compile (it's not an lvalue). – Chris Trahey Jul 08 '12 at 00:31
  • Yes, the statement is used to highlight the somewhat nonsensical &"ab". It's so easy to think of mystring = "ab" as allocating some memory when you are used to a garbage-collected language. – Andyz Smith Jul 08 '12 at 00:46