4

Consider this definition:

char *pmessage = "now is the time";

As I see it, pmessage will point to a contiguous area of memory containing these characters and a '\0' at the end. From this I conclude that I can use pointer arithmetic to access an individual character in the string, as long as I stay within the bounds of this area.

So why does K&R say that modifying an individual character is undefined?
Moreover, why do I get a "Segmentation Fault" when I run the following code?

*(pmessage + 1) = 'K';
  • So why is it possible to modify the string if we defined it this way: char amessage[] = "now is the time"; using this: amessage[1] = 'K'; ? What's the difference? –  Apr 15 '09 at 16:21
  • A SIGSEGV is one of the things that can happen when you do something that K&R says is undefined. – Ingo Apr 15 '09 at 16:33
  • @Leif: In this case, you're not modifying the string literal itself. You have defined an array of characters in the current scope, and asked the compiler to make sure that array is initialized to a certain string. You're free to modify your array, since it's not declared const. – unwind Apr 16 '09 at 09:03
  • Explanation for standard C/C++ http://docs.sun.com/source/819-3689/Ch3.Std.html#23706 – jfs Jun 21 '09 at 14:48

7 Answers

17

String literals in C are not modifiable. A string literal is a string that is defined in the source code of your program. Compilers will frequently store string literals in a read-only portion of the compiled binary, so your pmessage pointer really points into a region that you cannot modify. Strings in buffers that exist in modifiable memory can be modified using the syntax above.

Try something like this.

const char* pmessage = "now is the time";

// Create a new buffer that is on the stack and copy the literal into it.
char buffer[64];
strcpy(buffer, pmessage);

// We can now modify this buffer
buffer[1] = 'K';

If you just want a string that you can modify, you can avoid using a string literal with the following syntax.

char pmessage[] = "now is the time";

This method creates the string directly as an array (on the stack, when it is declared inside a function) that can be modified in place.
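As a compilable sketch of the copy-then-modify idea above: the helper name `copy_and_patch` and its error convention are made up for this illustration and are not part of the original answer. It assumes the caller passes the real size of the destination buffer, and it refuses to copy when the literal would not fit.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: copy src (which may be a string literal) into the
 * writable buffer dst, then overwrite the character at index with c.
 * Returns 0 on success, -1 if src does not fit or index is out of range. */
int copy_and_patch(char *dst, size_t dst_size,
                   const char *src, size_t index, char c)
{
    size_t len = strlen(src);

    if (len + 1 > dst_size || index >= len)
        return -1;

    memcpy(dst, src, len + 1);  /* len + 1 also copies the terminating '\0' */
    dst[index] = c;             /* fine: dst is ordinary writable memory */
    return 0;
}
```

With `char buffer[64];`, calling `copy_and_patch(buffer, sizeof buffer, "now is the time", 1, 'K')` leaves "nKw is the time" in buffer. The literal itself is only ever read.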

bradtgmurray
  • Shouldn't that be "char * buffer = malloc(512);"? – David Thornley Apr 15 '09 at 16:13
  • Yup, I originally had char buffer[] = new char[512]; and converted it incorrectly. Thanks. – bradtgmurray Apr 15 '09 at 16:17
  • @David: yes, that should be char *buffer, or many other things than an incomplete type as shown. Better would be 'char buffer[] = "now is the time";' and no malloc() so no leak - and no string copy. – Jonathan Leffler Apr 15 '09 at 16:19
  • Technically, with the array approach, wouldn't you be copying the literal onto the stack at some point, hence a string copy? I'll modify the example to get rid of the dynamic memory use. – bradtgmurray Apr 15 '09 at 16:31
  • The compiler would initialize the variable - yes, so there would be a string copy somewhere. But it would not overflow bounds, etc. It's now a good answer - well done. – Jonathan Leffler Apr 15 '09 at 16:40
  • I like this so much ... Q: Why can't I do X A: You can do Y, where Y has nothing at all to do with X. Though, the first paragraph of your answer was ok. – Ingo Apr 15 '09 at 16:42
  • I agree with Ingo. I was actually considering downvoting it (not because the second part is so bad per se, but because 12 is too high a rank for it), but the first part has some goodness to it which wins, so I will just leave it as is. – hlovdal Apr 15 '09 at 20:49
9

The string is a constant and cannot be modified. If you want to modify it, you can do:

char pmessage[] = "now is the time";

This initializes an array of characters (including the \0) instead of creating a pointer to a string constant.
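A minimal sketch of what the array form gives you, assuming a hypothetical function name `edit_array_copy` and a caller-supplied output buffer of at least 16 bytes (neither is from the answer). The array holds the 15 characters plus the terminating '\0', and writing to it is allowed because it is the function's own storage.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical demo: builds a modifiable array from the literal, edits it,
 * copies the result into out (which must hold at least 16 bytes), and
 * returns the size of the array in bytes. */
size_t edit_array_copy(char *out)
{
    char amessage[] = "now is the time";  /* 15 characters + '\0' */

    amessage[1] = 'K';                    /* legal: amessage is our own array */
    memcpy(out, amessage, sizeof amessage);
    return sizeof amessage;
}
```

Here `sizeof amessage` is the size of the whole array, which matches the `char pmessage[16]` equivalence mentioned in the comments below.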

Dana
  • This will have the same problem as the original example in the question. – Scott Langham Apr 15 '09 at 16:11
  • Without actually looking it up, I don't think so. It will be the equivalent of having a char pmessage[16] initialized with "now is the time" (the individual chars followed by the \0). I think it works. – David Thornley Apr 15 '09 at 16:15
  • @David That's how I understood it to work, too. And I don't have access to a C compiler to try it :P I'll let my ignorance stand for now, if ignorance it is :P – Dana Apr 15 '09 at 16:17
  • char pmessage[] is not same as the problem mentioned in question. pmessage here is read-write buffer. The answer is absolutely fine. – aJ. Apr 15 '09 at 16:22
1

You can use pointer arithmetic to read from a string literal, but not to write to it. The C Standard forbids modifying string literals.
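To illustrate the read-only half of that statement, here is a small sketch that walks a string with pointer arithmetic and never writes through the pointer. The function name `count_char` is hypothetical; taking a `const char *` parameter means it can safely be handed a string literal.

```c
#include <stddef.h>

/* Counts how often target occurs in s, using only reads through a pointer. */
size_t count_char(const char *s, char target)
{
    size_t n = 0;

    for (const char *p = s; *p != '\0'; ++p)  /* advance one char at a time */
        if (*p == target)
            ++n;
    return n;
}
```

For the string in the question, `count_char("now is the time", ' ')` returns 3, and `*(pmessage + 1)` reads as 'o' without any problem.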

  • This may be so, but it is not the compiler complaining, it's the memory protection complaining. I doubt that he would have a problem running this on some embedded non protected mode hardware. – AndreasT Apr 15 '09 at 16:12
  • So what? It would still be undefined behaviour as far as the C Standard is concerned. –  Apr 15 '09 at 16:20
  • Yes, but on the systems I learned C on (CP/M, Mac OS 6 or so) it would just work (for some values of work), and certainly wouldn't have a memory protection issue. – David Thornley Apr 15 '09 at 16:27
  • "just work" is often how undefined behaviour manifests itself :-) –  Apr 15 '09 at 16:40
  • Some compilers let you store said strings in R/W storage, although I'd never use that feature. IBM XLC lets you do this: http://publib.boulder.ibm.com/infocenter/comphelp/v8v101/index.jsp?topic=/com.ibm.xlcpp8a.doc/compiler/ref/rnpgstrg.htm – Anthony Giorgio Apr 15 '09 at 19:33
1

The string literal is typically stored in read-only memory, so you shouldn't modify it.

sfossen
1

The literal that pmessage points to goes into the program image, and in most cases such literals are placed in code memory, which is read-only.

Alphaneo
1

If you define a literal of the form:

char* message = "hello world";

the compiler will treat the characters as constant and may well put them in read-only memory.

So, it is advisable to use the const keyword so that any attempt to change the literal will prevent the program from compiling:

const char* message = "hello world";

I'm guessing the reason const on a literal is not enforced as part of the language is just for backwards compatibility with pre-standard versions of C where the const keyword didn't exist. Anybody know any better?
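A short sketch of the const approach, with a hypothetical helper `message_length` added only to show that reads still work. The write is kept inside `#if 0` so the file compiles; enabling it should make a conforming compiler diagnose the assignment, and compilers typically reject it as an error.

```c
#include <stddef.h>

const char *message = "hello world";

/* Reading through a pointer to const is fine. */
size_t message_length(void)
{
    size_t n = 0;

    while (message[n] != '\0')
        ++n;
    return n;
}

#if 0
/* Enable this block to see the diagnostic: message points to const char,
 * so the assignment below is not allowed. */
void broken(void)
{
    message[1] = 'K';
}
#endif
```

The benefit is that the mistake is caught at compile time instead of showing up as a crash at run time.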

Scott Langham
0

When you write: char *pmessage = "now is the time";

The compiler treats it as if you wrote:

 const char internalstring[] = "now is the time";
 char *pmessage = internalstring;

The reason you cannot modify the string is that if you were to write:

 char *pmessage1 = "now is the time";
 char *pmessage2 = "now is the time";

The compiler will treat it as if you wrote:

 const char internalstring[] = "now is the time";
 char *pmessage1 = internalstring;
 char *pmessage2 = internalstring;

So, if you were to change one, you'd change both.
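Whether two identical literals really share storage can be checked by comparing the pointers. This is only a sketch with a hypothetical function name, and the result is not portable: the C standard leaves it unspecified whether identical string literals are distinct arrays, so either answer is valid for a given compiler and set of options.

```c
/* Returns 1 if this compiler merged the two identical literals into one
 * array, 0 if it kept them separate. Both outcomes are permitted. */
int literals_share_storage(void)
{
    const char *pmessage1 = "now is the time";
    const char *pmessage2 = "now is the time";

    return pmessage1 == pmessage2;
}
```

If the function returns 1 on your toolchain, a successful write through one pointer would also be visible through the other, which is one reason the write is left undefined.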

James Curran
  • The compiler need not do that, but it may. It doesn't have to put internalstring into read-only memory either. It doesn't matter to the Standard, because once you try to modify that you're into undefined behavior, and anything the compiler does is OK. – David Thornley Apr 15 '09 at 16:29
  • In addition, I think your example is not "const" clean. I am not sure it would even compile. Surely, the pointer should point to const char, otherwise the whole const-thing makes no sense. – Ingo Apr 15 '09 at 16:36
  • @Ingo - true, that would not, if written just like that. However, it is nevertheless what is happening internally. A literal string is a const char[], but with an implied conversion to (non-const) char*. The implied conversion is only for literal strings. – James Curran Apr 15 '09 at 19:43
  • @David: Sorry, I didn't mean to imply that the compiler HAD to do that, only that it COULD do that, to explain why modifying the array was declared undefined behavior. – James Curran Apr 15 '09 at 19:48