5

Assume the following two pieces of code:

char *c = "hello world";
c[1] = 'y';

The one above doesn't work.

char c[] = "hello world";
c[1] = 'y';

This one does.

With regard to the first one, I understand that the string "hello world" might be stored in a read-only memory section and hence can't be changed. The second one, however, creates a character array on the stack, which can be modified.

My question is this - why don't compilers detect the first type of error? Why isn't that part of the C standard? Is there some particular reason for this?

Cody Gray - on strike
Manish Burman
  • The decision on whether to emit a warning or error diagnostic would be compiler-dependent. As far as the standard is concerned, it's just regular old undefined behavior. – Cody Gray - on strike Jan 24 '12 at 04:01
  • `gcc` will usually warn about the first error – Joseph Quinsey Jan 24 '12 at 04:02
  • I'm referring to the standard in general. Why would you make this undefined behavior? – Manish Burman Jan 24 '12 at 04:03
  • How many times can this question be asked? – Carl Norum Jan 24 '12 at 04:04
  • @JosephQuinsey gcc doesn't warn about the first. – Manish Burman Jan 24 '12 at 04:04
  • @Mysticial it's not a duplicate. My question isn't about the 'functionality' of that piece of code. It's about the standard. – Manish Burman Jan 24 '12 at 04:06
  • I just saw the new edit. Sorry, you're right. It's not a dupe. – Mysticial Jan 24 '12 at 04:07
  • Allowing undefined behaviour makes compiler implementation easier. That's about all there is to it. – Carl Norum Jan 24 '12 at 04:07
  • My mistake: gcc needs `-Wwrite-strings`. Example: `echo "int main(void) {char *c = \"hello world\"; c[1] = 'y'; return 0;}" | gcc -x c -Wwrite-strings -` gives the message `warning: initialization discards qualifiers from pointer target type`. From memory, I thought this was included in `-Wall -W`. – Joseph Quinsey Jan 24 '12 at 04:21
  • Originally, the C89 (C90) standard did not outlaw modifying literals because there was too much code written before the standard that would be broken by it. Compilers can generate warnings. GCC 4.x does not have the `-fwritable-strings` option that GCC 3.x had, but GCC 3.x would warn about at least some attempts to modify strings. Writing const-correct code is harder than writing code which pays no attention to `const`, so people are apt to take the lazy route, still. – Jonathan Leffler Jan 24 '12 at 06:20

4 Answers

6

C compilers are not required to detect the first error, because C string literals are not const.

Referring to the N1256 draft of the C99 standard:

6.4.5 paragraph 5:

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence; [...]

Paragraph 6:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

(C11 does not change this.)

So the string literal "hello, world" is of type char[13] (not const char[13]), which is converted to char* in most contexts.

Attempting to modify a const object has undefined behavior, and most code that attempts to do so must be diagnosed by the compiler (you can get around that with a cast, for example). Attempting to modify a string literal also has undefined behavior, but not because it's const (it isn't); it's because the standard specifically says the behavior is undefined.

For example, this program is strictly conforming:

#include <stdio.h>

void print_string(char *s) {
    printf("%s\n", s);
}

int main(void) {
    print_string("Hello, world");
    return 0;
}

If string literals were const, then passing "Hello, world" to a function that takes a (non-const) char* would require a diagnostic. The program is valid, but it would exhibit undefined behavior if print_string() attempted to modify the string pointed to by s.

The reason is historical. Pre-ANSI C didn't have the const keyword, so there was no way to define a function that takes a char* and promises not to modify what it points to. Making string literals const in ANSI C (1989) would have broken existing code, and there hasn't been a good opportunity to make such a change in later editions of the standard.

gcc's -Wwrite-strings does cause it to treat string literals as const, but makes gcc a non-conforming compiler, since it fails to issue a diagnostic for this:

const char (*p)[6] = &"hello";

("hello" is of type char[6], so &"hello" is of type char (*)[6], which is incompatible with the declared type of p. With -Wwrite-strings, &"hello" is treated as being of type const char (*)[6].) Presumably this is why neither -Wall nor -Wextra includes -Wwrite-strings.

On the other hand, code that triggers a warning with -Wwrite-strings should probably be fixed anyway. It's not a bad idea to write your C code so it compiles without diagnostics both with and without -Wwrite-strings.

(Note that C++ string literals are const, because when Bjarne Stroustrup was designing C++ he wasn't as concerned about strict compatibility for old C code.)

Keith Thompson
  • C++ string literals are `const`, but there's also a special exception to allow initializing a `char*` variable from a string literal. – aschepler Jan 02 '13 at 14:51
3

Compilers can detect the first "error".

In modern versions of gcc, if you use -Wwrite-strings, you'll get a warning that the initialization discards the const qualifier, which is the compiler's way of saying that you are converting a const char* to a char*. This warning is on by default for C++ code.

That's where the problem is: the initialization, not the c[1] = 'y' bit. Of course it's legal to take a char*, dereference it, and assign through it.

Quoting from man 1 gcc:

When compiling C, give string constants the type "const char[length]" so that
copying the address of one into a non-"const" "char *" pointer will get a warning.
These warnings will help you find at compile time code that can try to write into a
string constant, but only if you have been very careful about using "const" in
declarations and prototypes. Otherwise, it will just be a nuisance. This is why we
did not make -Wall request these warnings.

So, basically, because most programmers didn't write const-correct code in the early days of C, it's not the default behavior for gcc. But it is for g++.

Borealid
  • I just compiled it with gcc 4.5.2 with -Wall, and got no warnings – Michael Chinen Jan 24 '12 at 04:07
  • @fsmc: Try with -Wwrite-strings. – Borealid Jan 24 '12 at 04:09
  • Okay, with -Wwrite-strings I get 'initialization discards qualifiers from pointer target type', which, while I can understand your interpretation sounds pretty different to me from 'you can't assign from const char* to char*' – Michael Chinen Jan 24 '12 at 04:21
  • @Borealid -Wwrite-strings seems to work! And your edit also makes sense. Thanks! – Manish Burman Jan 24 '12 at 04:21
  • @fsmc: translate from compilerese. You're initializing `c` from a pointer-with-qualifiers (`const` is a qualifier), and your type matches *except for the lack of the qualifier*. Yes, compiler warnings could be clearer, but it's more important that they're there at all! – Borealid Jan 24 '12 at 04:22
  • @fsmc: You can't (or shouldn't) because you are discarding the "const" qualifier, which is exactly what the warning is telling you. – Thanatos Jan 24 '12 at 04:23
  • @fsmc: also, keep in mind that there's a difference (blargh) between `const char*` and `char* const` and `const char* const` which is why there's the "from pointer target" bit in there. – Borealid Jan 24 '12 at 04:24
  • Giving string literals the wrong type is not "detecting this error". It's making the compiler non-conformant. There's nothing wrong with `char *s = "hello";` It's perfectly valid code. What's invalid is modifying it through that pointer. – R.. GitHub STOP HELPING ICE Jan 24 '12 at 04:26
  • @R..: the only difference between `char* s` and `const char* s` is that in the latter you aren't allowed to modify the values through the pointer. – Borealid Jan 24 '12 at 04:27
  • No, they're different types and the implicit conversion is only possible one way. Also, perhaps more importantly, their addresses have very incompatible types. You'll need `char *s` rather than `const char *s` if you want to take the address of the `s` to pass as the endptr argument for `strtol` or similar. – R.. GitHub STOP HELPING ICE Jan 24 '12 at 04:31
  • @R..: that's what `const_cast` in C++ is for. `strtol` is incorrectly typed - since it only sets `*endptr`, and always sets it to a `const char*`, the type should be `char* const*`. – Borealid Jan 24 '12 at 04:36
  • No, there's no possible correct type for it, because it can't know the type of pointer the caller is using. `char *` is preferable though because it's more common. A better design would have been to give it a `size_t` instead, but in any case, there are good reasons one might use a variable of type `char *` to store the address of something one doesn't intend to modify. – R.. GitHub STOP HELPING ICE Jan 24 '12 at 05:30
2

-Wwrite-strings seems to do what you want. Could have sworn that this was part of -Wall.

% cat chars.c 
#include <stdio.h>

int main()
{
  char *c = "hello world";
  c[1] = 'y';
  return 0;
}
% gcc -Wall -o chars chars.c          
% gcc -Wwrite-strings -o chars chars.c
chars.c: In function ‘main’:
chars.c:5: warning: initialization discards qualifiers from pointer target type

From the man pages:

When compiling C, give string constants the type "const char[length]" so that copying the address of one into a non-"const" "char *" pointer will get a warning. These warnings will help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using "const" in declarations and prototypes. Otherwise, it will just be a nuisance. This is why we did not make -Wall request these warnings.

When compiling C++, warn about the deprecated conversion from string literals to "char *". This warning is enabled by default for C++ programs.

Note that the "enabled by default for C++" part is probably why I (and others) think -Wall covers it. Also note the explanation as to why it isn't part of -Wall.

As for relating to the standard, C99, 6.4.5 item 6 (page 63 of the linked PDF) reads:

It is unspecified whether these arrays are distinct provided their elements have the appropriate values. If the program attempts to modify such an array, the behavior is undefined.

Thanatos
-1

char* c = strdup("..."); would make c[1] = 'y' sensible, because c would then point to a writable copy of the string. (Removed rant on C.) Though an intelligent compiler could warn about this, and some do, C has traditionally stayed close to the machine, without bounds checking, format checking, and other such "needless" overhead.

lint is the tool for detecting such errors, for example that a const char* was assigned to a char*. It would also flag something like char d = c[30]; (no longer a type problem, but an addressing error). It would have been nicer to declare c as const char*. C is an older language with a tradition of leniency and of running on many platforms.

Joop Eggen
  • This seems to be more of a rant than an answer and is starting to collect flags, consider revising? – Tim Post Jan 24 '12 at 07:01
  • You are right, though I have a compiler construction / programming language background, and C is imho somewhere between assembler and typed language. It is not as if I do not like C. I'll wait for another flag, and edit it first. _Really_ thanks. – Joop Eggen Jan 24 '12 at 17:00
  • If you dig into my history, you'll see that we share a similar background ;) GCC (and I suspect other compilers) _can_ warn on this. I look forward to your edits :) My comment was equally 'well, he has a bit of a point' as it was 'why didn't the moderator just delete this rant?!' :) – Tim Post Jan 24 '12 at 17:15