
The following program reliably crashes with a segmentation fault because of undefined behavior (it tries to modify a string literal):

int main() {
  char *s = "immutable";
  s[0] = 'a';
  return 0;
}

Still, there seems to be absolutely no way to tell GCC/Clang to emit even the slightest warning about it (-Wall -Wextra -pedantic -std=c11 don't do anything).

A diagnostic in this kind of situation would be useful, especially for beginners. Even for non-beginners, it could help in slightly less obvious situations:

void f(char *s) {
  s[0] = '0';
}

int main() {
  char *s = "immutable";
  f("literal"); // oops
  f(s); // oops
  return 0;
}

Besides, this would help enforce some const-culture in C programming.

Why are such cases deliberately ignored? Does the standard actively prohibit diagnostics from being emitted in such cases, or is it mostly for backwards-compatibility (trying to enforce them now would generate too many warnings)?

Sourav Ghosh
anol
  • It's not on by default because sadly there is still heaps of legacy code written in non-const correct fashion. Some even predates the addition of `const` to C. – StoryTeller - Unslander Monica Sep 06 '17 at 07:23
  • Enabling such a warning by default would lead to alarm fatigue. It would be nice to have compilers warn for this but there is just too much old code that would trip over this while still being correct code. – Art Sep 06 '17 at 07:30

2 Answers


TL;DR C compilers do not warn, because they do not "see" a problem there. By definition, C string literals are null terminated char arrays. It's only stated that,

[...] If the program attempts to modify such an array, the behavior is undefined.

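So, during compilation, nothing in the type tells the compiler that a given char array came from a string literal rather than from an ordinary array. Only the attempt to modify it is prohibited, and that attempt is undefined behavior, not something the compiler is required to diagnose.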

Related read: For anybody interested, see Why are C string literals read-only?

That said, although I am not sure it is a good option in every case, gcc does have a -Wwrite-strings option.

Quoting the online manual,

-Wwrite-strings

When compiling C, give string constants the type const char[length] so that copying the address of one into a non-const char * pointer produces a warning. These warnings help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using const in declarations and prototypes. Otherwise, it is just a nuisance. This is why we did not make -Wall request these warnings.

So, it produces the warning indirectly, by changing the type of the literal rather than by detecting the write itself.

By definition, C string literals (i.e., character string literals) are char arrays with null terminator. The standard does not mandate them to be const qualified.

Ref: C11, chapter

In translation phase 7, a byte or code of value zero is appended to each multibyte character sequence that results from a string literal or literals. The multibyte character sequence is then used to initialize an array of static storage duration and length just sufficient to contain the sequence. For character string literals, the array elements have type char, and are initialized with the individual bytes of the multibyte character sequence. [....]

Using the aforesaid option makes the string literals const qualified so using a string literal as the RHS of assignment to a non-const type pointer triggers a warning.

This is done with reference to C11, chapter §6.7.3

If an attempt is made to modify an object defined with a const-qualified type through use of an lvalue with non-const-qualified type, the behavior is undefined. [...]

So, here the compiler produces a warning for the assignment of const qualified type to a non-const-qualified type.

As for why -Wall -Wextra -pedantic -std=c11 does not produce this warning, to quote the manual once again:

[...] These warnings help you find at compile time code that can try to write into a string constant, but only if you have been very careful about using const in declarations and prototypes. Otherwise, it is just a nuisance. This is why we did not make -Wall request these warnings.

Sourav Ghosh
  • *"Otherwise, it is just a nuisance."* It sounds like whoever wrote the manual is trying to make excuses for his own code :P. – user694733 Sep 06 '17 at 07:44
  • umm...maybe not? const-to non-const is neither allowed nor implicit and warrants a diagnostic. making one side forcefully const qualified may break some valid code, you know? – Sourav Ghosh Sep 06 '17 at 07:48
  • Yes, you cannot use this option with legacy code. But any new code should strive to be const-correct. I just found it funny that manual sounds overly defensive. – user694733 Sep 06 '17 at 07:54
  • That sentence was written more than 20 years ago. Back then, we were a lot less sanguine about the possibility of getting C programmers to tighten up their sloppy code. If I were revising this manual now I would say something more positive, perhaps "If you are writing a new program from scratch, we recommend you use this option. However, enabling it for existing programs will be a great deal of work and is not likely to find very many bugs." – zwol Sep 11 '17 at 18:07

There is an option for this: -Wwrite-strings. It works by changing the type of string literals from char[N] to const char[N]. This change is not compatible with standard C and will cause valid code to be rejected, and in rare cases invalid code to be accepted silently. It's not enabled by default.

Unfortunately, due to the way string literals are defined in C, providing good warnings for this without changing the language is very difficult.

  • *"and in rare cases invalid code to be accepted silently."* How would that happen? If `-Wwrite-strings` makes error checking more strict, this should not be possible. – user694733 Sep 06 '17 at 07:41
  • @user694733 Code which assigns the address of a string literal to a `const char (*)[]` variable will be accepted silently (unless it's been changed since I last checked), but in standard C, there is no implicit conversion from `char (*)[]` to `const char (*)[]`, so this requires a diagnostic. –  Sep 06 '17 at 07:45
  • Not sure why anyone would take the address of a string literal, but I guess you are correct. I still think that warning about the normal case, `char * a = "x";`, outweighs that possible problem. – user694733 Sep 06 '17 at 07:57