char *s = "hello world";
s[0] = 'H';

The above is undefined behaviour in C/C++.

But it seems pretty defined to me. You try to change a spot in read-only memory, the operating system says no, and that's the end of it.

But in reality, sometimes the change goes through and your string actually changes. How does that work and why is this undefined?

Dan
  • This memory is not necessarily read-only from the OS's perspective. But the C standard says: do not modify it. – Eugene Sh. Nov 27 '20 at 20:39
  • So the `rodata` section of memory isn't actually read-only? I thought the OS protects that region. – Dan Nov 27 '20 at 20:40
  • Sometimes there is no OS, and no MMU able to protect the memory. – Eugene Sh. Nov 27 '20 at 20:40
  • That doesn't compile in modern C++. Behavior previously varied. Nothing in C++ requires an OS to have write-protected memory. – doug Nov 27 '20 at 20:43
  • Even assuming you do have an OS and protected memory, if it was going to be defined behavior, the standard would have to actually *define* what happens when you do it. "The operating system says no" is completely vague and not a sufficient definition. Does the program terminate? Does it raise a signal? If so, which one? What happens to other threads? And so on. The C/C++ standard authors are not in the business of designing OSes so they don't want to dictate these decisions. "Undefined behavior" leaves it up to the OS to decide what's best to do. – Nate Eldredge Nov 27 '20 at 20:51
  • Related: [Why is conversion from string constant to 'char*' valid in C but invalid in C++](https://stackoverflow.com/questions/20944784/why-is-conversion-from-string-constant-to-char-valid-in-c-but-invalid-in-c). This should also serve as a reminder that C and C++ are different languages. They share a common root in old C, but even modern C is different from the C of the 1980s. – user4581301 Nov 27 '20 at 21:22

1 Answer


> The above is undefined behaviour in C/C++.

Actually, the program merely has UB in C, and in C++ prior to C++11. Since C++11, the program is ill-formed: the implicit conversion from a string literal to `char *` was removed, so compilers are required to diagnose it and are allowed to refuse to compile it.

> But it seems pretty defined to me. You try to change a spot in read-only memory, the operating system says no, and that's the end of it.

Except that the operating system says nothing when there is no operating system. The standards committees cannot simply decide that, because one OS behaves a certain way, all language implementations (including those with no concept of read-only memory, or with no OS at all) must behave the same way.

> But in reality, sometimes the change goes through and your string actually changes. How does that work?

Simply put, that language implementation didn't store the string literal in read-only memory. It's no more complicated than that.
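For comparison, here is a minimal C sketch of the well-defined alternative: give the program its own writable array initialised from the literal, and only ever write through that. The helper name `capitalise_first` is made up for illustration and does not come from the question.

```c
#include <assert.h>
#include <ctype.h>
#include <stddef.h>

/* Hypothetical helper: upper-cases the first character of a writable,
 * NUL-terminated string in place and returns the same pointer. */
char *capitalise_first(char *s)
{
    assert(s != NULL);
    s[0] = (char)toupper((unsigned char)s[0]);
    return s;
}

/* Safe:  char buf[] = "hello world";   (array: a private, writable copy)
 *        capitalise_first(buf);        (buf now holds "Hello world")
 *
 * UB:    char *p = "hello world";      (pointer to the literal itself)
 *        capitalise_first(p);          (may crash, may appear to work)
 *
 * Declaring the pointer as const char * instead makes the compiler
 * reject a direct write such as p[0] = 'H'. */
```

The difference is only in the declaration: `char buf[]` asks for an array whose contents are copied from the literal, while `char *p` points at the literal's own storage, wherever the implementation chose to put it.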

> Why is this undefined?

Because the language standards say so. They say so because the committees decided so. In case you are interested in their rationale, you're in luck since in this case it has been documented:

> String literals are specified to be unmodifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and perform certain optimizations. However, string literals do not have the type array of const char, in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see F.5.5).
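The first permission in that rationale, sharing copies of identical literals, is easy to observe. The sketch below (the function name is hypothetical) compares the addresses of two identical literals. The C standard leaves it unspecified whether they are the same array, so either answer is conforming.

```c
#include <assert.h>
#include <stdbool.h>
#include <string.h>

/* Hypothetical helper: reports whether this implementation merged two
 * identical string literals into a single array. */
bool literals_share_storage(void)
{
    const char *a = "hello world";
    const char *b = "hello world";

    /* The contents always compare equal... */
    assert(strcmp(a, b) == 0);

    /* ...but whether the addresses are equal is unspecified. */
    return a == b;
}
```

On an implementation where the two literals are shared and happen to be writable, a write through one pointer would silently change the other "string" as well, which is one more outcome the standards would have to pin down if they tried to define the behaviour.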

eerorika