
I understand that the syntax char* p = "stringLiteral"; has been deprecated and may not even work in the future. What I don't understand is WHY.

I searched the web and Stack Overflow, and although there are many echoes confirming that char* p = "stringLiteral"; is wrong and that const char* p = "stringLiteral"; is correct, I have yet to find an explanation of WHY the former is wrong. In other words, I'd like to know what the issue really is under the hood.

ILLUSTRATING MY CONFUSION

CODE SEGMENT 1 - EVIL WAY (Deprecated)

char* szA = "stringLiteralA";     //Works fine as expected. Auto null terminated.
std::cout << szA << std::endl;    
szA = "stringLiteralB";          //Works, so change by something same length OK.
std::cout << szA << std::endl;    
szA = "stringLiteralC_blahblah"; //Works, so change by something longer OK also.
std::cout << szA << std::endl;    

Output:
stringLiteralA
stringLiteralB
stringLiteralC_blahblah

So what exactly is the problem here? Seems to work just fine.

CODE SEGMENT 2 (The "OK" way)

const char* szA = "stringLiteralA";  //Works fine as expected. Auto null term.
std::cout << szA << std::endl;    
szA = "stringLiteralB";          //Works, so change by something same length OK.
std::cout << szA << std::endl;    
szA = "stringLiteralC_blahblah"; //Works, so change by something longer OK also.
std::cout << szA << std::endl;    

Output:
stringLiteralA
stringLiteralB
stringLiteralC_blahblah

Also works fine. No difference. What is the point of adding const?

CODE SEGMENT 3

const char* const szA = "stringLiteralA";  //Works. Auto null term.
std::cout << szA << std::endl;    
szA = "stringLiteralB";           //Breaks here. Can't reassign.

I am only illustrating here that, in order to protect the variable's content as read-only, you have to write const char* const szA = "something";.

I don't see the point for deprecation or any issues. Why is this syntax deprecated and considered an issue?

user1118167
    `char * p = "test"; p[0] = 'a';` crashes. `const char * p = "test"; p[0] = 'a';` doesn't compile. – avakar Nov 07 '12 at 17:46

4 Answers


const char * is a pointer (*) to a constant (const) char (pointer declarations are easily read from right to left). The point here is to protect the content: as the standard says, modifying a string literal, which is what such a pointer refers to here, results in undefined behavior.

This has its roots in the fact that C and C++ compilers typically group the strings used throughout the program in a single memory zone, and are allowed to use the same memory location for instances of the same string used in unrelated parts of the program (to minimize executable size and memory footprint). If modifying string literals were allowed, a single change could affect other, unrelated instances of the same literal, which obviously isn't a great idea.

In fact, with most modern compilers (on hardware that supports memory protection) the memory area of the string table is read-only, so if you attempt to modify a string literal your program typically crashes. Adding const to pointers that refer to string literals makes these mistakes immediately evident as compilation errors instead of crashes.

By the way, notice that the ability of a string literal to decay implicitly to a non-const char * is just a concession to backwards compatibility with pre-standard libraries (written when const wasn't yet part of the C language). As said above, the standard has always said that changing string literals is UB.

Matteo Italia
  • You say the point here is to protect the content but code segment 2 illustrates that const char * DOES NOT protect the content. – user1118167 Nov 07 '12 at 17:54
  • @user1118167: you're incorrect, the content was preserved. All that changes is the pointer, and the pointer is not `const`, nor protected. – Mooing Duck Nov 07 '12 at 17:58
  • @user1118167: code segment 2 isn't changing the content of the literal, it's changing what the pointer points to, which is a completely unrelated thing. Try `strcpy(szA, "stringLiteralB")`, which would actually overwrite the content of the string pointed to by `szA`: you'll see that the compilation fails with an error. – Matteo Italia Nov 07 '12 at 17:58

"abc" is a static array stored in possibly immutable memory. In C, modifying the content of a string literal is undefined behavior (UB).


But C99 did not make "abc" an object of type const char [n]. Quite the opposite: it keeps the type char [n] for compatibility with C89 (ANSI C), which specifies (§3.1.4/3):

A character string literal has static storage duration and type array of char, and is initialized with the given characters.

That is, the declaration

char* c = "12345";

is not deprecated in C, even up to C11.

From http://www.open-std.org/jtc1/sc22/wg14/www/C99RationaleV5.10.pdf, we can see the rationale in C99 of making the string literal modification UB, while keeping the type to be char [n]:

String literals are not required to be modifiable. This specification allows implementations to share copies of strings with identical text, to place string literals in read-only memory, and to perform certain optimizations. However, string literals do not have the type array of const char in order to avoid the problems of pointer type checking, particularly with library functions, since assigning a pointer to const char to a plain pointer to char is not valid. Those members of the C89 Committee who insisted that string literals should be modifiable were content to have this practice designated a common extension (see §J.5.5).

where C99 §J.5.5 is:

J.5.5 Writable string literals

String literals are modifiable (in which case, identical string literals should denote distinct objects) (6.4.5).


On the other hand, as your code is C++, this should actually be wrong in standard C++, because the standard requires (C++03 §2.13.4/1)

... An ordinary string literal has type “array of n const char” and static storage duration ...

and assigning a const char[n] to a char* shouldn't compile. The compiler only warns about "deprecation" because implementations existing at that time allowed the conversion (because C allows it), so it went into Annex D: Compatibility features:

D.4 Implicit conversion from const strings

The implicit conversion from const to non-const qualification for string literals (4.2) is deprecated.

kennytm

The idea behind the deprecation is to help the compiler catch errors that would otherwise cause crashes at runtime.

char *hello = "hello";
strcpy(hello, "world"); // Compiles but crashes

as opposed to

const char *hello = "hello";
strcpy(hello, "world"); // Does not compile

This is a relatively cheap way of catching an entire class of very nasty runtime errors, so deprecation of the conversion is very much in line with the general philosophy of C++ as "a better C".

In addition, your code segment 2 does not invalidate the fact that the data the pointer refers to is protected. It is the pointer itself that gets overwritten, not the data it points to. There is a difference between const char *ptr and char * const ptr: the former protects the pointed-to data; the latter protects the pointer itself. The two can be combined to protect both the pointer and the data as const char * const ptr.

Sergey Kalinichenko
  • Thanks. Great answer. Gave you a point. The other poster also gave an excellent answer but posted sooner so I'll give him the answer. – user1118167 Nov 07 '12 at 18:14

The syntax is wrong because there is no implicit conversion from char const * to char *.

The type of a string literal has been char const * for ever in C and C++. (Might be wrong about very old C.)

The change in the rules has nothing to do with the type of string literals but with allowed conversions between pointer types.

The conversion is a mistake because the whole point of a pointer-to-const-thing is that the thing is immutable. A string literal, which is a value known to be constant at compile and link time, might be put in read-only memory segments.

Pedro Lamarão