What makes this usage of pointers unpredictable?

Question

I'm currently learning pointers and my professor provided this piece of code as an example:

//We cannot predict the behavior of this program!

#include <iostream>
using namespace std;

int main()
{
    char * s = "My String";
    char s2[] = {'a', 'b', 'c', '\0'};

    cout << s2 << endl;

    return 0;
}

He wrote in the comments that we can't predict the behavior of the program. What exactly makes it unpredictable though? I see nothing wrong with it.

Are you sure you reproduced the professor's code correctly? While it is formally possible to argue that this program might produce "unpredictable" behavior, it makes no sense to do so. And I doubt that any professor would use something so esoteric to illustrate "unpredictable" to students. — AnT stands with Russia, Aug 04 '15 at 18:29
@LightnessRacesinOrbit insofar as `s` is not used to modify the string literal (it's not even read); if the compiler permits the deprecated conversion and the code compiles, it should have well-defined behavior, shouldn't it? — The Paramagnetic Croissant, Aug 04 '15 at 18:32
@Lightness Races in Orbit: Compilers are allowed to "accept" ill-formed code after issuing the required diagnostic messages. But the language specification does not define the behavior of such code. I.e. because of the error in the initialization of `s`, the program, if accepted by some compiler, formally has unpredictable behavior. — AnT stands with Russia, Aug 04 '15 at 18:32
@TheParamagneticCroissant: No. The initialisation is ill-formed in modern times. — Lightness Races in Orbit, Aug 04 '15 at 18:32
Other than the deprecated assignment of a string literal to a non-const pointer, I don't see anything unpredictable about it. Are you sure you copied the text of it correctly? — Rob K, Aug 04 '15 at 18:33
@LightnessRacesinOrbit so does using a deprecated language feature really cause undefined behavior? — The Paramagnetic Croissant, Aug 04 '15 at 18:33
@TheParamagneticCroissant: No and I did not say that it does. — Lightness Races in Orbit, Aug 04 '15 at 18:34
@The Paramagnetic Croissant: Deprecated features do not cause UB. But the feature in question is no longer deprecated. It is flat out *illegal* since C++11. String literal to `char *` conversion has been removed from the language entirely. — AnT stands with Russia, Aug 04 '15 at 18:34
@AnT yeah. I reproduced the code word for word. I did get a warning: "deprecated conversion from string constant to 'char*'", but I wasn't sure what that meant. — trungnt, Aug 04 '15 at 18:35
@AnT then why is anything "unpredictable" about it? If we are using an (at least hypothetically) conforming C++11 compiler, then it should always unconditionally, predictably fail to compile, shouldn't it? — The Paramagnetic Croissant, Aug 04 '15 at 18:36
@LightnessRacesinOrbit so then why is it unpredictable? (see my comment above.) — The Paramagnetic Croissant, Aug 04 '15 at 18:37
@The Paramagnetic Croissant: As I said above, the language does not require ill-formed code to "fail to compile". Compilers are simply required to issue a diagnostic. After that they are allowed to keep going on and "successfully" compile the code. However, the behavior of such code is not defined by the language spec. — AnT stands with Russia, Aug 04 '15 at 18:38
@AnT assuming you meant "does **not** require", it makes sense, thanks. — The Paramagnetic Croissant, Aug 04 '15 at 18:39
I'd love to know what answer your professor gave you. — Daniël W. Crompton, Aug 12 '15 at 10:07

Lightness Races in Orbit · Accepted Answer · 2015-08-04T18:39:40.193

125

The behaviour of the program is non-existent, because it is ill-formed.

char* s = "My String";

This is illegal. Prior to 2011, it had been deprecated for 12 years.

The correct line is:

const char* s = "My String";

Other than that, the program is fine. Your professor should drink less whiskey!
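
For reference, here is a minimal sketch of the program with only that line changed. Wrapping the body in a function that takes a stream is an assumption made here so the output can be checked; `print_demo` and `demo_output` are hypothetical names, not part of the professor's code.

```cpp
#include <cassert>
#include <iostream>
#include <sstream>
#include <string>

// Same statements as the professor's main(), with the pointer made const.
// print_demo is a hypothetical name used only for this sketch.
void print_demo(std::ostream& out)
{
    const char* s = "My String";        // well-formed: pointer to const char
    char s2[] = {'a', 'b', 'c', '\0'};  // writable, null-terminated array

    (void)s;                            // s is never read, as in the original
    out << s2 << std::endl;
}

// Hypothetical helper: collects what print_demo writes into a string.
std::string demo_output()
{
    std::ostringstream buffer;
    print_demo(buffer);
    assert(buffer.good());              // the write itself should not fail
    return buffer.str();
}
```

Calling `print_demo(std::cout)` from `main` prints `abc` followed by a newline.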

edited Aug 04 '15 at 18:39

answered Aug 04 '15 at 18:25

One thing that surprises me: g++ doesn't seem to warn about that: http://ideone.com/VHP6xD – Daniel Jour Aug 04 '15 at 18:28
with -pedantic it does : main.cpp:6:16: warning: ISO C++ forbids converting a string constant to 'char*' [-Wpedantic] – marcinj Aug 04 '15 at 18:29
@DanielJour: You need better warning settings. `-Wall -Wextra -pedantic` plzkthx – Lightness Races in Orbit Aug 04 '15 at 18:30
@DanielJour `g++` is not a conforming implementation without some of the stricter warning/error options. – The Paramagnetic Croissant Aug 04 '15 at 18:30
Also, it would error if you had C++11 mode on. I hear that's the default as of 5.1. Finally. – Lightness Races in Orbit Aug 04 '15 at 18:31
and I did get a warning from g++ stating "deprecated conversion from string constant to 'char*'". I just wasn't sure what that meant – trungnt Aug 04 '15 at 18:31
Cannot affect compiler settings with ideone. I was hoping that selecting C++14 would enable pedantic mode. Seems I was wrong. – Daniel Jour Aug 04 '15 at 18:33
@DanielJour one reason to use other [online compilers](http://coliru.stacked-crooked.com/a/24ea3d3253704840). Seems like `-Wall` is enough and ideone doesn't have any warning flags enabled at all. Also no error even on C++11. – AliciaBytes Aug 04 '15 at 18:34
The fact that conversion is deprecated does not make the program ill-formed. _Modifying_ that string would. – edmz Aug 04 '15 at 18:36
@black: No, the fact that the conversion is illegal makes the program ill-formed. It was deprecated _in the past_. We are no longer in the past. – Lightness Races in Orbit Aug 04 '15 at 18:37
@black For clarification, it was deprecated pre-C++11. Since C++11 it's completely illegal and should not compile according to the standard. Current compilers seem to still allow it though (they probably don't want to break code). – AliciaBytes Aug 04 '15 at 18:38
(Which is silly because that was the purpose of the 12-year deprecation) – Lightness Races in Orbit Aug 04 '15 at 18:38
@LightnessRacesinOrbit Tell most modern compilers that. They still make C++03 their default standard. Anyway, the OP is asking about the _behavior_ of the program, not whether it's ill-formed or well-formed. Indeed, its behavior is perfectly defined. – edmz Aug 04 '15 at 18:42
@black: The behaviour of a program that is ill-formed is _not_ "perfectly defined". – Lightness Races in Orbit Aug 04 '15 at 18:47
@black gcc 4 defaults to gnu++98. I think gcc 5 does also – M.M Aug 04 '15 at 23:12
Regardless, the question is about C++, not about some particular version of GCC. – Lightness Races in Orbit Aug 04 '15 at 23:36
Sounds like your professor hasn't updated his course material in the last few years... – brichins Aug 12 '15 at 04:41

score 81 · Answer 2 · edited May 23 '17 at 12:24

The answer is: it depends on what C++ standard you're compiling against. All the code is perfectly well-formed across all standards‡ with the exception of this line:

char * s = "My String";

Now, the string literal has type const char[10] and we're trying to initialize a pointer to non-const char with it. For every type other than the char family of string literals, such an initialization was always illegal. For example:

const int arr[] = {1};
int *p = arr; // nope!

However, in pre-C++11, for string literals, there was an exception in §4.2/2:

A string literal (2.13.4) that is not a wide string literal can be converted to an rvalue of type “pointer to char”; [...]. In either case, the result is a pointer to the first element of the array. This conversion is considered only when there is an explicit appropriate pointer target type, and not when there is a general need to convert from an lvalue to an rvalue. [Note: this conversion is deprecated. See Annex D. ]

So in C++03, the code is perfectly fine (though deprecated), and has clear, predictable behavior.

In C++11, that block does not exist - there is no such exception for string literals converted to char*, and so the code is just as ill-formed as the int* example I just provided. The compiler is obligated to issue a diagnostic, and ideally in cases such as this that are clear violations of the C++ type system, we would expect a good compiler to not just be conforming in this regard (e.g. by issuing a warning) but to fail outright.
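
These type-level claims can be checked at compile time. The sketch below assumes a C++11 or later compiler and uses only `<type_traits>`; the variable names are illustrative. Note that `std::is_convertible` works on types rather than on a literal expression, so it cannot observe the old C++03 string-literal exception. It only confirms the general rule that `const` is not dropped implicitly.

```cpp
#include <type_traits>

// The literal "My String" is an lvalue of type const char[10]
// (9 characters plus the terminating '\0'), so decltype yields a reference to it.
constexpr bool literal_is_const_array =
    std::is_same<decltype("My String"), const char (&)[10]>::value;

// Ordinary pointer conversions never drop const implicitly.
constexpr bool int_conversion_allowed =
    std::is_convertible<const int*, int*>::value;
constexpr bool char_conversion_allowed =
    std::is_convertible<const char*, char*>::value;

static_assert(literal_is_const_array, "string literal should be const char[10]");
static_assert(!int_conversion_allowed, "const int* must not convert to int*");
static_assert(!char_conversion_allowed, "const char* must not convert to char*");
```

If any of these assumptions were wrong, the translation unit would fail to compile rather than misbehave at run time.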

The code should ideally not compile - but does on both gcc and clang (I assume because there's probably lots of code out there that would be broken with little gain, despite this type system hole being deprecated for over a decade). The code is ill-formed, and thus it does not make sense to reason about what the behavior of the code might be. But considering this specific case and the history of it being previously allowed, I do not believe it to be an unreasonable stretch to interpret the resulting code as if it were an implicit const_cast, something like:

const int arr[] = {1};
int *p = const_cast<int*>(arr); // OK, technically

With that, the rest of the program is perfectly fine, as you never actually touch s again. Reading a created-const object via a non-const pointer is perfectly OK. Writing a created-const object via such a pointer is undefined behavior:

std::cout << *p; // fine, prints 1
*p = 5;          // will compile, but undefined behavior, which
                 // certainly qualifies as "unpredictable"
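
The read-versus-write distinction above can be put into a self-contained function. This is only a sketch; `read_through_cast` is a hypothetical name, and the write is left commented out because it would be undefined behavior.

```cpp
#include <cassert>

// Reads a const object through a pointer whose constness was cast away.
int read_through_cast()
{
    const int arr[] = {1};
    int* p = const_cast<int*>(arr);  // explicit cast, so this is well-formed
    assert(p == arr);                // the cast changes the type, not the address
    return *p;                       // read only: well-defined
    // *p = 5;  // would compile, but writing to a const object is undefined behavior
}
```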

As there is no modification via s anywhere in your code, the program is fine in C++03, should fail to compile in C++11 but does anyway - and given that the compilers allow it, there's still no undefined behavior in it†. With allowances that the compilers are still [incorrectly] interpreting the C++03 rules, I see nothing that would lead to "unpredictable" behavior. Write to s though, and all bets are off. In both C++03 and C++11.
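
If the string does need to be modified, one common approach is to initialize an array from the literal, which gives the program its own writable copy. A minimal sketch, with `lowercase_first` as a hypothetical name:

```cpp
#include <cassert>
#include <string>

// Modifies a private copy of the text; the literal itself is never written to.
std::string lowercase_first()
{
    char s[] = "My String";  // array initialized from the literal: a writable 10-char copy
    assert(s[0] == 'M');
    s[0] = 'm';              // fine: this changes the array, not the literal
    return s;                // converted to std::string on return
}
```

Because `s` is an array rather than a pointer to the literal, this is well-formed and well-defined in both C++03 and C++11.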

† Though, again, by definition ill-formed code yields no expectation of reasonable behavior.
‡ Except not, see Matt McNabb's answer.

I think here "unpredictable" is intended by the professor to mean that one cannot use the standard to predict what a compiler will do with ill-formed code (beyond issuing a diagnostic). Yes, it could treat it as C++03 says it should be treated, and (at risk of the "No True Scotsman" fallacy) common sense allows us to predict with some confidence that this is the only thing a sensible compiler-writer will ever choose if the code compiles at all. Then again, it could treat it as meaning to reverse the string literal before casting it to non-const. Standard C++ doesn't care. — Steve Jessop, Aug 04 '15 at 23:12
@SteveJessop I don't buy that interpretation. This is neither undefined behavior nor of the category of ill-formed code that the standard labels as no diagnostic required. It's a simple type system violation that should be very predictable (compiles and does normal things on C++03, fails to compile on C++11). You can't really use compiler bugs (or artistic licenses) to suggest that code is unpredictable - otherwise all code would tautologically be unpredictable. — Barry, Aug 04 '15 at 23:38
I'm not talking about compiler bugs, I'm talking about whether or not the standard defines the behaviour (if any) of the code. I suspect the professor is doing the same, and "unpredictable" is just a ham-fisted way of saying that the current standard doesn't define the behaviour. Anyway that seems more likely to me, than that the professor incorrectly believes that this is a well-formed program with undefined behaviour. — Steve Jessop, Aug 04 '15 at 23:44
@SteveJessop The standard does define all of the behavior though. — Barry, Aug 04 '15 at 23:46
No, it does not. The standard does not define the behaviour of ill-formed programs. — Steve Jessop, Aug 04 '15 at 23:46
@SteveJessop: Do you like my answer about why the Standard is silent with regard to the behavior of ill-formed programs beyond the requirement of a diagnostic message? — supercat, Aug 05 '15 at 15:46
@supercat: it's a fair point, but I don't believe it's the main reason. I think the main reason the standard doesn't specify the behaviour of ill-formed programs, is so that compilers can support extensions to the language by adding syntax that is not well-formed (like Objective C does). Permitting the implementation to make a total horlicks out of cleaning up after a failed compilation is just a bonus :-) — Steve Jessop, Aug 05 '15 at 18:45
@SteveJessop: I think the Standard used to be interpreted in a fashion much more amenable to extension than it is today; the idea that an extension might do something not anticipated by the Standard is a good reason for it to leave things undefined. On the other hand, most extensions behave in relatively predictable fashion. If failed compilations generate executable files which, though ill-formed, are not recognizable as such, the behavior of such executable files would often be much less predictable. — supercat, Aug 05 '15 at 19:03

score 20 · Answer 3 · answered Aug 04 '15 at 23:22

Other answers have covered that this program is ill-formed in C++11 due to the assignment of a const char array to a char *.

However the program was ill-formed prior to C++11 also.

The operator<< overloads are in <ostream>. The requirement for iostream to include ostream was added in C++11.

Historically, most implementations had iostream include ostream anyway, perhaps for ease of implementation or perhaps in order to provide a better QoI.

But it would be conforming for iostream to only define the ostream class without defining the operator<< overloads.
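
A sketch of the portable fix for pre-C++11 code is to include `<ostream>` explicitly next to `<iostream>`. The function names `print_line` and `printed` are hypothetical, and `<sstream>` is pulled in only so the result can be inspected.

```cpp
#include <cassert>
#include <iostream>  // std::cout and the other standard stream objects
#include <ostream>   // operator<< overloads and std::endl, even before C++11
#include <sstream>
#include <string>

// Writes a null-terminated string followed by a newline.
void print_line(std::ostream& out, const char* text)
{
    assert(text != 0);  // streaming a null char pointer is undefined behavior
    out << text << std::endl;
}

// Hypothetical helper: returns what print_line would write.
std::string printed(const char* text)
{
    std::ostringstream buffer;
    print_line(buffer, text);
    return buffer.str();
}
```

From `main`, `print_line(std::cout, s2)` would then print `abc` without relying on what `<iostream>` happens to include.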

zneak · Answer 4 · 2015-08-04T18:30:15.720

The only slightly wrong thing that I see with this program is that you're not supposed to assign a string literal to a mutable char pointer, though this is often accepted as a compiler extension.

Otherwise, this program appears well-defined to me:

- The rules that dictate how character arrays become character pointers when passed as parameters (such as with cout << s2) are well-defined.
- The array is null-terminated, which is a condition for operator<< with a char* (or a const char*).
- #include <iostream> includes <ostream>, which in turn defines operator<<(ostream&, const char*), so everything appears to be in place.
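
The first two points can be illustrated with a small sketch. It assumes C++11 for `<type_traits>`, and `visible_length` is a hypothetical name.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// A char[4] passed by value to a function decays to char*.
static_assert(std::is_same<std::decay<char[4]>::type, char*>::value,
              "char[4] decays to char*");

// Counts the characters operator<< would print for s2.
std::size_t visible_length()
{
    char s2[] = {'a', 'b', 'c', '\0'};
    const char* p = s2;     // array-to-pointer conversion: p points at s2[0]
    assert(p == &s2[0]);
    return std::strlen(p);  // stops at the terminating '\0'
}
```

Without the trailing `'\0'`, both `std::strlen` and `operator<<` would read past the end of the array, which is undefined behavior.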

score 12 · Answer 5 · answered Aug 05 '15 at 00:00

You can't predict the behaviour of the compiler, for reasons noted above. (It should fail to compile, but may not.)

If compilation succeeds, then the behaviour is well-defined. You certainly can predict the behaviour of the program.

If it fails to compile, there is no program. In a compiled language, the program is the executable, not the source code. If you don't have an executable, you don't have a program, and you can't talk about behaviour of something that doesn't exist.

So I'd say your prof's statement is wrong. You can't predict the behaviour of the compiler when faced with this code, but that's distinct from the behaviour of the program. So if he's going to pick nits, he'd better make sure he's right. Or, of course, you might have misquoted him and the mistake is in your translation of what he said.

supercat · Answer 6 · 2015-08-05T15:45:49.320

As others have noted, the code is illegitimate under C++11, although it was valid under earlier versions. Consequently, a compiler for C++11 is required to issue at least one diagnostic, but behavior of the compiler or the remainder of the build system is unspecified beyond that. Nothing in the Standard would forbid a compiler from exiting abruptly in response to an error, leaving a partially-written object file which a linker might think was valid, yielding a broken executable.

Although a good compiler should always ensure before it exits that any object file it is expected to have produced will be either valid, non-existent, or recognizable as invalid, such issues fall outside the jurisdiction of the Standard. While there have historically been (and may still be) some platforms where a failed compilation can result in legitimate-appearing executable files that crash in arbitrary fashion when loaded (and I've had to work with systems where link errors often had such behavior), I would not say that the consequences of syntax errors are generally unpredictable. On a good system, an attempted build will generally either produce an executable with a compiler's best effort at code generation, or won't produce an executable at all. Some systems will leave behind the old executable after a failed build, since in some cases being able to run the last successful build may be useful, but that can also lead to confusion.

My personal preference would be for disk-based systems to rename the output file, to allow for the rare occasions when that executable would be useful while avoiding the confusion that can result from mistakenly believing one is running new code, and for embedded-programming systems to allow a programmer to specify for each project a program that should be loaded if a valid executable is not available under the normal name [ideally something which safely indicates the lack of a usable program]. An embedded-systems tool-set would generally have no way of knowing what such a program should do, but in many cases someone writing "real" code for a system will have access to some hardware-test code that could easily be adapted to the purpose. I don't know that I've seen the renaming behavior, however, and I know that I haven't seen the indicated programming behavior.
