104

(I'm looking for an example or two to prove the point, not a list.)

Has it ever been the case that a change in the C++ standard (e.g. from 98 to 11, 11 to 14 etc.) changed the behavior of existing, well-formed, defined-behavior user code - silently? i.e. with no warning or errors when compiling with the newer standard version?

Notes:

  • I'm asking about standards-mandated behavior, not about implementer/compiler author choices.
  • The less contrived the code, the better (as an answer to this question).
  • I don't mean code with version detection such as #if __cplusplus >= 201103L.
  • Answers involving the memory model are fine.
einpoklum
  • Comments are not for extended discussion; this conversation has been [moved to chat](https://chat.stackoverflow.com/rooms/219471/discussion-on-question-by-einpoklum-have-there-ever-been-silent-behavior-changes). – Samuel Liew Aug 09 '20 at 03:37
  • In my mind, the biggest silent breaking change is the redefinition of `auto`. Before C++11, `auto x = ...;` declared an `int`. After, it declares whatever `...` is. – Raymond Chen Aug 21 '20 at 06:08
  • @RaymondChen: This change is only silent if you were implicitly defining `int`s while explicitly saying they were `auto`-type variables. I think you could probably count on one hand the number of people in the world who would write that kind of code, except for the obfuscated C code contests... – einpoklum Aug 21 '20 at 11:04
  • True, that's why they chose it. But it was a huge change in semantics. – Raymond Chen Aug 21 '20 at 14:28

9 Answers

114

The return type of std::string::data on a non-const string changes from const char* to char* in C++17, which added a non-const overload. That could certainly make a difference:

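#include <iostream>
#include <string>

void func(char* data)
{
    std::cout << data << " is not const\n";
}

void func(const char* data)
{
    std::cout << data << " is const\n";
}

int main()
{
    std::string s = "xyz";
    func(s.data());  // C++14 picks the const char* overload, C++17 the char* one
}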

A bit contrived but this legal program would change its output going from C++14 to C++17.

scohe001
john
  • 7
    Oh, I didn't even realize there were `std::string` changes for C++17. If anything, I would have thought the C++11 changes might have caused silent behavior change somehow. +1. – einpoklum Aug 06 '20 at 20:46
  • 9
    Contrived or not, this demonstrates a change to well-formed code quite well. – David C. Rankin Aug 06 '20 at 21:54
  • As an aside, the change is based on funny but legitimate use cases where you change a std::string's contents *in situ*, perhaps through legacy functions operating on char *. That's totally legitimate now: as with a vector, there is a guarantee that there is an underlying, contiguous array which you can manipulate (you always could through returned references; now it's made more natural and explicit). Possible use cases are editable, fixed-length data sets (e.g. messages of some kind) which, if based on a std:: container, retain the STL's services like lifetime management, copyability etc. – Peter - Reinstate Monica Aug 09 '20 at 10:57
82

The answer to this question shows how initializing a vector using a single size_type value can result in different behavior between C++03 and C++11.

std::vector<Something> s(10);

C++03 default-constructs a temporary object of the element type Something and copy-constructs each element in the vector from that temporary.

C++11 default-constructs each element in the vector.

In many (most?) cases these result in equivalent final state, but there is no reason they have to. It depends on the implementation of Something's default/copy constructors.

See this contrived example:

#include <iostream>

class Something {
private:
    static int counter;

public:
    Something() : v(counter++) {
        std::cout << "default " << v << '\n';
    }

    Something(Something const & other) : v(counter++) {
        std::cout << "copy " << other.v << " to " << v << '\n';
    }

    ~Something() {
        std::cout << "dtor " << v << '\n';
    }

private:
    int v;
};

int Something::counter = 0;

C++03 will default-construct one Something with v == 0 then copy-construct ten more from that one. At the end, the vector contains ten objects whose v values are 1 through 10, inclusive.

C++11 will default-construct each element. No copies are made. At the end, the vector contains ten objects whose v values are 0 through 9, inclusive.
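The difference can also be observed without any I/O by counting constructor calls. The following is a minimal sketch, not part of the original answer: `Counted`, `Tally`, and `construct_n` are hypothetical names introduced purely for illustration.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper type: tallies how each instance came into existence.
struct Counted {
    static int defaults;  // number of default constructions
    static int copies;    // number of copy constructions

    Counted() { ++defaults; }
    Counted(Counted const&) { ++copies; }
};

int Counted::defaults = 0;
int Counted::copies = 0;

// Plain pair of counters returned to the caller.
struct Tally {
    int defaults;
    int copies;
};

// Builds a vector of n elements with the single-size constructor and
// reports which constructors ran while doing so.
Tally construct_n(std::size_t n) {
    Counted::defaults = 0;
    Counted::copies = 0;
    std::vector<Counted> v(n);
    assert(v.size() == n);
    Tally t = { Counted::defaults, Counted::copies };
    return t;
}
```

Compiled as C++11 or later, `construct_n(10)` should report 10 default constructions and 0 copies. Under the C++03 rules described above, the expectation would instead be 1 default construction and 10 copies.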

cdhowie
  • @einpoklum I added a contrived example, though. :) – cdhowie Aug 06 '20 at 21:40
  • 3
    I don't think it's contrived. Different constructors often act differently w.r.t. things like, say, memory allocation. You just replaced one side effect with another (I/O). – einpoklum Aug 06 '20 at 21:44
  • 17
    @cdhowie Not contrived at all. I was recently working on a UUID class. The default constructor generated a random UUID. I had no idea about this possibility, I just assumed the C++11 behaviour. – john Aug 07 '20 at 05:09
  • 5
    One widely used real-world example of a class where this would matter is OpenCV's `cv::Mat`. The default constructor allocates new memory, while the copy constructor creates a new view to existing memory. – jpa Aug 08 '20 at 05:42
  • I wouldn't call that a contrived example, it clearly demonstrates the difference in behavior. – David Waterworth Aug 09 '20 at 01:20
52

The standard has a list of breaking changes in Annex C [diff]. Many of these changes can lead to silent behavior change.

An example:

int f(const char*); // #1
int f(bool);        // #2

int x = f(u8"foo"); // until C++20: calls #1; since C++20: calls #2
cpplearner
  • 7
    @einpoklum Well, at least a dozen of them are said to "change meaning" of existing code or make them "execute differently". – cpplearner Aug 07 '20 at 12:20
  • 4
    How would you summarize the rationale for this particular change? – Nayuki Aug 07 '20 at 21:04
  • 4
    @Nayuki Pretty sure that selecting the `bool` version was not an intended change per se, just a side effect of other conversion rules. The real intention would be to stop some of the confusion between character encodings, the actual change being that `u8` literals used to give `const char*` but now give `const char8_t*`. – leftaroundabout Aug 08 '20 at 12:31
25

Every time they add new methods (and often functions) to the standard library this happens.

Suppose you have a standard library type:

struct example {
  void do_stuff() const;
};

Pretty simple. In some standard revision, a new method, an overload, or nearly anything else is added:

struct example {
  void do_stuff() const;
  void method(); // a new method
};

This can silently change the behavior of existing C++ programs.

This is because C++'s currently limited reflection capabilities are sufficient to detect if such a method exists, and run different code based on it.

#include <iostream>
#include <type_traits>

template<class T, class=void>
struct detect_new_method : std::false_type {};

template<class T>
struct detect_new_method< T, std::void_t< decltype( &T::method ) > > : std::true_type {};

This is just one relatively simple way to detect the new method; there are myriad others.

void task( std::false_type ) {
  std::cout << "old code";
}
void task( std::true_type ) {
  std::cout << "new code";
}

int main() {
  task( detect_new_method<example>{} );
}

The same can happen when you remove methods from classes.

While this example directly detects the existence of a method, this kind of thing happening indirectly can be less contrived. As a concrete example, you might have a serialization engine that decides whether something can be serialized as a container based on whether it is iterable, or whether it has a data member pointing to raw bytes and a size member, with one path preferred over the other.

The standard goes and adds a .data() method to a container, and suddenly the type changes which path it uses for serialization.
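The serialization scenario can be sketched roughly as follows. This is an illustration under stated assumptions rather than any real engine: `void_t_compat`, `has_data_and_size`, `serialization_path`, `ContainerV1`, and `ContainerV2` are all hypothetical names, with the two container types standing in for the same library type before and after a revision adds `.data()`.

```cpp
#include <cstddef>
#include <string>
#include <type_traits>
#include <utility>

// C++11-compatible stand-in for std::void_t.
template <class...> struct voider { using type = void; };
template <class... Ts> using void_t_compat = typename voider<Ts...>::type;

// True when T has both .data() and .size() members callable on a const object.
template <class T, class = void>
struct has_data_and_size : std::false_type {};

template <class T>
struct has_data_and_size<T, void_t_compat<
    decltype(std::declval<T const&>().data()),
    decltype(std::declval<T const&>().size())>> : std::true_type {};

// Hypothetical serializer: the raw-bytes path is preferred when available.
template <class T>
std::string path_impl(T const&, std::true_type)  { return "raw bytes"; }
template <class T>
std::string path_impl(T const&, std::false_type) { return "element by element"; }

// Reports which path the serializer would take for the given value.
template <class T>
std::string serialization_path(T const& value) {
    return path_impl(value, has_data_and_size<T>{});
}

// Two hypothetical revisions of the same container type.
struct ContainerV1 {
    std::size_t size() const { return 0; }
};
struct ContainerV2 {
    std::size_t size() const { return 0; }
    char const* data() const { return nullptr; }  // the newly added member
};
```

Here `serialization_path(ContainerV1{})` selects the element-by-element path and `serialization_path(ContainerV2{})` selects the raw-bytes path, even though the calling code is identical in both cases.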

All the C++ standard can do, if it doesn't want to freeze, is to make the kind of code that silently breaks be rare or somehow unreasonable.

Yakk - Adam Nevraumont
  • 3
    I should have qualified the question to exclude SFINAE because this isn't quite what I meant... but yes, that's true, so +1. – einpoklum Aug 07 '20 at 20:33
  • "this kind of thing happening indirectly" resulted in an upvote rather than a downvote, as it is a real trap. – Ian Ringrose Aug 08 '20 at 13:13
  • 1
    This is a really good example. Even though OP meant to exclude it, this is probably one of the _most_ likely things to cause silent behavior changes to existing code. +1 – cdhowie Aug 09 '20 at 17:22
  • 1
    @TedLyngmo If you can't fix the detector, change the thing detected. Texas sharpshooting! – Yakk - Adam Nevraumont Aug 14 '20 at 13:43
15

Oh boy... The link cpplearner provided is scary.

Among others, C++20 disallowed C-style struct declaration of C++ structs.

typedef struct
{
  void member_foo(); // Ill-formed since C++20
} m_struct;

If you were taught to write structs like that (and people who teach "C with classes" teach exactly that), you're screwed.

Peter Mortensen
Noone AtAll
  • 20
    Whoever taught that should write 100 times on the blackboard "I shall not typedef structs". You shouldn't even do it in C, imho. Anyway, that change is not silent: In the new standard, ["Valid C++ 2017 code (using typedef on anonymous, non-C structs) may be ill-formed"](http://eel.is/c++draft/diff#cpp17.dcl.dcl) and ["ill-formed - the program has syntax errors or diagnosable semantic errors. A conforming C++ compiler is required to issue a diagnostic"](https://en.cppreference.com/w/cpp/language/ub). – Peter - Reinstate Monica Aug 07 '20 at 14:30
  • 19
    @Peter-ReinstateMonica Well, I always `typedef` my structs, and I'm most certainly not going to waste my chalk on it. This is most definitely a matter of taste, and while there are highly influential people (Torvalds...) that share your point of view, other people like myself will point out, that a naming convention for types is all that's needed. Cluttering the code with `struct` keywords adds little to the understanding that a capital letter (`MyClass* object = myClass_create();`) won't convey. I respect it if you want the `struct` in your code. But I don't want it in mine. – cmaster - reinstate monica Aug 07 '20 at 14:51
  • 5
    That said, when programming C++, it is indeed a good convention to use `struct` only for plain-old-data types, and `class` anything that has member functions. But you can't use that convention in C as there's no `class` in C. – cmaster - reinstate monica Aug 07 '20 at 14:53
  • @cmaster Sure, with respect to C it was only a humble opinion; with C++ we appear to agree. As an aside, what members other than PODs would you have in C anyway? Certainly no member functions which may be one of the reasons there is no class keyword ;-) – Peter - Reinstate Monica Aug 07 '20 at 15:00
  • 1
    @Peter-ReinstateMonica Yeah, well you can't attach a method syntactically in C, but that doesn't mean a C `struct` is actually POD. The way I write C code, most structures are only touched by code in a single file and by functions which carry the name of their class. It's basically OOP without the syntactic sugar. This allows me to actually control what changes inside a `struct`, and which invariants are guaranteed between its members. So, my `structs` tend to have member functions, private implementation, invariants, and abstract from their data members. Does not sound like POD, does it? – cmaster - reinstate monica Aug 07 '20 at 15:31
  • @cmaster The *semantics* are C++, member functions, constructors, invariants etc. The implementation is C with PODs. This means you can always memcpy them without ill effects *language-side* (which you cannot do with C++ non-POD classes -- I think). That an invariant is violated etc. is pure semantics. – Peter - Reinstate Monica Aug 07 '20 at 15:36
  • 6
    As long as they're not forbidden in `extern "C"` blocks, I don't see any issue with this change. Nobody should be typedefing structs in C++. This is no bigger hurdle than the fact that C++ has different semantics than Java. When you learn a new programming language, you might need to learn some new habits. – Cody Gray - on strike Aug 07 '20 at 21:33
  • 1
    Interesting discussion, but this doesn't really answer the question. – einpoklum Aug 08 '20 at 10:14
  • @CodyGray True, each language has its own habits and best practices. Nevertheless, one of the best practices of writing libraries is to offer a C API. Because everybody can call into C libraries (Python, Fortran, whatever), but only C++ can call into a C++ API (as far as I'm aware). And such C APIs absolutely need forward declarations of the opaque structs. Of course, you can do that without a `typedef`, but the canonical way is to do `typedef struct Foo Foo;` – cmaster - reinstate monica Aug 09 '20 at 20:13
  • @cmaster When you're writing a C API, you're writing in C, not C++. And you'd be wrapping it in an `extern "C"` block (conditioned on the definition of `__cplusplus`), so, as I said in my previous comment, this change to the C++ language standard should not be a problem (I feel it is a positive improvement), so long as they're not forbidden in `extern "C"` blocks. – Cody Gray - on strike Aug 11 '20 at 22:07
  • @CodyGray So our disagreement is just about the definition what language the code is written in. I have used the definition (which is backed by the standard, btw.) that the stuff within the `extern "C"` block is both C and C++. You are using a definition that does not allow any line of code to conform to more than one language, and decide to call the contents of `extern "C"` C code, only. I prefer to call it both C and C++ because this code can be parsed as C, and it can be parsed as C++, with the added feature that the semantics in both are defined to be compatible. – cmaster - reinstate monica Aug 12 '20 at 10:30
  • @CodyGray Or, more concisely: If a conforming C compiler compiles it without error, then it's C. If a conforming C++ compiler compiles it without error, then it's C++. If both compilers compile it without errors, then it's both. The latter is what the `extern "C"` construct is designed to do. – cmaster - reinstate monica Aug 12 '20 at 10:32
  • `extern "C"` *changes the semantics*. Code in such a block is valid C++, but only because it's valid C, and C++ defines the semantics of `extern "C"` blocks to allow valid C code... Anything else seems like excessive pedantry. – Cody Gray - on strike Aug 15 '20 at 11:45
  • This is not an answer to my question. Something becoming ill-formed is not a silent change. – einpoklum Aug 18 '20 at 14:37
15

Here's an example that prints 3 in C++03 but 0 in C++11:

#include <iostream>

template<int I> struct X   { static int const c = 2; };
template<> struct X<0>     { typedef int c; };
template<class T> struct Y { static int const c = 3; };
static int const c = 4;
int main() { std::cout << (Y<X< 1>>::c >::c>::c) << '\n'; }

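This change in behavior was caused by special handling for >>. Prior to C++11, >> was always the right shift operator, so the expression parsed as X<(1 >> ::c)>, i.e. X<0>. With C++11, >> inside a template argument list closes two nested template argument lists instead, so the expression starts with Y<X<1>>::c and the remaining > tokens become comparisons.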

Waxrat
  • Well, technically this is true, but this code was "informally ambiguous" to begin with due to the use of `>>` that way. – einpoklum Aug 18 '20 at 14:32
11

Trigraphs dropped

Source files are encoded in a physical character set that is mapped in an implementation-defined way to the source character set, which is defined in the standard. To accommodate mappings from some physical character sets that didn't natively have all of the punctuation needed by the source character set, the language defined trigraphs—sequences of three common characters that could be used in place of a less common punctuation character. The preprocessor and compiler were required to handle these.

In C++17, trigraphs were removed. So some source files will not be accepted by newer compilers unless they are first translated from the physical character set to some other physical character set that maps one-to-one to the source character set. (In practice, most compilers just made interpretation of trigraphs optional.) This isn't a subtle behavior change, but a breaking change that prevents previously acceptable source files from being compiled without an external translation process.

More constraints on char

The standard also refers to the execution character set, which is implementation defined, but must contain at least the entire source character set plus a small number of control codes.

The C++ standard defined char as a possibly-unsigned integral type that can efficiently represent every value in the execution character set. With some language lawyering, you can argue that a char has to be at least 8 bits.

If your implementation uses an unsigned value for char, then you know it can range from 0 to 255, and is thus suitable for storing every possible byte value.

But if your implementation uses a signed value, it has options.

Most would use two's complement, giving char a minimum range of -128 to 127. That's 256 unique values.

But another option was sign+magnitude, where one bit is reserved to indicate whether the number is negative and the other seven bits indicate the magnitude. That would give char a range of -127 to 127, which is only 255 unique values. (Because you lose one useful bit combination to represent -0.)

I'm not sure the committee ever explicitly designated this as a defect, but it effectively was one, because you couldn't rely on the standard to guarantee that a round-trip from unsigned char to char and back would preserve the original value. (In practice, all implementations did because they all used two's complement for signed integral types.)

Only recently (C++17?) was the wording fixed to ensure round-tripping. That fix, along with all the other requirements on char, effectively mandates two's complement for signed char without saying so explicitly (even as the standard continues to allow sign+magnitude representations for other signed integral types). There's a proposal out to require all signed integral types use two's complement, but I don't recall whether it made it into C++20.
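The round-trip property in question can be checked on a given implementation with a few lines of code. This is a minimal sketch, and `char_round_trips` is a hypothetical name introduced for illustration; it only reports what the implementation at hand does, not what any particular standard revision guarantees.

```cpp
#include <climits>

// Returns true when every unsigned char value survives a trip
// through char and back on this implementation.
bool char_round_trips() {
    for (int i = 0; i <= UCHAR_MAX; ++i) {
        unsigned char original = static_cast<unsigned char>(i);
        char narrow = static_cast<char>(original);
        unsigned char back = static_cast<unsigned char>(narrow);
        if (back != original) {
            return false;  // some byte value was not preserved
        }
    }
    return true;
}
```

On the two's complement implementations mentioned above, `char_round_trips()` is expected to return true; a sign+magnitude char would be the case where it could fail.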

So this one is sort of the opposite of what you're looking for, because it gives previously incorrect, overly presumptuous code a retroactive fix.

Cody Gray - on strike
Adrian McCarthy
  • The trigraphs part is not an answer to this question - that's not a silent change. And, IIANM, the second part is a change of implementation-defined to strictly-mandated behavior, which is also not what I asked about. – einpoklum Aug 18 '20 at 14:34
10

I'm not sure if you'd consider this a breaking change to correct code, but ...

Before C++17, compilers were allowed, but not required, to elide copies in certain circumstances, even when the copy constructor has observable side effects. Since C++17 we have guaranteed copy elision in some of those circumstances. The behavior essentially went from implementation-defined to required.

This means that your copy constructor side effects may have occurred with older versions, but will never occur with newer ones. You could argue the correct code shouldn't rely on implementation-defined results, but I don't think that's quite the same as saying such code is incorrect.
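A small counter makes the side effect visible. The following is a sketch under stated assumptions, not code from the original answer: `Tracked`, `make_tracked`, and `count_copies_on_init` are hypothetical names, and the counter stands in for whatever observable side effect a real copy or move constructor might have.

```cpp
// Hypothetical type whose copy and move constructors have an observable side effect.
struct Tracked {
    static int copies_or_moves;

    Tracked() {}
    Tracked(Tracked const&) { ++copies_or_moves; }
    Tracked(Tracked&&) { ++copies_or_moves; }
};

int Tracked::copies_or_moves = 0;

// Returns a temporary by value, the classic candidate for elision.
Tracked make_tracked() {
    return Tracked();
}

// Initializes an object from the function's return value and reports
// how many copy or move constructor calls were observed.
int count_copies_on_init() {
    Tracked::copies_or_moves = 0;
    Tracked t = make_tracked();
    (void)t;  // silence unused-variable warnings
    return Tracked::copies_or_moves;
}
```

With the guaranteed elision described above, `count_copies_on_init()` returns 0. Under earlier rules the result was up to the compiler and its options, so a nonzero count was also permitted.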

Adrian McCarthy
  • 1
    I thought this "requirement" was added in C++17, not C++11? (See [temporary materialization](https://en.cppreference.com/w/cpp/language/implicit_conversion#Temporary_materialization).) – cdhowie Aug 09 '20 at 17:23
  • @cdhowie: I think you're right. I didn't have the standards onhand when I wrote this and I probably put too much trust in some of my search results. – Adrian McCarthy Aug 11 '20 at 04:15
  • A change to implementation-defined behavior doesn't count as an answer to this question. – einpoklum Aug 18 '20 at 14:36
7

The behavior when reading (numeric) data from a stream, and the read fails, changed in C++11.

For example, reading an integer from a stream that does not contain an integer:

#include <iostream>
#include <sstream>
#include <string>

int main(int, char **)
{
    int a = 12345;
    std::string s = "abcd";         // not an integer, so extraction will fail
    std::stringstream ss(s);
    ss >> a;
    // Since C++11: a == 0; before C++11: a is still 12345.
    std::cout << "fail = " << ss.fail() << " a = " << a << std::endl;
}

Since C++11, a failed read sets the integer to 0; before C++11, the integer was left unchanged. That said, GCC, even when forced back to C++98 (with -std=c++98), always shows the new behavior, at least since version 4.4.7.

(Imho the old behavior was actually better: why change the value to 0, which is by itself valid, when nothing could be read?)

Reference: see https://en.cppreference.com/w/cpp/locale/num_get/get

DanRechtsaf
  • But there is no change mentioned about the return type. Only 2 new overloads are available since C++11 – Build Succeeded Aug 20 '20 at 13:58
  • Was this defined behavior both in C++98 and in C++11? Or did the behavior become defined? – einpoklum Aug 20 '20 at 14:02
  • When cppreference.com is right: "if an error occurs, v is left unchanged. (until C++11)" So behavior was defined before C++11, and changed. – DanRechtsaf Aug 21 '20 at 20:38
  • To my understanding, the behaviour for `ss >> a` was indeed defined, but in the very common case where you are reading into an uninitialized variable, the pre-C++11 behaviour leaves that variable uninitialized on failure, and using it afterwards is undefined behaviour. Thus setting the value to 0 on failure guards against a very common undefined behaviour. – Rasmus Damgaard Nielsen Aug 23 '20 at 11:53