Do we sometimes have to write code that has undefined behavior according to the C++ Standard?

Question

With regard to the C++ Standard:

Does the GNU Compiler Collection's std::function use a union to cast between different function pointer types (e.g. to convert a non-static member function pointer to a non-member function pointer)? ~~I think so.~~ EDIT: It uses a union, but no cast is made (type erasure).
Is it undefined behavior to cast between different function pointer types (in the C++ or C++11 Standard)? I think so.
Is it possible to implement std::function without using any code that has undefined behavior? ~~I don't think so.~~ I'm talking about [this](http://stackoverflow.com/questions/3534812/how-does-the-template-parameter-of-stdfunction-work-implementation).

The following is my question:

Do we sometimes have to write code that has undefined behavior according to the C++ Standard (but that has defined behavior on particular C++ compilers such as GCC or MSVC)?

Does that mean we can't (or shouldn't) avoid undefined behavior in our C++ code?

I **highly** doubt `std::function` cannot be implemented without UB. And I would personally recommend never writing code that is undefined by the Standard, even if a particular platform guarantees a specific behaviour. That said, it is occasionally useful in that case, though I've yet to be convinced it is *necessary*. — BoBTFish, Jul 09 '14 at 08:23
@BoBTFish It depends on the context, and what undefined behavior we're dealing with. There's a lot of necessary functionality which isn't defined in the C++ standard (but is, for example, in Posix, or in the Windows specification). — James Kanze, Jul 09 '14 at 08:26
Besides everything else, the standard library code is free to exploit whatever platform-specific behavior it wants, all it has to guarantee is that externally it will behave as specified by the standard. So, even if inside the library code there's stuff that is not portable, you are not invoking UB by using it. — Matteo Italia, Jul 09 '14 at 08:30
Have a look at the implementation of libcxx: http://llvm.org/svn/llvm-project/libcxx/trunk/include/functional — TNA, Jul 09 '14 at 08:31
I don't believe you can write drivers that access hardware without using something that the standard calls undefined, since it, at some point or another, most likely involves making pointers to memory that hasn't been allocated through the normal allocation routines mentioned by the standard - and such memory accesses are undefined... ;) — Mats Petersson, Jul 09 '14 at 08:32
Also, a cursory read of the linked code doesn't show any usage of `union` to perform type-punning, the unions are used as intended - you read back only the member you last wrote (hence the `m_flag` field). — Matteo Italia, Jul 09 '14 at 08:32
@MatteoItalia: There is a `struct _LIBCPP_TYPE_VIS_ONLY hash`, which uses a union for type-punning (xoring integer values when a `long double` was stored). But not for the actual function types. — Mats Petersson, Jul 09 '14 at 08:35
@MatsPetersson: I'm talking about the code he linked in the question ([this thing](http://stackoverflow.com/questions/3534812/how-does-the-template-parameter-of-stdfunction-work-implementation)), where I don't see any type punning... — Matteo Italia, Jul 09 '14 at 08:37
"Undefined Behavior" is deemed _undefined_ by the C++ standard. It does **not** mean that gcc is forbidden from _defining_ the behavior. gcc's `std::function` is therefore _not portable_. — Drew Dormann, Jul 09 '14 at 08:38
I think undefined behavior's "necessity" comes more often from its performance benefits rather than because of actual necessity (but this does not apply in all cases). — user541686, Jul 09 '14 at 08:39
No, we don't have to write "codes" with undefined behavior. Codes are things that cryptologists make and break. We write **code**. If your boss tells you to write code with undefined behaviour, or else, you go ahead and write it. Or you say no and pack your belongings and get escorted out of the campus. Does it make you "have to" write such code? I guess that depends on what you mean by "have to". Standard C++ is Turing-complete so everything is doable without UB. Sometimes one resorts to UB for squeezing that last drop of performance out of the code, but other than that... — n. m. could be an AI, Jul 09 '14 at 08:39
@n.m.: careful with Turing completeness... you can do everything a Turing machine can do with a Turing complete language, but a Turing machine cannot display windows on-screen, talk to a webserver, replace an interrupt table, ... So if all you need is to calculate computable numbers, Turing machines are ok, but most interesting things you can do with a computer need something more - not in terms of computability, but of interaction with the environment, and some interactions may *require* what the standard would call UB (see e.g. the POSIX sockets API). — Matteo Italia, Jul 09 '14 at 08:48
"Does std::function of GNU Compiler Collection use union data type to cast between different function pointer types (e.g. to convert non-static member function pointer to non-member function pointer)? I think so." Absolutely not. It uses type erasure. `std::function` is *not* a wrapper of a function pointer, it can store function objects with a state. — sbabbi, Jul 09 '14 at 08:50
@MatteoItalia a Turing machine can do all of these things. They are all strings of zeroes and ones. You just need to connect it to a suitable display device or a network interface. — n. m. could be an AI, Jul 09 '14 at 09:34
@MatteoItalia I'm not sure why POSIX sockets require you to use UB. Do you mean casting between different sockaddr pointers? It's not UB. But in general, yes, third-party interfaces to the outside world may be inherently bad and require dirty tricks. The C++ standard library OTOH should not require any, or there's a bug in the standard. — n. m. could be an AI, Jul 09 '14 at 09:47
@n.m.: ... which may not be the display device or network interface API you have to talk with :-) As for the `sockaddr` pointers, isn't it against the strict aliasing rules? — Matteo Italia, Jul 09 '14 at 09:49
@MatteoItalia If your desktop cannot directly talk to a terabit/sec fiber optic switch, you don't blame C++, right? TM is a language. The standard allows you to cast a pointer to a different type and back to original, preserving the value. Strict aliasing rules prohibit something else. — n. m. could be an AI, Jul 09 '14 at 10:02
@n.m.: of course I'm not blaming the language, I simply disagree that a language being TC means that it "can do everything"; there can be (and often there *are*) technological obstacles in interaction with the environment. For the `sockaddr` issue, I vaguely remember that you had to perform strange casts and access them using a pointer of an unrelated type, but I may be wrong; anyhow, that was just an example, as you said there's plenty of APIs which require dirty tricks (e.g. in Win32 you are converting between pointers and integers all the time). — Matteo Italia, Jul 09 '14 at 10:16

score 42 · Accepted Answer · edited Jul 09 '14 at 08:50

Nobody forces you to write anything, so nobody forces you to write code that invokes UB.

As for the standard library, its code is free to contain any nonportable behavior it wants - heck, it may even be written in another language with the compiler creating the bindings via magical unicorns; all that matters is that it behaves according to specification.

Come to think of it, it's obvious that at some level the standard library will have to go outside the standard: making syscalls, talking with hardware, ... is not even contemplated by the standard, and is often deeply platform-specific. For example, on x86-64 Linux you can perform syscalls with inline assembly (via the syscall instruction); the standard does not forbid this, it just doesn't mandate that every platform must behave like this.
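A minimal sketch of what that looks like, assuming GCC or Clang targeting x86-64 Linux, where system call number 1 is `write` and the kernel takes its arguments in `rdi`, `rsi` and `rdx`. `raw_write` is a hypothetical name; the C++ standard says nothing about what the `asm` statement means, so this relies entirely on the compiler and the platform ABI. On other targets the sketch falls back to the POSIX `write` function, which is defined by POSIX rather than by C++.

```cpp
#include <cassert>
#include <cstddef>

#if defined(__x86_64__) && defined(__linux__)
// Linux x86-64 ABI: syscall number in rax (1 = write), arguments in
// rdi, rsi, rdx; the result comes back in rax, and rcx/r11 are clobbered.
inline long raw_write(int fd, const void* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    long ret;
    asm volatile("syscall"
                 : "=a"(ret)
                 : "a"(1L), "D"(static_cast<long>(fd)), "S"(buf), "d"(len)
                 : "rcx", "r11", "memory");
    return ret;  // bytes written, or -errno on failure
}
#else
#include <unistd.h>
// Other POSIX systems: let the C library make the system call.
inline long raw_write(int fd, const void* buf, std::size_t len)
{
    assert(buf != nullptr || len == 0);
    return static_cast<long>(::write(fd, buf, len));
}
#endif
```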

As for the specific example, I don't see any UB - the unions there are used as specified by the standard - i.e. reading only from the last member you wrote into (hence the field m_flag).
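The safe pattern described here, a union paired with a flag that records which member was last written, can be sketched as follows. `IntOrDouble` is a hypothetical type, not the linked code; it only illustrates the rule of reading back the member you last wrote.

```cpp
#include <cassert>

// Hypothetical type: holds either an int or a double and remembers
// which union member was written last.
class IntOrDouble
{
public:
    explicit IntOrDouble(int i) : is_int_(true) { storage_.i = i; }
    explicit IntOrDouble(double d) : is_int_(false) { storage_.d = d; }

    bool holds_int() const { return is_int_; }

    // Only ever read the member that was last written.
    int as_int() const { assert(is_int_); return storage_.i; }
    double as_double() const { assert(!is_int_); return storage_.d; }

    // Writing a member switches the active member and updates the flag.
    void set(int i) { storage_.i = i; is_int_ = true; }
    void set(double d) { storage_.d = d; is_int_ = false; }

private:
    union { int i; double d; } storage_;
    bool is_int_;  // plays the role of m_flag in the linked code
};
```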

Thanks, I forgot how my example worked (m_flag in the first example and function overloading in the second). — Sadeq, Jul 09 '14 at 09:57

n. m. could be an AI · Answer 2 · 2014-07-09T11:50:40.687

1. Why is this ever interesting? It could be implemented in terms of `__gnu_cplusplus_builtin_std_function__` for all we know.
2. No, the standard explicitly permits that.
3. Definitely yes, with any number of fully standard-conforming techniques.

The rest of the question is ill-posed, addressed in a comment.

Here's a rudimentary mock-up of std::function on the back of an envelope, with no casts or unions or, AFAICT, anything remotely dangerous. Of course, it lacks many features of the real std::function, but adding them is just a matter of some technical work.

```cpp
#include <memory>
#include <iostream>
#include <type_traits>

// Common interface: every stored callable is invoked through this virtual call.
template <typename R, typename ... Args>
struct CallBase
{
  virtual R operator()(Args... args) = 0;
  virtual ~CallBase() {}
};

// Holds a plain function pointer (free function or static member).
template <typename R, typename ... Args>
struct FunCall : CallBase<R, Args...>
{
  virtual R operator()(Args... args) { return f(args...); }
  R(*f)(Args...);
  FunCall(R f(Args...)) : f(f) {}
};

// Holds an arbitrary function object (functor, lambda) by value.
template <typename Obj, typename R, typename ... Args>
struct ObjCall : CallBase<R, Args...>
{
  virtual R operator()(Args... args) { return o(args...); }
  Obj o;
  ObjCall(Obj o) : o(o) {}
};

// Holds a non-static member function pointer; the object (Cl, e.g. Foo&)
// is supplied as the first call argument.
template <typename R, typename ... Args> struct MemFunCall;
template <typename R, typename Cl, typename ... Args>
struct MemFunCall<R, Cl, Args...> : CallBase<R, Cl, Args...>
{
  typedef typename std::remove_reference<Cl>::type Rcl;
  virtual R operator()(Cl c, Args... args) { return (c.*f)(args...); }
  R (Rcl::*f)(Args...);
  MemFunCall(R (Rcl::*f)(Args...)) : f(f) {}
};

// The wrapper itself: owns one CallBase and forwards calls to it.
template <typename Fn> class Function;
template <typename R> struct Function<R()>
{
  std::unique_ptr<CallBase<R>> fn;
  R operator()() { return (*fn)(); }
  Function(R (*f)()) : fn(new FunCall<R>(f)) {}
  template<typename Obj>
  Function(Obj o) : fn(new ObjCall<Obj, R>(o)) {}
};

// With at least one parameter, a member function pointer is also accepted;
// the first parameter (Arg1) is then the object it is called on.
template <typename R, typename Arg1, typename ... Args>
struct Function<R(Arg1, Args...)>
{
  std::unique_ptr<CallBase<R, Arg1, Args...>> fn;
  R operator()(Arg1 arg1, Args... args) { return (*fn)(arg1, args...); }
  Function(R (*f)(Arg1 arg1, Args...)) :
    fn(new FunCall<R, Arg1, Args...>(f)) {}
  template<typename T>
  Function(R (T::*f)(Args...)) :
    fn(new MemFunCall<R, Arg1, Args...>(f)) {}
  template<typename Obj>
  Function(Obj o) : fn(new ObjCall<Obj, R, Arg1, Args...>(o)) {}
};

struct Foo
{
  static void bar (int a) { std::cout << "bar " << a << std::endl; }
  int baz (const char* b) { std::cout << "baz " << b << std::endl; return 0; }
  void operator()(double x) { std::cout << "operator() " << x << std::endl; }
};

int main ()
{
  Function<void(int)> f1(&Foo::bar);
  f1(3);                            // prints "bar 3"
  Foo foo;
  Function<int(Foo&, const char*)> f2(&Foo::baz);
  f2(foo, "whatever");              // prints "baz whatever"
  Function<void(double)> f3(foo);
  f3(2.75);                         // prints "operator() 2.75"
}
```

But you are using inheritance which has lower performance in comparison to union (no casts!) and function overloading. — Sadeq, Jul 15 '14 at 18:16
There is not a single word about performance in the question. If you have specific performance goals, state them explicitly. — n. m. could be an AI, Jul 15 '14 at 19:54

score 5 · Answer 3 · edited Apr 13 '17 at 12:40

Does std::function of GNU Compiler Collection use union data type to cast between different function pointer types (e.g. to convert non-static member function pointer to non-member function pointer)? I think so.

No, it uses type-erasure. It is a kind of variant for function objects.

Is it an undefined behavior to cast between different function pointer types (in C++ or C++11 Standard)? I think so.

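It does not have to be: you can cast a function pointer to another function pointer type, and a member function pointer to another member function pointer type. Converting back to the original type yields the original pointer; only calling through a pointer of the wrong type is undefined.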
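A small sketch of what the standard allows here (the `reinterpret_cast` rules in [expr.reinterpret.cast]): converting between function pointer types, or between member function pointer types, and back again gives you the original pointer, which you may then call. `Widget`, `add_one`, `GenericFn`, `GenericMemFn` and the helper functions are hypothetical names used only for illustration.

```cpp
#include <cassert>

struct Widget
{
    int value;
    int get() const { return value; }
    int twice() const { return 2 * value; }
};

int add_one(int x) { return x + 1; }

// Unrelated pointer types used purely as storage; never call through them.
using GenericFn = void (*)();
using GenericMemFn = void (Widget::*)();

// Converting to another function pointer type is allowed...
inline GenericFn erase(int (*f)(int)) { return reinterpret_cast<GenericFn>(f); }

// ...and converting back to the original type restores the original pointer.
inline int call_erased(GenericFn g, int x)
{
    assert(g != nullptr);
    auto f = reinterpret_cast<int (*)(int)>(g);
    return f(x);  // calling g directly as void() would be undefined
}

// The same round trip for member function pointers of one class.
inline GenericMemFn erase_mem(int (Widget::*m)() const)
{
    return reinterpret_cast<GenericMemFn>(m);
}

inline int call_erased_mem(const Widget& w, GenericMemFn g)
{
    assert(g != nullptr);
    auto m = reinterpret_cast<int (Widget::*)() const>(g);
    return (w.*m)();
}
```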

Is it possible to implement a std::function without using any code which has an undefined behavior? I don't think so. I'm talking about this.

Yes, you can. There are plenty of examples out there, it is really easy, and doing so will teach you a lot about C++:

http://probablydance.com/2013/01/13/a-faster-implementation-of-stdfunction/
https://codereview.stackexchange.com/questions/14730/impossibly-fast-delegate-in-c11
http://avdgrinten.wordpress.com/2013/08/07/c-stdfunction-with-the-speed-of-a-macro/
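In the spirit of the "impossibly fast delegate" discussion, one way to erase a callable's type without virtual functions or function pointer casts is to store the object as `void*` next to a pointer to a per-type "trampoline" function. This is a sketch assuming a non-owning delegate whose target outlives it; `Delegate`, `from` and `invoke` are hypothetical names chosen for this example. Every conversion it performs (object pointer to `void*` and back to the same type) is well-defined.

```cpp
#include <cassert>
#include <utility>

template <typename Sig> class Delegate;

// Non-owning, type-erased reference to a callable object.
template <typename R, typename... Args>
class Delegate<R(Args...)>
{
public:
    // Bind to any (non-const) callable object; it must outlive the delegate.
    template <typename F>
    static Delegate from(F& f)
    {
        Delegate d;
        d.obj_ = &f;           // object pointer -> void*: always allowed
        d.stub_ = &invoke<F>;  // remembers the real type F
        return d;
    }

    R operator()(Args... args) const
    {
        assert(stub_ != nullptr);
        return stub_(obj_, std::forward<Args>(args)...);
    }

private:
    // Converts void* back to the original F*, which is well-defined.
    template <typename F>
    static R invoke(void* p, Args... args)
    {
        return (*static_cast<F*>(p))(std::forward<Args>(args)...);
    }

    void* obj_ = nullptr;
    R (*stub_)(void*, Args...) = nullptr;
};
```

Because the delegate only refers to the callable, changes to the lambda's captured state are visible through it.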

But, in time, you'll see that the ability to think and model at a high-level is more important than knowing the details of a specific language.

score 4 · Answer 4 · answered Jul 09 '14 at 08:36

In some rare cases it can be acceptable to deliberately exploit undefined behavior, namely when all of the following conditions are met:

1. You are certain that the code will never be compiled on other platforms or with other compilers (this happens, but it is very rare, as you never really know what will happen to your code in the future);
2. you are using a very specific compiler, at a very specific version, with very specific compilation flags, targeting one very specific platform (yeah, sure...);
3. under these conditions, your compiler's documentation specifies the behaviour of the particular undefined-behaviour code you have at hand (for example, it might call terminate() even though the standard says the behaviour is undefined, or even do something useful, since from the standard's point of view undefined behaviour allows anything).

Basically, if your compiler documents the behavior of something the standard leaves undefined, you can rely on it as long as you are not writing portable code.
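Signed integer overflow is a concrete case: it is undefined in standard C++, but GCC and Clang document `-fwrapv`, under which it wraps. A sketch contrasting code that relies on that flag with two portable alternatives; the function names are hypothetical.

```cpp
#include <cassert>
#include <climits>

// Relies on documented compiler behaviour: overflow here is undefined in
// standard C++, but GCC and Clang define it as wrapping under -fwrapv.
inline int add_assuming_fwrapv(int a, int b) { return a + b; }

// Portable alternative: unsigned arithmetic always wraps modulo 2^N.
// Converting the result back to int is implementation-defined before C++20
// (GCC documents reduction modulo 2^N) and well-defined since C++20.
inline int wrapping_add(int a, int b)
{
    return static_cast<int>(static_cast<unsigned>(a) + static_cast<unsigned>(b));
}

// Or refuse to overflow at all: check before adding.
inline bool checked_add(int a, int b, int& out)
{
    if ((b > 0 && a > INT_MAX - b) || (b < 0 && a < INT_MIN - b))
        return false;  // the true sum does not fit in int
    out = a + b;
    return true;
}
```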

Of course, compilers, compilation flags, code and target platforms change over time, so it's rarely a good idea. The important thing to understand is that the C++ standard only defines what cross-platform C++ code is allowed to look like. It doesn't specify more (it doesn't specify implementations, for example), and it simply states where it cannot specify portable behavior.

So if you write portable code, or code that follows the standard, you never have to exploit undefined behaviour.

Also, most undefined behaviour consists of plain usage errors that are costly to check at runtime (like array bounds: you can check them if you want, but the standard will not force compilers to do it). Therefore, in some cases it can be useful to add checks to avoid undefined behaviour, but it's better not to allow such code to be written in the first place. That is one of the reasons strong type checking is helpful with big codebases, and also why static analysis is getting a lot of attention these days: both can reject, at compile time, code that can be shown to be problematic.
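To make the array-bounds example concrete with `std::vector`: `operator[]` performs no check, so an out-of-range index is undefined behaviour, while `at()` checks and throws `std::out_of_range`. An `assert` is a common middle ground that checks in debug builds and compiles away under `NDEBUG`. The helper names below are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <stdexcept>
#include <vector>

// Unchecked in release builds: if NDEBUG is defined and i >= v.size(),
// v[i] is undefined behaviour.
inline int get_unchecked(const std::vector<int>& v, std::size_t i)
{
    assert(i < v.size());
    return v[i];
}

// Always checked: at() throws std::out_of_range, which is well-defined.
inline bool try_get(const std::vector<int>& v, std::size_t i, int& out)
{
    try {
        out = v.at(i);
        return true;
    } catch (const std::out_of_range&) {
        return false;
    }
}
```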

A more interesting historical example for point #3 might be code that performs wrapping computations on signed integers. Many older compilers did in fact explicitly specify that code could legitimately rely upon wrapping behavior, and the compiler would not make any optimizations based upon a presumption that numbers couldn't wrap. — supercat, Jul 09 '14 at 17:14

score 2 · Answer 5 · answered Jul 09 '14 at 10:27

The C++ standard library is part of the implementation. So the C++ standard library may contain source code that would have undefined behaviour if you wrote it in your own user code, but the implementation guarantees that it works as defined by the C++ standard. If it doesn't work as defined, that's not undefined behaviour, that is a bug in the implementation.

Code doesn't just have "undefined behaviour", it has "behaviour that is not defined by the C++ standard". For example, there are tons of Posix functions that are defined by the Posix standard. If your implementation says "this implementation follows the C++ Standard and the Posix Standard", then using Posix functions that don't have behaviour defined by the C++ Standard is fine, because they have defined behaviour on your implementation (possibly not on another that is not Posix compatible).

And you may have heard that undefined behaviour could format your hard drive (and other nasty things). But the other way round, since "formatting hard drives" is not mentioned anywhere in the C++ Standard, if formatting your hard drive is what your program is actually supposed to do, then you will have to do something that is undefined behaviour according to the C++ standard.

Obviously you'd need some pretty good reason to invoke undefined behaviour in your code. Because of the obvious dangers (different behaviours possible, nasty behaviour possible with optimising compilers, huge portability problems), any undefined behaviour without very strong justification is a very bad sign.

score 1 · Answer 6 · edited May 23 '17 at 12:25


The very fastest at-face-value implementations of the fast inverse square root involve undefined behaviour: they read a float's bits through a pointer to an integer type, which violates the strict aliasing rule. A more compliant implementation may require an additional copy (for example with memcpy). Given that the raison d'être of the fast inverse square root lies in being fast, this may just qualify as "need", depending on the situation. However, modern optimizers are capable of such sorcery that I wouldn't be surprised if the compliant version were quietly transformed into the optimal form.
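A sketch of the more compliant version, assuming `float` is a 32-bit IEEE 754 type (the `static_assert` only checks the size). The pointer cast of the classic version is replaced by `std::memcpy`, which copies the object representation and is well-defined; GCC and Clang typically reduce these copies to register moves when optimizing. `fast_inv_sqrt` is a hypothetical name; in C++20, `std::bit_cast` expresses the same copy more directly.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>
#include <cstring>

inline float fast_inv_sqrt(float x)
{
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects 32-bit float");
    assert(x > 0.0f);
    std::uint32_t i;
    std::memcpy(&i, &x, sizeof i);      // read the float's bits (no aliasing UB)
    i = 0x5f3759df - (i >> 1);          // classic magic-constant initial guess
    float y;
    std::memcpy(&y, &i, sizeof y);      // reinterpret the bits as float again
    y = y * (1.5f - 0.5f * x * y * y);  // one Newton-Raphson refinement step
    return y;
}
```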

answered Jul 09 '14 at 16:32 by DeveloperInDevelopment

@Down-voter: I can't improve/correct/delete the answer if you don't say what is wrong with it... – DeveloperInDevelopment Jul 09 '14 at 20:28

score 1 · Answer 7 · answered Jul 09 '14 at 18:31

Every program using the POSIX API for accessing dynamically linked functions must rely on behaviour undefined by the C++ standard: specifically converting the void* returned by dlsym to a function pointer. Of course, "works as expected" is one way to implement undefined behaviour, and the POSIX standard mandates the cast is well-defined on all compliant platforms.
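A sketch of that pattern, assuming a glibc-based Linux system (older glibc versions also need `-ldl` at link time). It looks up `strlen` among the symbols already loaded into the process via `dlopen(nullptr, ...)`; `StrlenFn` and `lookup_strlen` are hypothetical names. Strictly speaking, since C++11 this conversion is "conditionally-supported" rather than undefined, meaning an implementation may decline to support it; POSIX is what guarantees that it works.

```cpp
#include <cassert>
#include <cstddef>
#include <dlfcn.h>

// Function pointer type matching the symbol being looked up.
using StrlenFn = std::size_t (*)(const char*);

// Returns nullptr if the program handle or the symbol cannot be found.
inline StrlenFn lookup_strlen()
{
    void* self = dlopen(nullptr, RTLD_LAZY);  // handle for the running program
    if (self == nullptr)
        return nullptr;
    void* sym = dlsym(self, "strlen");
    // void* -> function pointer: conditionally-supported in C++,
    // required to work by POSIX.
    return reinterpret_cast<StrlenFn>(sym);
}
```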
