Given the code below, the compiler emits warning C4172 (returning address of local variable or temporary) for the functions f1() and f2(), but not for f3(). I understand that the compiler may not be able to identify this problem in certain situations, as seems to be the case with f3(). But how can I be sure of a correct diagnosis in this case without a warning message?

const char* const& f1() { return "hello1"; }
const char* const& f2() { return static_cast<const char*>("hello2"); }
const char* const& f3() { const char* const& r = "hello3"; return r; }
Belloc
  • which compiler (and version) are you using? – A. K. Jul 01 '13 at 17:10
  • I've tested this in VS2010 and GCC – Belloc Jul 01 '13 at 17:11
  • [String literals have static storage duration](http://stackoverflow.com/questions/9970295/life-time-of-string-literal-in-c/9970305#9970305) (*they are alive till the end of the program*), so none of the functions have UB. Implementations are not required to provide a diagnostic for these cases; it is fortunate that they do, but no, you cannot blindly trust these diagnostics. – Alok Save Jul 01 '13 at 17:13
  • @Alok Save I think you're wrong on this. f1() and f2() both exhibit undefined behavior. – Belloc Jul 01 '13 at 17:14
  • clang-3.3 reports warning in all three cases – A. K. Jul 01 '13 at 17:14
  • @AdityaKumar I suppose that for a more complicated code inside the function the compiler might not be able to emit the message. – Belloc Jul 01 '13 at 17:16
  • @user1042389: Have you tried? I think the compiler should be able to figure out. – A. K. Jul 01 '13 at 17:17
  • Both `f1` and `f2` are equivalent, as a literal string already is of type `const char *`. – Some programmer dude Jul 01 '13 at 17:18
  • See $2.14.5/8 - Ordinary string literals and UTF-8 string literals are also referred to as narrow string literals. A narrow string literal has type “array of n const char”, where n is the size of the string as defined below, and has **static storage duration** – Captain Obvlious Jul 01 '13 at 17:18
  • @AlokSave `(they are alive till the end of the program)none of the functions have an UB` But the compiler has to make a conversion from a char[] to a const char* (a temporary) before returning the temporary. – Belloc Jul 01 '13 at 17:18
  • @user1042389 The function returns a pointer _value_ that _points_ to a string with _static storage duration_ – Captain Obvlious Jul 01 '13 at 17:20
  • @CaptainObvlious Exactly and this pointer is a temporary. That's where the problem is for the function f1() and f2() for sure. – Belloc Jul 01 '13 at 17:21
  • @CaptainObvlious I just realized that those functions don't return a `const char*` but a `const char* const&`, i.e. the pointer to static char array is not returned by value but by _reference_ to const! Hence the correct warning. `f3` is the same, but the use of an explicit named variable seems to trick the compiler – gx_ Jul 01 '13 at 17:39
  • @AdityaKumar `Have you tried? I think the compiler should be able to figure out` As I said before I tested the snippet in VS2010 and GCC. – Belloc Jul 01 '13 at 17:39
  • I meant, when you said 'a more *complicated code* inside the function the compiler might not be able to emit the message': did you try whether clang is unable to figure it out? – A. K. Jul 01 '13 at 17:53
  • To answer the question in the title, the only way you can be *sure* whether something is UB is by looking it up in the ISO C++ standard. If you can find a paragraph there describing behavior that covers your code, then it is well-defined. If no such paragraph exists (or if you find one explicitly saying that it is undefined), then it is UB. Sadly, there are no easy shortcuts (although asking here might prompt responses telling you where to look in the standard) – jalf Jul 01 '13 at 20:46

2 Answers

I'm convinced that all three functions have undefined behavior.

To people who insist that f3 is not UB (or even f1/f2): please try running this code:

#include <iostream>

const char* const& f1() { return "hello1"; }
const char* const& f2() { return static_cast<const char*>("hello2"); }
const char* const& f3() { const char* const& r = "hello3"; return r; }

int main()
{
    using namespace std;

//#define F f1
//#define F f2
#define F f3

    const char* const& ret = F();
    cerr << ret;
    cerr << ",";
    cerr << ret;

    return 0;
}

(I used cerr rather than cout to get immediate flushing. You can change cerr to cout and add a cout << flush; after the second output of ret.)

On my GCC here's what I got printed:

  • with f1: hello1,8??q? (some random chars after the comma)
  • with f2: hello2,8j?y5 (some random chars after the comma)
  • with f3: hello3,, (a second comma after the comma)

That looks very much like UB to me...

(Note: if I remove either const&, it "works". The const& that really should be removed is, of course, the one in the return type.)

I think that's because what happens in f3 is something like this:

const char* const& f3()
{
    const char* __tmp001 = &("hello3"[0]); // "array decaying"
    const char* const& r = __tmp001;
    return r;
}

Indeed, the string literal "hello3" is not a const char*; it is a (static) const char[7]. In the code const char* const& r = "hello3";, the reference can't be bound to this char array directly because the types differ, so the compiler has to create a temporary char pointer (on the stack), initialized by implicit conversion (array-to-pointer decay), to which the reference is bound (demo). The lifetime of this temporary const char* is "extended" to the lifetime of the reference r, so it doesn't end at the first semicolon but when the function returns (demo and output with all optimizations off). So f3 returns a "dangling reference". In my test code, any subsequent operation that overwrites the stack makes the UB visible.

Edit after jalf's comment: I'm aware that "it prints garbage on the second output" is not proof of UB. A program with UB can just as well work exactly as expected, or crash, or do nothing, or whatever. Nonetheless, I don't believe that a well-defined program (with no UB) would print garbage like that...

gx_
  • UB is not characterized by "random or arbitrary code behavior", but by being undefined: by there not being a definition of the behavior in the standard. So if a program prints out garbage, that doesn't prove anything in itself. It might be a hint that the code exhibits UB, but the only way to know for sure is to look it up in the spec. :) – jalf Jul 01 '13 at 20:48
  • @jalf Yes, but I'm not a language lawyer. I tried an explanation though. Also, I can legitimately argue that if it were _not_ UB then it would _not_ print garbage, can't I? – gx_ Jul 01 '13 at 20:50
  • Printing the same reference twice should not produce output like that unless there is a bug in the compiler, library or your code. Relying on that output would be the same as me relying on [this example on ideone](http://ideone.com/BAlW7B#view_edit_box) I just cooked up. `r` is an lvalue not an rvalue so where is this temporary pointer _value_ you're talking about? – Captain Obvlious Jul 01 '13 at 20:59
  • @CaptainObvlious I was thinking that array-to-pointer conversion ("decaying") was [in this spirit (not beautiful code)](http://ideone.com/SUrFdA), where we can see that `foo` returns a dangling reference (to a `Ptr` whose lifetime ended when the function returned) ([here's another output](http://pastebin.com/hnNkH4Dc) on my GCC in debug mode). Now I can't be sure for built-in arrays and pointers... As jalf said a quote from the Standard might be salutary =) – gx_ Jul 01 '13 at 21:43
  • I have deleted my answer as this is going to take more digging than I have time for today. At present it appears that the use of a temporary may be up to the implementation. From $4.2/1 An lvalue or rvalue of type array of N T or array of unknown bound of T **can** be converted to a prvalue of type pointer to T. – Captain Obvlious Jul 01 '13 at 22:56

Yes, the code's behavior is undefined because it returns a reference to a local pointer, which is correctly detected for f1 and f2.

You cannot rely on the compiler's diagnostics to catch these (or any other) cases of undefined behavior; they are provided on a "best effort" basis. How easily a compiler is fooled is shown by g++ 4.8.0 not warning (with -Wall) on this simple example:

int& r() {
    int x = 1;
    int& y = x;
    return y;
}

(Just returning x warns as expected, and clang warns on all four functions.)

user4815162342
  • Uh no, it is absolutely **not** UB. `r` is a _reference_ to a `const char` pointer, in this case a string literal with static storage duration. The return value is also a reference therefore the function returns a reference to the string literal _not_ to the local reference. If it did it would be the neatest trick in the recent history of C++ since you cannot have a reference to a reference. – Captain Obvlious Jul 01 '13 at 19:11
  • @CaptainObvlious: The string literal is not a pointer, it's an array. A reference to it creates a value of type `const char*`. The `return` statement creates a *reference* to that pointer. There's no reference to a reference; there's a reference to a pointer, i.e., to a pointer *object*, and that pointer object is a temporary. If the functions returned `const char* const` rather than `const char* const&`, there wouldn't be a problem. (At least that's my understanding from reading the recent comments.) – Keith Thompson Jul 01 '13 at 20:56
  • @CaptainObvlious: This function exhibits the same problem: `const int& foo() { return 42; }` – Keith Thompson Jul 01 '13 at 21:26
  • @KeithThompson That's because `42` is a prvalue. From $5.1.1/1 A literal is a primary expression. Its type depends on its form (2.14). A string literal is an lvalue; all other literals are prvalues. – Captain Obvlious Jul 01 '13 at 21:34
  • @CaptainObvlious: A string literal is an lvalue *of array type*. The address of its first element (which is what it decays to) is not an lvalue. I'm not nearly as familiar with C++ as with C, but I presume it's a prvalue like `42`. – Keith Thompson Jul 01 '13 at 21:37
  • @CaptainObvlious The storage the pointer points to is of static duration, but the pointer itself is automatic. Here is another example: `char* const& foo() { char* const& x = new char[10]; return x; }`. It is irrelevant that the storage is dynamically allocated; the *pointer* that points to that storage has automatic storage, and the reference to that pointer must not be returned from the function. – user4815162342 Jul 01 '13 at 21:44
  • Also, the function doesn't return a "reference to the reference", but a reference to the pointer, whose storage class (automatic) has nothing to do with the storage class of the data it points to (static). – user4815162342 Jul 01 '13 at 21:46