How to force return value optimization in msvc

Question

I have a function in a class that I want the compiler to use NRVO on...all the time...even in debug mode. Is there a pragma for this?

Here is my class that works great in "release" mode:

template <int _cbStack> class CBuffer {
public:
    CBuffer(int cb) : m_p(0) { 
        m_p = (cb > _cbStack) ? (char*)malloc(cb) : m_pBuf;
    }
    template <typename T> operator T () const { 
        return static_cast<T>(m_p); 
    }
    ~CBuffer() { 
        if (m_p && m_p != m_pBuf) 
            free(m_p); 
    }
private: 
    char *m_p, m_pBuf[_cbStack];
};

The class makes a buffer on the stack unless more than _cbStack bytes are required, in which case it allocates on the heap. When it destructs, it frees the memory if it allocated any. It's handy when interfacing with C functions that require a string buffer and you are not sure of the maximum size.

Anyway, I was trying to write a function that could return CBuffer, like in this test:

#include "stdafx.h"
#include <malloc.h>
#include <string.h>

template <int _cbStack> CBuffer<_cbStack> foo() 
{ 
    // return a Buf populated with something...
    unsigned long cch = 500;
    CBuffer<_cbStack> Buf(cch + 1);
    memset(Buf, 'a', cch);  
    ((char*)Buf)[cch] = 0;
    return Buf;
}

int _tmain(int argc, _TCHAR* argv[])
{
    auto Buf = foo<256>();
    return 0;
}

I was counting on NRVO to make foo() fast. In release mode, it works great. In debug mode, it obviously fails, because there is no copy constructor in my class. I don't want a copy constructor, since CBuffer will be used by developers who like to copy everything 50 times. (Rant: these guys were using a dynamic array class to create a buffer of 20 chars to pass to WideCharToMultiByte(), because they seem to have forgotten that you can just allocate an array of chars on the stack. I don't know if they even know what the stack is...)

I don't really want to code up the copy constructor just so the code works in debug mode! It gets huge and complicated:

template <int _cbStack> 
class CBuffer {
public:
    CBuffer(int cb) : m_p(0) { Allocate(cb); }
    CBuffer(CBuffer<_cbStack> &r) { 
        int cb = (r.m_p == r.m_pBuf) ? _cbStack : ((int*)r.m_p)[-1];
        Allocate(cb);
        memcpy(m_p, r.m_p, cb);
    }
    CBuffer(CBuffer<_cbStack> &&r) { 
        if (r.m_p == r.m_pBuf) {
            m_p = m_pBuf;
            memcpy(m_p, r.m_p, _cbStack);
        } else {
            m_p = r.m_p;
            r.m_p = NULL;
        }
    }
    template <typename T> operator T () const {
        return static_cast<T>(m_p); 
    }
    ~CBuffer() {
        if (m_p && m_p != m_pBuf) 
            free((int*)m_p - 1); 
    }
protected: 
    void Allocate(int cb) {
        if (cb > _cbStack) {
            m_p = (char*)malloc(cb + sizeof(int));
            *(int*)m_p = cb;
            m_p += sizeof(int);
        } else {
            m_p = m_pBuf; 
        }
    }
    char *m_p, m_pBuf[_cbStack];
};

This pragma does not work:

 #pragma optimize("gf", on)

Any ideas?

[Is this helpful to you?](http://stackoverflow.com/questions/13618506/is-it-possible-to-stdmove-objects-out-of-functions-c11/13618587#13618587) — billz, Jan 10 '13 at 23:46
What problem are you trying to solve where this is the solution? — GManNickG, Jan 10 '13 at 23:49
I guess my problem is I want "a class that can never be copied, but can be returned by value by factory functions." I could add constructors to my CBuffer class that take all the parameters of each factory function, but then my CBuffer would lose its generic-ness and would become dependent upon other header files... I could make separate classes for each way I want to construct a CBuffer, but now I am adding complexity. — johnnycrash, Jan 11 '13 at 00:20
@GManNickG "Actually having working code?" That is kind of harsh. Our definition of working also includes performance metrics. — johnnycrash, Jan 11 '13 at 00:52
@johnnycrash: So have you run those metrics on the code with copy-constructors? — GManNickG, Jan 11 '13 at 00:55
@johnnycrash: I am being serious and I'm sticking to the question: if your code relies on RVO to work, it's broken. RVO simply isn't guaranteed and there's no reason to avoid writing a copy-constructor to handle the case when it doesn't happen. This is basic stuff: make your code work, then make it fast. Sure, think about performance and don't implement dumb algorithms, but a copy constructor isn't exactly the most controversial thing. — GManNickG, Jan 11 '13 at 01:08
@GMan I don't care if you give me a negative vote. Do it 100 times if it makes you feel good. There are lots of people asking the same question and it wouldn't surprise me if the c++ std eventually added a keyword for NRVO. — johnnycrash, Jan 11 '13 at 01:11
@johnnycrash: It would surprise me immensely. If you're that worried, return the value as a reference out parameter. This is canonical. — GManNickG, Jan 11 '13 at 02:04
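The out-parameter approach suggested in the last comment could look roughly like the sketch below. `StackBuffer` and `fill_with_a` are hypothetical names, and the class is a simplified stand-in for the question's CBuffer rather than the asker's actual code. Because the caller owns the object and the function only fills it, nothing is returned by value, so the result does not depend on NRVO in any build mode.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdlib>
#include <cstring>

// Hypothetical, simplified stand-in for the question's CBuffer:
// in-place storage for small sizes, heap storage otherwise, never copied.
template <std::size_t StackBytes>
class StackBuffer {
public:
    StackBuffer() : m_p(m_buf) {}
    ~StackBuffer() { release(); }

    // Make room for cb bytes; previous contents are discarded.
    void reserve(std::size_t cb) {
        release();
        m_p = (cb > StackBytes) ? static_cast<char*>(std::malloc(cb)) : m_buf;
        assert(m_p != nullptr);  // allocation failure is not handled in this sketch
    }
    char* data() { return m_p; }
    bool on_heap() const { return m_p != m_buf; }

private:
    StackBuffer(const StackBuffer&);             // not copyable: declared, never defined
    StackBuffer& operator=(const StackBuffer&);

    void release() {
        if (m_p != m_buf) std::free(m_p);
        m_p = m_buf;
    }
    char* m_p;
    char m_buf[StackBytes];
};

// The "factory" fills a caller-owned object instead of returning one,
// so no copy, move, or NRVO is involved.
template <std::size_t StackBytes>
void fill_with_a(StackBuffer<StackBytes>& out, std::size_t cch) {
    out.reserve(cch + 1);
    std::memset(out.data(), 'a', cch);
    out.data()[cch] = '\0';
}
```

The call site becomes two lines, `StackBuffer<256> buf; fill_with_a(buf, 500);`, instead of `auto buf = foo<256>();`. That is the price of not depending on the optimizer.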

Yakk - Adam Nevraumont · Answer 1 · 2013-01-11T20:59:53.143

It is not hard to make your code both standards conforming and work.

First, wrap arrays of T with optional extra padding. Now you know the layout.

For ownership, use a unique_ptr instead of a raw pointer. If it is valid (non-null), operator T* returns it; otherwise it returns the in-place buffer. Now your default move ctor works, so the code is correct whether or not NRVO happens.

If you want to support non-POD types, a bit of work will let you support ctors and dtors as well as moving the array elements, while still copying the padding bit for bit.

The result will be a class that does not behave surprisingly and will not create bugs the first time someone tries to copy or move it - well, not the first, that would be easy. The code as written will blow up in different ways at different times!

Obey the rule of three.

Here is an explicit example (now that I'm off my phone):

#include <cstddef>
#include <memory>

template <typename T, std::size_t bufSize = sizeof(T)>
struct CBuffer {
  typedef T value_type;

  // One constructor with defaulted arguments also serves as the default
  // constructor; a separate CBuffer() overload would be ambiguous.
  explicit CBuffer(std::size_t count = 1, std::size_t extra = 0) {
    resize(count, extra);
  }
  // Use heap storage only when the request exceeds the in-place buffer.
  // Existing contents are not preserved.
  void resize(std::size_t count, std::size_t extra = 0) {
    std::size_t amount = sizeof(value_type) * count + extra;
    if (amount > bufSize) {
      m_heapBuffer.reset(new char[amount]);
    } else {
      m_heapBuffer.reset();
    }
  }
  explicit operator value_type const* () const {
    return get();
  }
  explicit operator value_type* () {
    return get();
  }
  T* get() {
    return reinterpret_cast<value_type*>(getPtr());
  }
  T const* get() const {
    return reinterpret_cast<value_type const*>(getPtr());
  }
private:
  std::unique_ptr<char[]> m_heapBuffer;  // non-null only when the heap is in use
  char m_Buffer[bufSize];                // in-place storage for small requests
  char const* getPtr() const {
    if (m_heapBuffer)
      return m_heapBuffer.get();
    return &m_Buffer[0];
  }
  char* getPtr() {
    if (m_heapBuffer)
      return m_heapBuffer.get();
    return &m_Buffer[0];
  }
};

The above CBuffer supports move construction and move assignment, but not copy construction or copy assignment. This means you can return a local instance of these from a function. RVO may occur, but if it doesn't the above code is still safe and legal (assuming T is POD).

Before putting it into production myself, I would add some "T must be POD" assertions to the above, or handle non-POD T.
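One way the "T must be POD" check might be written is a `static_assert` inside the class template. This is only a sketch: `PodOnlyBuffer`, `Plain`, and `Owning` are hypothetical names, storage and accessors are omitted, and it assumes a standard library that provides `std::is_trivially_copyable` in `<type_traits>` (older toolchains may only offer `std::is_pod`).

```cpp
#include <cstddef>
#include <type_traits>

// Hypothetical wrapper showing only the compile-time check.
template <typename T, std::size_t bufSize = sizeof(T)>
struct PodOnlyBuffer {
    static_assert(std::is_trivially_copyable<T>::value,
                  "PodOnlyBuffer<T>: T must be trivially copyable (POD-like)");
    char m_Buffer[bufSize];
};

struct Plain { int x; double y; };       // accepted by the check
struct Owning { ~Owning() {} int x; };   // user-provided destructor: rejected
```

With this in place, `PodOnlyBuffer<Owning>` fails to compile with the message above instead of silently copying bytes of a type that needs its constructors and destructors run.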

As an example of use:

#include <cstring>
#include <iostream>
// Copies a fixed string into buff if it fits; always returns the size needed.
size_t fill_buff(size_t len, char* buff) {
  char const* src = "This is a string";
  size_t needed = strlen(src)+1;
  if (len < needed)
    return needed;
  strcpy( buff, src );
  return needed;
}
void test1() {
  size_t amt = fill_buff(0, nullptr);  // first call only asks for the size
  CBuffer<char, 100> strBuf(amt);
  fill_buff( amt, strBuf.get() );
  std::cout << strBuf.get() << "\n";
}

And, for the (hopefully) NRVO'd case:

template<size_t n>
CBuffer<char, n> test2() {
  CBuffer<char, n> strBuf;
  size_t amt = fill_buff(0,0);
  strBuf.resize(amt);
  fill_buff( amt, strBuf.get() );
  return strBuf;
}

which, if NRVO occurs (as it should), won't need a move -- and if NRVO doesn't occur, the implicit move that occurs is logically equivalent to not doing the move.

The point is that NRVO isn't relied upon to have well defined behavior. However, NRVO is almost certainly going to occur, and when it does occur it does something logically equivalent to doing the move-constructor option.

I didn't have to write such a move-constructor, because unique_ptr is move-constructible, as are arrays inside structs. Also note that copy-construction is blocked, because unique_ptr cannot be copy-constructed: this aligns with your needs.
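These claims can be checked at compile time with `<type_traits>`. The sketch below uses a hypothetical struct, `MoveOnlyBuffer`, with the same two members as the CBuffer above. It assumes a compiler that generates implicit move constructors; as far as I know, Visual C++ did not do so before the 2015 release, in which case the move members would have to be written by hand.

```cpp
#include <memory>
#include <type_traits>

// Hypothetical struct with the same member layout as the answer's CBuffer.
struct MoveOnlyBuffer {
    std::unique_ptr<char[]> m_heapBuffer;
    char m_Buffer[64];
};

// The implicit move constructor moves the unique_ptr and copies the array.
static_assert(std::is_move_constructible<MoveOnlyBuffer>::value,
              "implicit move construction should be available");
// The implicit copy operations are deleted because unique_ptr is not copyable.
static_assert(!std::is_copy_constructible<MoveOnlyBuffer>::value,
              "copy construction should be blocked");
static_assert(!std::is_copy_assignable<MoveOnlyBuffer>::value,
              "copy assignment should be blocked");
```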

In debug, it is quite possibly true that you'll end up doing a move-construct. But there shouldn't be any harm in that.
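To see which path a given build actually takes, a small instrumented type can count its move constructions. This is a sketch with hypothetical names (`Tracked`, `make_tracked`, `moves_for_one_call`), not part of the CBuffer above. The count is zero when the return is fully elided and a small positive number otherwise; the returned value is the same either way, which is the point being made.

```cpp
#include <cassert>

// Hypothetical move-only type that counts how often it is move-constructed.
// Declaring a move constructor suppresses the implicit copy constructor.
struct Tracked {
    static int moves;
    int value;
    explicit Tracked(int v) : value(v) {}
    Tracked(Tracked&& other) : value(other.value) { ++moves; }
};
int Tracked::moves = 0;

Tracked make_tracked() {
    Tracked t(42);   // named local: a candidate for NRVO
    t.value += 1;
    return t;        // elided under NRVO, otherwise move-constructed
}

// Returns how many move constructions one call to make_tracked() cost.
int moves_for_one_call() {
    const int before = Tracked::moves;
    Tracked result = make_tracked();
    assert(result.value == 43);  // same value whether or not elision happened
    (void)result;                // avoid an unused warning when asserts are disabled
    return Tracked::moves - before;
}
```

Printing `moves_for_one_call()` in a debug and a release build shows whether the compiler elided the return in each configuration.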

I'm not sure I understand exactly what you mean. I am trying to achieve a stack based buffer most of the time, so how do I get around not having an in place m_pBuf[_cbStack]? If I can't get around that, then when move or copy are invoked, I have to copy the part of the internal buffer that is in use. — johnnycrash, Jan 11 '13 at 01:19
@johnnycrash first, default move copy should work on POD types in an array in a struct. So if you have an array and a `unique_ptr` to possible non array memory, for POD types it will 'just work'. Instead of returning the ptr on cast to `T`, you return a ptr to the `unique_ptr` memory if it exists, and otherwise a ptr to the start of the buffer. Move ctor on `unique_ptr` does the right thing as does move on an array in a struct. Bob is your relative, maybe your mother's brother. — Yakk - Adam Nevraumont, Jan 11 '13 at 02:14
Thanks for the time! The pattern of "size the buffer, create the CBuffer, then use the buffer" works, and is what I am doing now. What I want to do is see if I can create factory functions that combine the steps and just spit out a CBuffer. So you can just say something like "auto foo = obj.Factory<256>();" Factory would spit out a populated foo that would destruct when it went out of scope. — johnnycrash, Jan 11 '13 at 20:48
Hey, I can't get it to compile. I was going to try it out and compare it to mine. — johnnycrash, Jan 12 '13 at 06:32
@johnnycrash I would not be surprised by some syntax errors -- could you throw your attempt up at live work space so I can look at what is going wrong? On my phone, makes doing it myself tricky, until I finish some home wiring. — Yakk - Adam Nevraumont, Jan 12 '13 at 15:58

score 1 · Answer 2 · answered Jan 10 '13 at 23:59

I don't think there is a publicly available fine-grained compiler option that only triggers NRVO.

However, you can still control compiler optimization flags per source file, either through the project settings, on the command line, or with a #pragma.

http://msdn.microsoft.com/en-us/library/chh3fb0k(v=vs.110).aspx

Try applying /O1 or /O2 to the file that contains the function.

And, the debug mode in Visual C++ is nothing but a configuration with no optimizations and generating debugging information (PDB, program database file).

BTW I tried all those #pragmas before posting this, and I can't find any that work. — johnnycrash, Jan 14 '13 at 02:47

score 1 · Answer 3 · answered Jan 11 '13 at 00:09

If you are using Visual C++ 2010 or later, you can use move semantics to achieve an equivalent result. See How to: Write a Move Constructor.

Neil

In my case, move isn't so great, since I have a large in place buffer in the class. – johnnycrash Jan 11 '13 at 00:35
+1 for suggesting move semantics. Important stuff. However move semantics are different from what NRVO and RVO do, and in my case, not nearly as good. – johnnycrash Jan 11 '13 at 01:24
@johnnycrash In that case, if the use case is to optimise away redundant copies, you need to compile with optimisation enabled. – Neil Jan 11 '13 at 11:15
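The concern about a large in-place buffer can be made concrete. In the sketch below (hypothetical names `InlineBuffer` and `move_copies_local_bytes`, with an arbitrary 4096-byte buffer), moving the object transfers the heap pointer cheaply, but the in-place array has to be copied element by element, since an array member cannot be "stolen" the way a pointer can. NRVO avoids that copy; a move does not.

```cpp
#include <cassert>
#include <cstring>
#include <memory>
#include <utility>

// Hypothetical layout: heap pointer plus a large in-place buffer.
struct InlineBuffer {
    std::unique_ptr<char[]> heap;
    char local[4096];
    InlineBuffer() { std::memset(local, 0, sizeof(local)); }
};

// Moves src into a new object and reports whether the in-place bytes ended
// up duplicated in distinct storage.
bool move_copies_local_bytes() {
    InlineBuffer src;
    std::strcpy(src.local, "payload");
    InlineBuffer dst(std::move(src));   // implicit move: pointer moved, array copied
    const bool distinct_storage = dst.local != src.local;
    const bool both_hold_payload = std::strcmp(dst.local, "payload") == 0 &&
                                   std::strcmp(src.local, "payload") == 0;
    assert(!src.heap && !dst.heap);     // no heap allocation was involved
    return distinct_storage && both_hold_payload;
}
```

Whether copying a few kilobytes per return matters is something to measure in an optimized build, where the return is normally elided anyway.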
