I have a function in a class that I want the compiler to use NRVO on...all the time...even in debug mode. Is there a pragma for this?
Here is my class that works great in "release" mode:
template <int _cbStack> class CBuffer {
public:
CBuffer(int cb) : m_p(0) {
m_p = (cb > _cbStack) ? (char*)malloc(cb) : m_pBuf;
}
template <typename T> operator T () const {
return static_cast<T>(m_p);
}
~CBuffer() {
if (m_p && m_p != m_pBuf)
free(m_p);
}
private:
char *m_p, m_pBuf[_cbStack];
};
The class is used to make a buffer on the stack unless more than _cbStack bytes are required. Then when it destructs, it frees memory if it allocated any. It's handy when interfacing to c functions that require a string buffer, and you are not sure of the maximum size.
Anyway, I was trying to write a function that could return CBuffer, like in this test:
#include "stdafx.h"
#include <malloc.h>
#include <string.h>
template <int _cbStack> CBuffer<_cbStack> foo()
{
// return a Buf populated with something...
unsigned long cch = 500;
CBuffer<_cbStack> Buf(cch + 1);
memset(Buf, 'a', cch);
((char*)Buf)[cch] = 0;
return Buf;
}
int _tmain(int argc, _TCHAR* argv[])
{
auto Buf = foo<256>();
return 0;
}
I was counting on NRVO to make foo() fast. In release mode, it works great. In debug mode, it obviously fails, because there is no copy constructor in my class. I don't want a copy constructor, since CBuffer will be used by developers who like to copy everything 50 times. (Rant: these guys were using a dynamic array class to create a buffer of 20 chars to pass to WideCharToMultiByte(), because they seem to have forgotten that you can just allocate an array of chars on the stack. I don't know if they even know what the stack is...)
I don't really want to code up the copy constructor just so the code works in debug mode! It gets huge and complicated:
template <int _cbStack>
class CBuffer {
public:
CBuffer(int cb) : m_p(0) { Allocate(cb); }
CBuffer(CBuffer<_cbStack> &r) {
int cb = (r.m_p == r.m_pBuf) ? _cbStack : ((int*)r.m_p)[-1];
Allocate(cb);
memcpy(m_p, r.m_p, cb);
}
CBuffer(CBuffer<_cbStack> &&r) {
if (r.m_p == r.m_pBuf) {
m_p = m_pBuf;
memcpy(m_p, r.m_p, _cbStack);
} else {
m_p = r.m_p;
r.m_p = NULL;
}
}
template <typename T> operator T () const {
return static_cast<T>(m_p);
}
~CBuffer() {
if (m_p && m_p != m_pBuf)
free((int*)m_p - 1);
}
protected:
void Allocate(int cb) {
if (cb > _cbStack) {
m_p = (char*)malloc(cb + sizeof(int));
*(int*)m_p = cb;
m_p += sizeof(int);
} else {
m_p = m_pBuf;
}
}
char *m_p, m_pBuf[_cbStack];
};
This pragma does not work:
#pragma optimize("gf", on)
Any ideas?