Function that dynamically constructs a byte array and returns its length

Question

I need to create an encoder function in a class

bool encodeMsg(unsigned char* buffer, unsigned short& len);

This class has some fixed-length members and some variable-length vectors (of different structures). I have to encode a byte stream from a given sequence of these member variables.

Here is a scaled-down version:

class test
{
  public:
    test();
    ~test();

    bool encodeMsg(unsigned char* buffer, unsigned short& len);
    bool decodeMsg(const unsigned char* buffer, unsigned short len);

  private:
    unsigned char a; // 0x12
    unsigned char b; // 0x34
    unsigned char c; // 0x56
};

What I want is 0x123456 in my buffer when I encode.

Questions:

1. How should I allocate memory, given that the required size is not known before calling this function?
2. Is there a way to map the class object's memory that basically gives me what I want?

I know this is a very basic question, but I want to know the optimal and conventional method to do it.

this might help: http://stackoverflow.com/questions/234724/is-it-possible-to-serialize-and-deserialize-a-class-in-c — samgak, Mar 31 '15 at 06:28

Tony Delroy · Answer 1 · 2015-03-31T07:45:03.927

How should I allocate memory? As It is not known before calling this function

Given your current code, the caller should allocate the memory:

unsigned char buffer[3];
unsigned short len = sizeof buffer;
my_test_object.encodeMsg(buffer, len);

Is there a way to map class object memory which basically gives what I want.

That's very vague. If you use a (possibly compiler-specific) #pragma or attribute to ensure the character values occupy 3 contiguous bytes in memory, and as long as you don't add any virtual functions to the class, you can implement encodeMsg() using:

memcpy(buffer, (unsigned char*)this + offsetof(test, a), 3);

But what's the point? I can't imagine that memcpy ever being faster than the "nice" way to write it:

buffer[0] = a;
buffer[1] = b;
buffer[2] = c;

If you actually mean something more akin to:

test* p = reinterpret_cast<test*>(buffer);
*p = *this;

That will have undefined behaviour, and may write up to sizeof(test) bytes into the buffer, which may be 4 rather than 3 if the compiler pads the class. That could cause buffer overruns in client code, overwrite an already-set NUL terminator, and so on. Hackish and dangerous.

Taking a step back, if you have to ask these sorts of questions you should be worrying about adopting good programming practice - only once you're a master of this kind of thing should you be worrying about what's optimal. For developing good habits, you might want to look at the boost serialisation library and get comfortable with it first.
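
As a rough sketch of the caller-allocates approach described above: this assumes `len` is treated as an in/out parameter (buffer capacity on entry, bytes written on return). That reading of the interface is an assumption rather than something the question states, and the class name `Test` with its default member values is only a stand-in.

```cpp
#include <cassert>

// Hypothetical stand-in for the question's class.
class Test {
public:
    // Assumed contract: on entry `len` holds the capacity of `buffer`;
    // on return it holds the number of bytes written (or required).
    bool encodeMsg(unsigned char* buffer, unsigned short& len) const {
        const unsigned short needed = 3;
        if (buffer == nullptr || len < needed) {
            len = needed;   // report how much room the caller must provide
            return false;
        }
        buffer[0] = a;
        buffer[1] = b;
        buffer[2] = c;
        len = needed;       // bytes actually written
        return true;
    }

private:
    unsigned char a = 0x12;
    unsigned char b = 0x34;
    unsigned char c = 0x56;
};

// Usage sketch: the caller owns the storage and passes its size in.
inline void encodeIntoStackBuffer() {
    Test t;
    unsigned char buffer[8];
    unsigned short len = sizeof buffer;
    const bool ok = t.encodeMsg(buffer, len);
    assert(ok && len == 3);
}
```

With this contract a caller whose buffer is too small gets `false` back and can read the required size from `len` before retrying.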

edited Mar 31 '15 at 07:45

answered Mar 31 '15 at 06:44

I doubt `my_test_object.encodeMsg(buffer, sizeof buffer);` works, since `len` is not a const reference and temporary variables only bind to const references. – rozina Mar 31 '15 at 06:46
@rozina: well spotted - it's crazy to accept `len` by reference, but that's the API and I'd missed that... thanks. – Tony Delroy Mar 31 '15 at 07:45
@TonyD I think the interface is trying to tell you that `encodeMsg()` should allocate the memory needed for `buffer` and returns its length via `len` variable. – rozina Mar 31 '15 at 07:51
@rozina: there's a pointer passed in and no way to pass one out so that can't work as is, but then the reference implies whoever wrote the function prototype didn't have a clear understanding of how it was to work, so who knows.... – Tony Delroy Mar 31 '15 at 07:57
@TonyD Thank you for your answer and pointer to boost serialization. I asked for memory because I also have some variable length vector members whose length is unknown while I encode. Although, I can always allocate buffer[MAX_BUFFER_LEN] – creativeDrive Mar 31 '15 at 18:29
@creativeDrive I see - it would be natural to allocate the memory inside `encodeMsg()` then - might as well use a `vector` and return it by value: the `vector` tracks `size` anyway so no need for a separate `len` result. rozina's answer shows how - prefer the second version therein (an empty `vector`'s a plausible "sentinel" value indicating failure, if you need one). – Tony Delroy Apr 01 '15 at 04:03

score 1 · Answer 2 · edited Apr 01 '15 at 08:08

The C++ way would be to use streams. Just implement the insertion operator << for encoding like this (both operators need to be declared as friends of test, since a, b and c are private):

std::ostream& operator<<(std::ostream& os, const test& t)
{
  os << t.a;
  os << t.b;
  os << t.c;

  return os;
}

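Same with the extraction operator >> for decoding. Unformatted read() is used here because formatted >> on a character type skips whitespace, which would drop bytes such as 0x20:

std::istream& operator>>(std::istream& is, test& t)
{
  // read one raw byte into each member, in the same order as encoding
  is.read(reinterpret_cast<char*>(&t.a), 1);
  is.read(reinterpret_cast<char*>(&t.b), 1);
  is.read(reinterpret_cast<char*>(&t.c), 1);

  return is;
}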

This moves memory management to the stream and caller. If you need a special encoding for the types then derive your codec from istream and ostream and use those.

The memory and the size can be retrieved from the stream when using a stringstream like this

test t;
std::ostringstream strm;
strm << t;

std::string result = strm.str();
auto size = result.length(); // size
auto array = result.data(); // the byte array
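
For what it's worth, here is a self-contained round-trip sketch of the stream approach. The class name `Test`, its three-argument constructor, `operator==` and the helper `encodeToString` are all made up for illustration. It uses unformatted `put`/`read` rather than formatted `<<`/`>>` so that byte values which happen to be whitespace characters survive decoding.

```cpp
#include <cassert>
#include <istream>
#include <ostream>
#include <sstream>
#include <string>

// Hypothetical stand-in for the question's class.
class Test {
public:
    Test() = default;
    Test(unsigned char a, unsigned char b, unsigned char c) : a(a), b(b), c(c) {}

    // Encode: write the three members as raw bytes, in order.
    friend std::ostream& operator<<(std::ostream& os, const Test& t) {
        os.put(static_cast<char>(t.a));
        os.put(static_cast<char>(t.b));
        os.put(static_cast<char>(t.c));
        return os;
    }

    // Decode: read exactly three raw bytes; leave `t` untouched on failure.
    friend std::istream& operator>>(std::istream& is, Test& t) {
        char raw[3];
        if (is.read(raw, 3)) {
            t.a = static_cast<unsigned char>(raw[0]);
            t.b = static_cast<unsigned char>(raw[1]);
            t.c = static_cast<unsigned char>(raw[2]);
        }
        return is;
    }

    friend bool operator==(const Test& l, const Test& r) {
        return l.a == r.a && l.b == r.b && l.c == r.c;
    }

private:
    unsigned char a = 0;
    unsigned char b = 0;
    unsigned char c = 0;
};

// The stream owns the memory; the string carries both bytes and length.
inline std::string encodeToString(const Test& t) {
    std::ostringstream out;
    out << t;
    return out.str();
}
```

Decoding then amounts to constructing a `std::istringstream` from those bytes and extracting with `>>`; checking the stream state afterwards tells you whether all three bytes were available.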

Please verify the use of the `is` and `os` symbols. – CiaPan Mar 31 '15 at 09:57

score 1 · Answer 3 · answered Mar 31 '15 at 10:02

If you can change the interface of your encodeMsg() function you could store the byte stream in a vector.

bool test::encodeMsg(std::vector<unsigned char>& buffer)
{
    // if speed is important you can fill the buffer some other way
    buffer.push_back(a);
    buffer.push_back(b);
    buffer.push_back(c);

    return true;
}

If encodeMsg() can't fail (does not need to return bool) you can create and return the vector in it like this:

std::vector<unsigned char> test::encodeMsg()
{
    std::vector<unsigned char> buffer;

    // if speed is important you can fill the buffer some other way
    buffer.push_back(a);
    buffer.push_back(b);
    buffer.push_back(c);

    return buffer;
}
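
Since the question also mentions variable-length vector members, here is a rough sketch of how the return-by-value version might extend to them. The wire format is purely an assumption for illustration (one id byte, a one-byte element count, then each 16-bit value high byte first), and the `Message` type and its members are hypothetical. An empty result is used as the failure sentinel.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical message with one fixed field and one variable-length field.
struct Message {
    std::uint8_t id = 0;
    std::vector<std::uint16_t> values;

    // Returns the encoded bytes, or an empty vector if the message
    // cannot be represented (more than 255 values in this made-up format).
    std::vector<std::uint8_t> encode() const {
        std::vector<std::uint8_t> out;
        if (values.size() > 255) {
            return out;
        }
        out.reserve(2 + 2 * values.size());   // size is known at this point
        out.push_back(id);
        out.push_back(static_cast<std::uint8_t>(values.size()));
        for (std::uint16_t v : values) {
            out.push_back(static_cast<std::uint8_t>(v >> 8));    // high byte
            out.push_back(static_cast<std::uint8_t>(v & 0xFF));  // low byte
        }
        return out;
    }
};
```

The vector grows as needed, so neither the caller nor the encoder has to guess a maximum buffer size up front, and `size()` replaces the separate `len` output.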

MikeMB · Answer 4 · 2015-03-31T09:55:17.183

For classes that are trivially copyable (std::is_trivially_copyable<test>::value == true), encoding and decoding are actually straightforward, assuming you have already allocated the memory for buffer:

bool encodeMsg(unsigned char* buffer, unsigned short& len) {
    // view this object as raw bytes and copy all of them out
    auto* ptr = reinterpret_cast<unsigned char*>(this);
    len = sizeof(test);
    memcpy(buffer, ptr, len);
    return true;
}
bool decodeMsg(const unsigned char* buffer) {
    // overwrite this object with sizeof(test) bytes from the buffer
    auto* ptr = reinterpret_cast<unsigned char*>(this);
    memcpy(ptr, buffer, sizeof(test));
    return true;
}

or shorter

bool encodeMsg(unsigned char* buffer, unsigned short& len) {
    len = sizeof(test);
    memcpy(buffer, (unsigned char*)this, len);
    return true;
}
bool decodeMsg(const unsigned char* buffer) {
    memcpy((unsigned char*)this, buffer, sizeof(test));
    return true;
}

Be aware that sizeof(test) includes any padding the compiler adds, so you may end up copying more bytes than the three members occupy (for example 4 instead of 3).

As far as interpreting something directly as a byte array goes: casting a pointer from test* to unsigned char* and accessing the object through it is legal, but not the other way round. So what you could write is:

unsigned char* encodeMsg(unsigned short& len) {
    // hand out a view of this object's own bytes; no copy is made
    len = sizeof(test);
    return reinterpret_cast<unsigned char*>(this);
}
bool decodeMsg(const unsigned char* buffer) {
    auto* ptr = reinterpret_cast<unsigned char*>(this);
    memcpy(ptr, buffer, sizeof(test));
    return true;
}
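
A minimal sketch of the same idea with the trivially-copyable requirement checked at compile time and the buffer size checked at run time. `Packet`, `encodeRaw` and `decodeRaw` are made-up names, and free functions are used instead of members only to keep the example short. The resulting bytes depend on the compiler's padding and the machine's byte order, so this is presumably only suitable when both ends are built the same way.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <type_traits>

// Hypothetical plain-data type standing in for the question's class.
struct Packet {
    unsigned char a;
    unsigned char b;
    unsigned char c;
};

// memcpy-based serialization is only well defined for such types.
static_assert(std::is_trivially_copyable<Packet>::value,
              "Packet must be trivially copyable to be copied with memcpy");

// Copies the object representation into `buffer`.
// Returns the number of bytes written, or 0 if `capacity` is too small.
inline std::size_t encodeRaw(const Packet& p, unsigned char* buffer, std::size_t capacity) {
    if (capacity < sizeof(Packet)) {
        return 0;
    }
    std::memcpy(buffer, &p, sizeof(Packet));
    return sizeof(Packet);
}

// Rebuilds the object from `len` bytes; fails if too few bytes are supplied.
inline bool decodeRaw(Packet& p, const unsigned char* buffer, std::size_t len) {
    if (len < sizeof(Packet)) {
        return false;
    }
    std::memcpy(&p, buffer, sizeof(Packet));
    return true;
}
```

If a member that is not trivially copyable (a `std::vector`, say) is ever added to `Packet`, the `static_assert` stops the build instead of letting the `memcpy` silently misbehave.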
