2

I have a pointer to raw data using const u_char*, and a generic class like this

class Rectangle 
{
   u_int8_t length;
   u_int8_t height;
   ...
}

Assuming the raw data is a binary "stream" of bytes, what is the best way to get the raw data into the fields of the class.

-memcpy ?
-cast   ?

i could do this :

Rectangle *rect = (Rectangle*)rawdata;

but i know that is a "old style" cast.

What is the proper way ?

user7957
  • 91
  • 5
  • See http://stackoverflow.com/questions/332030/when-should-static-cast-dynamic-cast-and-reinterpret-cast-be-used –  Oct 08 '12 at 19:37
  • Note that you can only deserialize POD classes this way; non-POD classes could have vtables or other nontrivial elements which would make it impossible to directly deserialize using a cast. – nneonneo Oct 08 '12 at 19:39

5 Answers5

2

I'd say having a constructor:

Rectangle(const u_char*)

having casts all over the code could work fine for now, but it's a terrible idea in case you want to change your class later on. Having a constructor can mean some overhead, but you'd have a single point where the logic happens.

If later you decide you want to add a virtual method to Rectangle, all casts in the code will become useless.

This is of course if you want to construct a new object from data. If you frequently serialize/deserialize objects, I'd go with serializations methods:

const u_char* toUChar() const;
void fromUChar(const u_char*) const;
Luchian Grigore
  • 253,575
  • 64
  • 457
  • 625
  • 1
    +1 This is the way to go. Loading from `u_char *` many have Endian issues for multibyte values, such as 16-bit integers. See also *Factory Design Pattern*. – Thomas Matthews Oct 08 '12 at 19:46
  • thank you. Even in the constructor as you suggest, I would still need to cast or copy or something to get the first byte of 'rawdata' into 'length' and the second byte into 'width' ... right ? – user7957 Oct 08 '12 at 19:58
  • @user7957 use memcpy or similar for individual members. – bames53 Oct 08 '12 at 20:01
  • +1 explicit deserialization is preferred. Only exception could be if you need to read millions of instances of the same type, and perf becomes an issue. –  Oct 08 '12 at 20:03
2
Rectangle& rect = *reinterpret_cast<Rectangle*>(rawdata);

I'll do it this way if rawdata is void or cast directly to a reference if rawdata is any other type (one that I can deference directly).

I prefer the reference because I find it less error prone than using raw pointers. However, if you need to do pointer arithmetic there's no problem making it a raw pointer. Depending on usage you might want to cast to a const Rectangle& instead of non-const.

However, usually with raw byte streams you need to invent a protocol and you shouldn't cast directly into a struct or class. Structs and classes may have padding which messes up your direct cast. The cast will silently succeed no matter what, but your values will be unexpected. A protocol would be something like...

0 offset: (4 bytes - float) size

4 offset: (2 bytes - uint16_t) height

etc. Doing the protocol approach would mean you'll have to assign the members one by one.

Community
  • 1
  • 1
David
  • 27,652
  • 18
  • 89
  • 138
  • You can (or should) then assert the `offsetof` every member. This will produce meaningful errors when the protocol and struct layout don't match (anymore). – leemes Oct 08 '12 at 19:47
  • Casts are likely to result in UB from aliasing violations or bad pointer alignment as well. – bames53 Oct 08 '12 at 20:00
  • if i did invent a protocol for this, how do i go about pulling the individual bytes out of *rawdata , and plugging them into the Rectangles fields ? IIUC, i can use offset of to verify the value, after i do that... right? – user7957 Oct 08 '12 at 20:18
1

The easiest and most reliable method is to simply use a convert constructor:

class Rectangle
{
public:
  Rectangle(uint8_t l, uint8_t h) : length(l), height(h) {};
  // ...
};

This should be your go-to method until for whatever reason this is impossible.

Barring this, the next best thing to do is simply do a memberwise initialization:

Rectangle rect;
rect.width = 20;
rect.height = 40;

If it becomes impossible to do the above, and iff the object in question is what the Standard refers to as an "aggregate" (basically a POD), you can use an initializer like this:

Recatagle rect = {10,20};

When doing this, you must bear in mind that the members will be initialized in the order in which they are declared in the class. If you change the order of declaration, you will break every initialization like the above. This is very brittle. For this reason, I limit my use of a construct like this to cases where the class in question is highly localized (like a helper class in a single translation unit), and I document the need to keep the order of declaration intact.

EDIT PER COMMENTS:

In the instance you are trying to copy strings in to your class, or pointers to any sort of data, you will need to do a deep copy:

class Gizmo
{
public:
  Gizmo(const char* str) : str_(0)
  {
    str_ = new char[strlen(str)+1];
    strcpy(str_,str);
  }
};

Note the clumsiness and how brittle the above code is. There are plenty of things that could go wrong here. Not the least of which are forgetting to delete str_ when Gizmo is destroyed, the ugliness and seeming lack of necessity for newing a char string in the first place, one-past-the-end errors ... the list goes on. For these reasons, it's best to avoid using raw pointers at all and using either smart pointers (ie unique_ptr, shared_ptr, etc) or collection classes. In this case, I'd use a std::string, which can be thought of as a collection class:

class Gizmo
{
public:
  Gizmo(const char* str) : str_(str) {};
private:
  std::string str_;
};

Feel free to convert this for use with a u_char*, and to add robustness by means of verifying the source pointer is valid.

John Dibling
  • 99,718
  • 31
  • 186
  • 324
  • This seems very clean to me. One question though, how do i get my const u_char *rawdata "into" that sort of constructor ? – user7957 Oct 08 '12 at 20:07
  • ok , its starting to make sense now. Many of the answers suggest doing the copy within a constructor. But the actual "work" to copy the individual bytes from rawdata would still be memcpy or strcpy. Is that right ? I have read about memcpy but thought it to be more of a 'c' style then a 'c++' style ... is that wrong ? – user7957 Oct 08 '12 at 20:27
  • `memcpy` is very C-ish in style. That is not to say it's not legitimate C++ code -- it is. But features and library components available in C++ make it possible to employ idioms that either are impossible or difficult in C. Many of those idioms revolve around the idea of avoiding copying individual bytes or taking direct control of memory, because experience has shown than many bugs are caused by mishandling those things especially in complex systems. So whenever you see a memcpy or a new, ask yourself if there's a better way. – John Dibling Oct 08 '12 at 20:36
  • So I think what I would need to do (inside a constructor , or a factory method ) is : memcpy(length , rawdata , sizeof( length ) ); rawdata += sizeof( length ); memcpy(width,rawdata, sizeof(width)); rawdata += sizeof( width ); ... and so on for all fields in rawdata ? – user7957 Oct 08 '12 at 20:48
  • You don't need to deep-copy things that don't need to be deep-copied, like `uint32_t`. You only need `memcpy`-like code for things that need to be deep-copied. – John Dibling Oct 08 '12 at 20:50
0

The "new style" cast is

Rectangle *rect = reinterpret_cast< Rectangle* >( rawdata );

But you should be careful with the alignment, the padding in your class.

This will use the same memory, it will just interpret it like a Rectanle object.

memcpy will copy it. Depends on your needs. Most probably, you don't need it.

Kiril Kirov
  • 37,467
  • 22
  • 115
  • 187
0

While directly casting is a little faster, doing it that way locks your class into a specific design. If you later make changes to the data within the class, all casts will break.

A better way to handle raw data would be to either make a constructor and/or overload the input stream operator. I personally prefer overloading the operator to a constructor because I hate having more than one simple constructor to a class. The overload might look something like this:

istream& operator >> (istream& input, char* rawData)
{
    //fill your object's variables directly from the raw data
}

This way you take control of filling your object from the data, and if the data or your object ever changes, you only have to change code in one location to fix it everywhere.

I also like using the operator overload over creating a function for it because I think it makes the code look nicer and faster to see what you're doing.

Cdaragorn
  • 344
  • 3
  • 10