12

At a previous employer, we were writing binary messages that had to go "over the wire" to other computers. Each message had a standard header something like:

class Header
{
    int type;
    int payloadLength;
};

All of the data was contiguous (header, immediately followed by data). We wanted to get to the payload given that we had a pointer to a header. Traditionally, you might say something like:

char* Header::GetPayload()
{
    return ((char*) &payloadLength) + sizeof(payloadLength);
}

or even:

char* Header::GetPayload()
{
    return ((char*) this) + sizeof(Header);
}

That seemed kind of verbose, so I came up with:

char* Header::GetPayload()
{
    return (char*) &this[1];
}

It seems rather disturbing at first, possibly too odd to use -- but very compact. There was a lot of debate on whether it was brilliant or an abomination.

So which is it - Crime against coding, or nice solution? Have you ever had a similar trade-off?

-Update:

We did try the zero sized array, but at the time, compilers gave warnings. We eventually went to the inhertited technique: Message derives from Header. It works great in practice, but in priciple you are saying a message IsA Header - which seems a little awkward.

Dan Hewett
  • 2,200
  • 1
  • 14
  • 18

17 Answers17

42

I'd go for crime against coding.

Both methods will generate the exact same object code. The first makes it's intention clear. The second is very confusing, with the only advantage that it saves a couple keystrokes. (Just learn to freakin' type).

Also, note that NEITHER method is guaranteed to work. The sizeof() an object includes padding for word alignment, so that if the header was:

class Header
{
    int type;
    int payloadLength;
    char  status;
};

Both methods you describe would have the payload starting at Header+12, when most likely it actually starts at Header+9.

Ferruccio
  • 98,941
  • 38
  • 226
  • 299
James Curran
  • 101,701
  • 37
  • 181
  • 258
  • For tips on how to type faster, see http://stackoverflow.com/questions/94101/how-to-type-faster – James Schek Oct 21 '08 at 19:58
  • What is "very confusion"? BTW, "&this[1]" does have advantages over the other two - it doesn't require maintenance when the class changes. – Menkboy Oct 21 '08 at 20:03
  • 2
    Oooh wait! You can make it even more unreadable with: return (char *)&1[this] – Ates Goral Oct 21 '08 at 20:07
  • @Menkboy: It does if the header types change and it's going to be a bugger to track that down if no one documents what "&this[1]" is meant to be doing. It's just unnecessarily terse. – Jeff Yates Oct 21 '08 at 20:22
  • @ffpf: I think we might disagree about what "&this[1]" means. What do you understand by it? – Menkboy Oct 21 '08 at 20:32
  • It is referencing one "Header" beyond Header and therefore pointing at the beginning of the payload. I think my previous comment is poorly worded. I can see situations where some may change Header and think they have to change this too, when they don't. Yes, they should know better, but some don't. – Jeff Yates Oct 21 '08 at 20:42
  • What I'm trying to say is, if you have to explain it to one person, there's more that will need it explaining and whether you believe they should know or not, you have to code for what will happen not what you'd like to happen. That's the nature of professional software development. – Jeff Yates Oct 21 '08 at 20:44
  • @ffpf: I can see that. But OTOH, the people you explain it to will have learnt something, and probably won't have to ask again. – Menkboy Oct 21 '08 at 20:49
  • Tiny code is no replacement for readable code. Sure, it might be fun to try to make something as small as possible, but it harms the ability to read and understand the code. – TM. Oct 21 '08 at 21:49
  • 2
    Should never use compiler structures over the wire. Endianness issues, compiler padding issues between projects, CPU word sizes etc. Learn how to use a real solution. Google StreamBuffers is (for example) much clearer, and language independant. – Tim Williscroft Oct 21 '08 at 23:57
15

Personally I think that if there's a crime, it's asking the header for the payload.

But as long as you're going to do it that way, 'this+1' is as good a way as any.

Justification: '&this[1]' is a general-purpose piece of code that doesn't require you to go digging through class-definitions to fully comprehend, and doesn't require fixing when someone changes the name or contents of the class.

BTW, the first example is the true crime against humanity. Add a member to the end of the class and it'll fail. Move the members around the class and it'll fail. If the compiler pads the class, it'll fail.

Also, if you're going to assume that the compiler's layout of classes/structs matches your packet layout, then you should understand how the compiler in question works. Eg. on MSVC you'll probably want to know about #pragma pack.

PS: It's a little scary how many people consider "this+1" or "&this[1]" hard to read or understand.

Menkboy
  • 1,595
  • 10
  • 12
  • Technically, dereferencing `&this[1]` or `this+1` formally is UB if object is not an element of array. Or is a sub-object of obeject with non-standard memory layout. – Swift - Friday Pie Sep 02 '23 at 06:32
14

You're depending on the compiler to layout your classes in a particular way. I would have defined the message as a struct (with me defining layout) and had a class that encapsulates the message and provides the interface to it. Clear code = good code. "Cute" code = bad (hard to maintain) code.

struct Header
{
    int type;
    int payloadlength;
}
struct MessageBuffer
{
   struct Header header;
   char[MAXSIZE] payload;
}

class Message
{
  private:
   MessageBuffer m;

  public:
   Message( MessageBuffer buf ) { m = buf; }

   struct Header GetHeader( )
   {
      return m.header;
   }

   char* GetPayLoad( )
   {
      return &m.payload;
   }
}

It's been awhile since I've written any C++, so please excuse any issues with syntax. Just trying to convey the general idea.

tvanfosson
  • 524,688
  • 99
  • 697
  • 795
  • I implemented a peer to peer filesharing system that used essentially exactly this. It worked really well. I had a factory create different subclasses of Message based on the type so that I didn't have to deal with raw payload though. Not too proud of the RTTI required, but it was isolated. – rmeador Oct 21 '08 at 20:28
  • This is also the pattern that came to mind for me. I'm surprised at the number of people who consider the techniques used in the question to be 'elegant'. Maybe clever hacks, but the kind of thing that I thought best practices have been trying to eliminate over the past 15 years (or more). – Michael Burr Oct 21 '08 at 20:38
13

It's a common problem, but what you actually want is this.

class Header
{
    int type;
    int payloadLength;
    char payload[0];

};

char* Header::GetPayload()
{
    return payload;
}
Roddy
  • 66,617
  • 42
  • 165
  • 277
5

My vote is Coding Horror. Don't get me wrong, it's clever - but you're saving yourself one entire addition operation at the cost of making the code much more difficult to understand and read. I don't see the tradeoff as worth it.

Tom Ritter
  • 99,986
  • 30
  • 138
  • 174
  • It's not even saving an addition - the compiler must still generate it to get the address of the second element. It's just being hidden by the syntax. – Mark Ransom Oct 21 '08 at 20:32
5

I think this is flawed from the start on if the header needs to "return" data which is not included in it.

As you already put yourself on these hackish grounds, I really love what you came up with.

But note that this is not a beauty contest. You should find an entire different solution. For alle the three versions of GetPayload() you presented, I would not understand what the hell is going on there without your further explanation.

ypnos
  • 50,202
  • 14
  • 95
  • 141
4

Have you considered the 'empty array member' trick? I remember seeing it often, and even using it once or twice, but I can't seem to find any really good references (except, maybe, the one referenced below).

The trick is to declare your struct as

struct bla {
    int i;
    int j;
    char data[0];
}

Then, the 'data' member simply points to whatever is behind the headers. I am not sure how portable this is; I've seen it with '1' as the array size as well.

(using the URL below as a referece, using the '[1]' syntax, seemed not to work because it is too long. Here is the link:)

http://developer.apple.com/documentation/DeveloperTools/gcc-4.0.1/gcc/Zero-Length.html

Jan de Vos
  • 3,778
  • 1
  • 20
  • 16
3

If it works -- consistently -- then it's an elegant solution.

It will normally work in memory because the compiler will deal with alignment issues and you can assume that the Payload follows the header in correctly-aligned memory space.

I could see this falling apart when the Header/Payload objects are streamed "over the wire" because the streaming mechanism you use will probably not care about aligning objects on any particular boundary. Therefore, Payload may directly follow the Header with no padding to force it to a particular alignment.

To coin a phrase, elegant is as elegant does. So, it's elegant as long as you are careful how you stream it.

Kluge
  • 3,567
  • 3
  • 24
  • 21
2

First, there's an awful lot of space between "crime against coding" and "nice solution," but I'd say this is closer to the former.

Is the Header its Payload's keeper?

That's the fundamental problem here -- both the header and the payload should be managed by another object that holds the entire message, and that is the proper place to ask for the payload. And it would be able to do so without pointer arithmetic or indexing.

Given that, I would favor the second solution, since it is clearer what is going on.

But that we're in this situation to begin with seems to indicate that the culture of your team values cleverness over clarity, so I guess all bets are off.

If you really want to be cute, you could generalize.

template<typename T. typename RetType>
RetType JustPast(const T* pHeader)
{
   return reinterpret_cast<RetType>(pHeader + sizeof(T));
}
JohnMcG
  • 8,709
  • 6
  • 42
  • 49
1

They're basically the same thing as far as I'm concerned. Both are forms of byte juggling, which is always risky, but not impossible to get right. The first form is a bit more accepted and recognizable. I would personally write :

char* Header::GetPayload()
{
    return ((char*) this) + sizeof(*this);
}
QBziZ
  • 3,170
  • 23
  • 24
1

Don't forget that VC++ may impose a padding on the sizeof() value on the class. Since the provided example is expected to be 8 bytes, it is automatically DWORD aligned, so should be ok. Check #pragma pack.

Though, I agree, the provided examples are some degree of Coding Horror. Many Win32 data structures include a pointer placeholder in the header structure when variable length data follows. This is probably the easiest way to reference this data once it's loaded into memory. The MAPI SRowSet structure is one example of this approach.

spoulson
  • 21,335
  • 15
  • 77
  • 102
1

I actually do something similar, and so does nearly every MMO or online video game ever written. Although they have a concept called a "Packet" and each packet has it's own layout. So you might have:

struct header
{
    short id;
    short size;
}

struct foo
{
    header hd;
    short hit_points;
}


short get_foo_data(char *packet)
{
    return reinterpret_cast<foo*>(packet)->hit_points;
}

void handle_packet(char *packet)
{
    header *hd = reinterpret_cast<header*>(packet);
    switch(hd->id)
    {
        case FOO_PACKET_ID:
            short val = get_foo_data(packet);
        //snip
    }
}

And they do that for the majority of their packets. Some packets obviously have dynamic sizes, and for those members they use length prefixed fields and some logic to parse that data.

Raindog
  • 1,468
  • 3
  • 14
  • 28
1

I think in this day and age, in C++, the C-style cast to char* disqualifies you from any "brilliant design idea" awards without getting much of a hearing.

I might go for:

#include <stdint.h>
#include <arpa/inet.h>

class Header {
private:
    uint32_t type;
    uint32_t payloadlength;
public:
    uint32_t getType() { return ntohl(type); }
    uint32_t getPayloadLength() { return ntohl(payloadlength); }
};

class Message {
private:
    Header head;
    char payload[1]; /* or maybe std::vector<char>: see below */
public:
    uint32_t getType() { return head.getType(); }
    uint32_t getPayloadLength() { return head.getPayloadLength(); }
    const char *getPayload() { return payload; }
};

This assumes C99-ish POSIX, of course: to port to non-POSIX platforms you'd have to define one or both of uint32_t and ntohl yourself, in terms of whatever the platform does offer. It's usually not hard.

In theory you might need layout pragmas in both classes. In practice I'd be surprised given the actual fields in this case. The issue can be avoided by reading/writing the data from/to iostreams one field at a time, rather than trying to construct the bytes of the message in memory and then write it in one go. It also means you can represent the payload with something more helpful than a char[], which in turn means you won't need to have a maximum message size, or mess about with malloc and/or placement new, or whatever. Of course it introduces a bit of overhead.

Steve Jessop
  • 273,490
  • 39
  • 460
  • 699
0

Perhaps you should have used a verbose method, but replaced it with a #define macro? This way you can use your shorthand when typing, but anyone needing to debug the code can follow along without issue.

Anthony Giorgio
  • 1,844
  • 1
  • 15
  • 17
0

I vote for &this[1]. I've seen it used quite a bit when parsing files that have been loaded into memory (which can equally include received packets). It may look a tad odd the first time you see it, but I think that what it means should be immediately obvious: it's clearly the address of memory just past this object. It's nice because it's hard to get wrong.

0

I don't like to use words like "crime". I would rather point out that &this[1] seems to make assumptions about memory layout that a compiler might disagree with. For example, any compiler might, for its own reasons (like alignment), insert dummy bytes anywhere in a structure. I would prefer a technique that has more of a guarantee of getting the correct offset if compilers or options get changed.

Mike Dunlavey
  • 40,059
  • 14
  • 91
  • 135
0

In addition to the abovementioned, I'd say that this is a crime against interoperability and good wire protocol design principles. It is really surprising how many programmers are not able/willing to make a clear distinction between a wire protocol definition and its implementation. If your protocol has to survive for more than two days, it most probably has to survive for more than two years/OSes/compilers/languages/endiannesses and in some point it will break, rather sooner than later. So, make other folks' life easier, write down the wire protocol specification plus write proper (de)serialization routines. Otherwise, people will keep mentioning your name in not so pleasant contexts.