12

I write quite a lot of code which processes message protocols. Quite often a message protocol will have a generic message frame which can be deserialised from a serial port or socket; the frame contains a message type, and the message payload must be processed based on the message type.

Normally I write a polymorphic set of classes with accessor methods and a constructor which takes a reference to the message frame.

It occurs to me though that instead of constructing an accessor class based on a reference to the message frame, I could just derive the accessor classes directly from the message frame, and then reinterpret_cast from the message frame to the appropriate accessor class. This makes the code more concise and saves some bytes and processor cycles.

See the (extremely contrived and condensed) example below. Obviously for production code this would all need to be properly encapsulated, the cast made a member of the derived class, better separation of concerns imposed, and some validation added. This has all been removed for the sake of putting together a concise example.

#include <iostream>
#include <cstring>
#include <vector>

struct GenericMessage
{
  GenericMessage(const char* body):body_(body, body+strlen(body)){}
  std::vector<char> body_;  
};

struct MessageType1:public GenericMessage
{
    int GetFoo()const
    {
        return body_[2];
    }
    int GetBar()const
    {
        return body_[3];
    }    
};

int main() 
{
    GenericMessage myGenericMessage("1234");
    MessageType1* myMgessageType1 = reinterpret_cast<MessageType1*>(&myGenericMessage);
    std::cout << "Foo:" << myMgessageType1->GetFoo() << std::endl;
    std::cout << "Bar:" << myMgessageType1->GetBar() << std::endl;
    return 0;
}

I've never see this done anywhere. Is there any downside to casting from base to derived in this way, given that derived has no additional data members?

Simon Elliott
  • 2,087
  • 4
  • 30
  • 39
  • 2
    Am I missing a point, or why don't just use `static_cast` or `dynamic_cast` ? – user1781290 Dec 11 '13 at 11:53
  • 1
    Because it is a downcast, which needs an implicit cast – user1781290 Dec 11 '13 at 12:01
  • @JBL: error: invalid conversion from ‘GenericMessage*’ to ‘MessageType1*’ ... unsurprisingly, because it's converting from base to derived pointer. – Simon Elliott Dec 11 '13 at 12:04
  • It's not even downcast because `myGenericMessage` is of type `GenericMessage` not `MessageType1`. – Konstantin Oznobihin Dec 11 '13 at 12:05
  • 1
    @user1781290: good point about static_cast. dynamic_cast won't work because the typeid doesn't match. – Simon Elliott Dec 11 '13 at 12:05
  • Oh, ok. It did get this wrong the first time, sorry. Anyways have a look at the following link, it is related to your problem: http://stackoverflow.com/questions/20263888/are-there-cases-where-downcasting-an-actual-base-to-a-derived-would-be-defined/20264963#20264963 – user1781290 Dec 11 '13 at 12:08
  • 1
    @SimonElliott there might be UB if `std::vector` is not standard-layout class which is implementation defined. Aside from this, as you can see from comments the code is hard to understand and it looks to be fragile, as soon as any of your message type become non-standard layout you get UB. Why not make a factory to create specific message types instead of `GenericMessage`? – Konstantin Oznobihin Dec 11 '13 at 12:10
  • I'm not that familiar with c++11, but since you are using vector<> I'd guess this is a perfect use case for move construction? Once you've categorized the message type will there ever be a reason to return to using it as the generic message type? – Speed8ump Dec 11 '13 at 12:55
  • Why don't you use a `union`?? – The Vee Aug 29 '16 at 10:31
  • Is this UB in C++20 if you use `bit_cast` instead of `static_cast`? – nyanpasu64 Jun 09 '20 at 04:51

3 Answers3

6

Here is why I would not use this technique:

  1. It is a violation of the Standard and causes the behavior to be undefined. It is probably true that this works nearly all the time, but you can't rule out problems in the future. Compilers have been seen to make use of undefined behavior in optimizations, much to the disadvantage of the unsuspecting programmer. And you can't predict when and under what circumstances this will happen.

  2. You can't guarantee that neither you nor a team mate will ever add some data members to the derived type. Your class hierarchy will grow and more code will be added over time; at some point it may not be obvious to you or another programmer that adding an innocent data member to the derived type (even temporarily, perhaps for some debugging purpose) can spell disaster.

  3. There are clean and legal alternatives, for example using wrappers based on references:

    #include <iostream>
    
    struct Elem
    { };
    
    struct ElemWrapper
    {
      Elem &elem_;
    
      ElemWrapper(Elem &elem) : elem_(elem)
      { }
    };
    
    struct ElemWrapper1 : ElemWrapper
    {
      using ElemWrapper::ElemWrapper;
    
      void foo()
      { std::cout << "foo1" << std::endl; }
    };
    
    struct ElemWrapper2 : ElemWrapper
    {
      using ElemWrapper::ElemWrapper;
    
      void foo()
      { std::cout << "foo2" << std::endl; }
    };
    
    int main()
    {
      Elem e;
    
      ElemWrapper1(e).foo();
    
      return 0;
    }
    
Community
  • 1
  • 1
jogojapan
  • 68,383
  • 11
  • 101
  • 131
  • Looks too dangerous for me to store references passed in constructor. If it was temporary object it will be destructed just after construction. – Konstantin Oznobihin Dec 11 '13 at 12:18
  • @KonstantinOznobihin I don't understand your point. Of course the temporary object will be destroyed, but why is that a problem? – jogojapan Dec 11 '13 at 12:19
  • I supposed `elem_` member is there to be used by `ElemWrapper` member functions which is not a good idea if it references already destructed object. – Konstantin Oznobihin Dec 11 '13 at 12:23
  • 1
    @KonstantinOznobihin The wrapper has a non (!) const reference, hen not copy is possible. –  Dec 11 '13 at 12:28
  • @KonstantinOznobihin Ah, I see. You mean if the original `Elem` is destroyed by the time the wrapper is used. Yes, that's a danger. But that is the case for the original design, too: The pointer to the derived object may be dangling by the time it's used. That's a general danger with non-owning pointers (and references). But my answer addresses the main point of the question: Is it safe (or indeed avoidable) to downcast in an illegal way if the derived type has no new members? – jogojapan Dec 11 '13 at 12:28
  • 2
    I guess this answer is OK if you ignore the fact that the question says "Normally I write a polymorphic set of classes with accessor methods and a constructor which takes a reference to the message frame." - which is basically your suggested alternative "solution", when he wasn't asking for a solution, but for "any downside to casting from base to derived in this way, given that derived has no additional data members", which can easily be enforced by a code-generator (commonly used in these scenarios) or static analysis. – kfsone Jul 31 '16 at 16:55
2

No you can't !

It may work in your case but it's not advisable since (quick explanation) derived class may have more members or virtual functions, which would not be available from the base.

The easiest solution is to keep your inheritance scheme (which is good) but use a factory to instantiate the correct message type. Example :

struct GenericMessage* create_message(const char* body) {
   int msg_type = body[5]; // I don't know where type is coded, this is an example
   switch(msg_type) {
   case 1:
      return new MessageType1(body);
      break;
   // etc.

You can then safely dynamic_cast it later.

Note that you can put your factory anywhere, for example in the GenericMessage class itself, i.e.

GenericMessage myGenericMessage("1234");
MessageType1* myMgessageType1 = myGenericMessage.get_specialized_message();

Alternatively, you can also build a specialized message from a base one, but it's the same at the end :

GenericMessage myGenericMessage("1234");
MessageType1* myMgessageType1 = new MessageType1( myGenericMessage );
Offirmo
  • 18,962
  • 12
  • 76
  • 97
0

It's fine enough in many applications, if you add the following test:

static_assert(
    sizeof(MessageType1) == sizeof(GenericMessage),
        "Cannot overlay MessageType1 upon GenericMessage." );

No compiler optimization will change the layout of the base-type slice of the derived type, so this is usually safe enough.

Also, use static_cast. reinterpret_cast is for things more perverse than this.


...OK, yes, yes, this can fail when all of the following are true:

  • GenericMessage has padding at the end.
  • A member (later added) in MessageType1 happens to lay inside that padding.
  • You send the overlaid MessageType1 through code-paths that read from that padding-zone before writing to it.

So weigh expediency vs robustness, and then do what you think is best. You're not the first person to use this pattern, and it's not taboo, despite the hand-wringing in the other answers here -- although they are certainly correct that it has special hazards.

Keith Russell
  • 560
  • 4
  • 12