3

I am currently developing some software in C++ where I am sending and receiving custom data packets. I want to parse and manage these packets in a well structured manner. Obviously I am first receiving the header and after that the body of the data. The main problem is that I don't like creating a Packet-Object with only the header information and later on adding the body data. What is an elegant way of parsing and storing custom data packets?

Here is a rough sketch of what such a custom data packet could look like:

+-------+---------+---------+----------+------+
| Magic | Command | Options | Bodysize | Body |
+-------+---------+---------+----------+------+

(Lets assume Magic is 4 bytes, Command 1 byte, Options 2 bytes, Bodysize 4 bytes and the body itself is variable in length.) How would I parse this without using any third party libraries?

Normally I'd say something like this could be done to store packet data:

#include <array>

class Packet {
public:

    explicit Packet(std::array<char, 10> headerbytes);

    void set_body(std::vector<char> data);
    std::vector<char> get_body();

    int8_t get_command();

    int16_t get_options();

    bool is_valid();

private:

    bool valid;

    int8_t _command;

    int16_t _options;

    int32_t body_size;

    std::vector<char> _data;

};

The problem is that I provide the header-information first and than add the body data in a hacky way later on. The packet object has a point of time where it is accessible in an incomplete state.

I first receive the header and after the header was received another receive call is made to read the body. Would it make sense to have a parser instance that populates information into the packet object only make it accessible once it holds all needed information? Would it make sense to have a separate class for the header and the body? What would be the best design choice?

I am developing with C++ and for the sending and receiving of data over sockets the boost library is used.

Kyu96
  • 1,159
  • 2
  • 17
  • 35
  • 1
    "I don't like creating a Packet-Object with only the header information and later on adding the body data" - Why not? – Jesper Juhl Apr 21 '19 at 16:42
  • 1
    Have you considered using an existing widely used serialisation system such as Google protobuf, rather than reinventing the wheel? – rici Apr 21 '19 at 17:10
  • 1
    @Jesper I think this would be bad design because the object can be accessed in an incomplete state. I am not sure what is common practice neither do I know what would be the best design choice for this. I am thinking of some packet processor that performs parsing and a packet data container that is filled and only accessible once populated with all data – Kyu96 Apr 21 '19 at 23:19
  • @rici Yes I have, I prefer to do it myself. Not because this is necessarily better but because it helps me to deepen my understanding on how these things work and how they are designed. – Kyu96 Apr 22 '19 at 10:50
  • @JesperJuhl I updated my post and added some more information. – Kyu96 Apr 22 '19 at 10:51
  • Another way to do is a simple state machine. Many ways to implement one – kert Apr 25 '19 at 00:13
  • Whenever you have a "full stop" (e.g. return something or store it, basically interface type), what you produced should be a type on its own. If it's a header, then call it MessageHeader. In c++ move ctor is extremely inexpensive: you might simply create e.g. Message(MessageHeader&&, MessageBody&&) ctor. If you need help on polymorphic message body, lmk, and I give you details on cps-style visitors / factories which might be the best fit here (they're fast and use stack; yet provide explicit types). – lorro Apr 25 '19 at 22:14
  • Thanks. Yeah some examples and advice what to use in this scenario is very welcome! – Kyu96 Apr 25 '19 at 22:16
  • Just out of interest: How is it guaranteed that the first receive call will always receive the entire header and only the header, and the second receive call will always receive the entire body and only the body? In general, that's not how networking works, at least not any of the APIs that I'm familiar with…unless you're sending header and body as like separate UDP packages or something; but then how to you know which packets belong in which order? Who takes care of all that? – Michael Kenzel Apr 28 '19 at 21:58

4 Answers4

1

If you don’t want to tie the data reading into one complete constructor (for understandable reasons of separation of concerns), this is a good application for non-polymorphic inheritance:

struct Header {
  static constexpr SIZE=10;
  Header(std::array<char,SIZE>);

  std::int8_t get_command() const {return command;}
  std::int16_t get_options() const {return options;}
  std::int32_t body_size() const {return length;}

private:
  std::int8_t command;
  std::int16_t options;
  std::int32_t length;
};

struct Packet : private Header {
  using Body=std::vector<char>;
  Packet(const Header &h,Body b) : Header(h),body(std::move(b))
  {if(body.size()!=body_size()) throw …;}

  using Header::get_command;
  using Header::get_options;
  const Body& get_body() const {return body;}

private:
  Body body;
};

// For some suitable Stream class:
Header read1(Stream &s)
{return {s.read<Header::SIZE>()};}
Packet read2(const Header &h,Stream &s)
{return {h,s.read(h.body_size())};}
Packet read(Stream &s)
{return read2(read1(s),s);}

Note that the private inheritance prevents undefined behavior from deleting a Packet via a Header*, as well as the surely-unintended

const Packet p=read(s);
const Packet q=read2(p,s);   // same header?!

Composition would of course work as well, but might result in more adapter code in a full implementation.

If you were really optimizing, you could make a HeaderOnly without the body size and derive Header and Packet from that.

Davis Herring
  • 36,443
  • 4
  • 48
  • 76
  • Interesting approach. A few questions remain: Wouldn't it be better practice to make a separate struct for the Body as well? Also why did you choose structs over a class? Furthermore I was thinking if it might be conceivable to make the Packet class itself an interface? Maybe something like an IPacket which other classes can inherit? What do you think about this? – Kyu96 Apr 24 '19 at 21:29
  • @Kyu96: What purpose would a struct containing just the `Body` serve? It doesn’t abstract over anything. As for `struct` *vs.* `class`, they are exactly the same except for default access control. You need an interface iff you need dynamic polymorphism, which your problem statement doesn’t suggest. – Davis Herring Apr 24 '19 at 23:35
1

For this case I would use the pipeline design pattern creating 3 packet processor classes:

  • Command (handles magic bytes too)
  • Options
  • Body (handles body size too)

all derived from one base class.

typedef unsigned char byte;

namespace Packet
{
    namespace Processor
    {
        namespace Field
        {
            class Item
            {
            public:
                /// Returns true when the field was fully processed, false otherwise.
                virtual bool operator () (const byte*& begin, const byte* const end) = 0;
            };

            class Command: public Item
            {
            public:
                virtual bool operator () (const byte*& begin, const byte* const end);
            };

            class Options: public Item
            {
            public:
                virtual bool operator () (const byte*& begin, const byte* const end);
            };

            class Body: public Item
            {
            public:
                virtual bool operator () (const byte*& begin, const byte* const end);
            };
        }

        class Manager
        {
        public:
            /// Called every time new data is received
            void operator () (const byte* begin, const byte* const end)
            {
                while((*fields[index])(begin, end))
                {
                    incrementIndex();
                }
            }

        protected:
            void incrementIndex();

            Field::Command command;
            Field::Options options;
            Field::Body body;
            Field::Item* const fields[3] = { &command, &options, &body };
            byte index;
        };
    }
}
Flaviu
  • 931
  • 11
  • 16
  • That sounds like a very interesting approach, maybe you can provide a rough example for my scenario? Would you make header parsing, body parsing two pipeline steps? – Kyu96 Apr 24 '19 at 11:02
  • How does the client learn how much data to send to the `Manager`? – Davis Herring Apr 27 '19 at 06:52
  • @Davis Herring: The Manager is a state machine parsing the content of packets. It receives at a time an entire packet or a chunk of packet. It deals with fragmented packets. – Flaviu Apr 27 '19 at 13:58
  • @Kyu96: It's quite simple changing it to have only 2 steps, header and body. – Flaviu Apr 27 '19 at 13:59
  • @Flaviu: Then how does the client receive those packets if, say, one read is long enough to contain several? – Davis Herring Apr 27 '19 at 14:40
  • @Davis Herring: I see. "if" should be repleced with "while" ;) – Flaviu Apr 27 '19 at 16:11
  • @Flaviu: The `void Manager::operator()` can recognize as many packets as it wants, but how is that interleaved with *processing* them? It seems convoluted to, say, have to create the `Manager` with a callback and then pump data into it anyway rather than, say, having it read from a stream. – Davis Herring Apr 27 '19 at 16:25
  • @Davis Herring: Following Separation of Concerns design principle, the Manager only processes the received data. How the data is received - that is another concern. – Flaviu Apr 27 '19 at 16:43
1

You can use exceptions to prevent creation of incomplete packet objects.

I'd use char pointers instead of vectors for performance.

// not intended to be inherited
class Packet final {
public:
    Packet(const char* data, unsigned int data_len) {
        if(data_len < header_len) {
            throw std::invalid_argument("data too small");
        }

        const char* dataIter = data;

        if(!check_validity(dataIter)) {
            throw std::invalid_argument("invalid magic word");
        }
        dataIter += sizeof(magic);
        memcpy(&command, dataIter, sizeof(command)); // can use cast & assignment, too
        dataIter += sizeof(command);
        memcpy(&options, dataIter, sizeof(options)); // can use cast & assignment, too
        dataIter += sizeof(options);
        memcpy(&body_size, dataIter, sizeof(body_size)); // can use cast & assignment, too
        dataIter += sizeof(body_size);

        if( data_len < body_size+header_len) {
            throw std::invalid_argument("data body too small");
        }

        body = new char[body_size];
        memcpy(body, dataIter, body_size);
    }

    ~Packet() {
        delete[] body;
    }

    int8_t get_command() const {
        return command;
    }

    int16_t get_options() const {
        return options;
    }

    int32_t get_body_size() const {
        return body_size;
    }

    const char* get_body() const {
        return body;
    }

private:
    // assumes len enough, may add param in_len for robustness
    static bool check_validity(const char* in_magic) {
        return ( 0 == memcmp(magic, in_magic, sizeof(magic)) );
    }

    constexpr static char magic[] = {'a','b','c','d'};
    int8_t command;
    int16_t options;
    int32_t body_size;
    char* body;

    constexpr static unsigned int header_len = sizeof(magic) + sizeof(command)
            + sizeof(options) + sizeof(body_size);
};

Note: this is my first post in SO, so please let me know if something's wrong with the post, thanks.

U. Sivri
  • 28
  • 5
  • 1
    Thanks for your contribution. From what I understand this would only work if the packet length is already known beforehand? The header contains the body length but your constructor requires me to provide a given amount of data in advance that is then parsed? Also it's always helpful to include a usage example of the posted code ;-) – Kyu96 Apr 25 '19 at 10:49
  • 1
    While I don’t think `std::vector` is a significant performance problem here, if you don’t want to use it you should be using `std::unique_ptr`. Also, “cast & assignment” is undefined behavior (*cf.* [plans for implicit object creation](http://open-std.org/JTC1/SC22/WG21/docs/papers/2019/p0593r3.html)). – Davis Herring Apr 27 '19 at 07:08
  • @Kyu96 you are right about that, maybe you could use a packetbuilder with header and body setters, throwing exception when incomplete body/packet is requested? – U. Sivri Apr 29 '19 at 06:56
  • @DavisHerring as you suggested using unique (shared-depending on the context) pointer is the correct way IMO and thanks for the extra info. – U. Sivri Apr 29 '19 at 07:07
0

I'm guessing you are trying Object-oriented networking. If so, the best solution for such parsing would be Flatbuffers or Cap’n Proto C++ code generator. By defining a schema, you will get state machine code that will parse the packets in an efficient and safe way.

Flaviu
  • 931
  • 11
  • 16