
In the past few weeks I downloaded pcap data from NYSE and learned how to parse the pcap files using C++.

I have been able to strip out all the network headers (UDP, IP headers) and get to the payload. I have started writing code to process the payload.
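A minimal sketch of that header-stripping step is shown here for context. It assumes untagged Ethernet II frames carrying IPv4 and UDP (no VLAN tags, no IPv6), and `PayloadView` and `udpPayload` are hypothetical names, not part of libpcap:

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical view of the UDP payload inside one captured frame.
struct PayloadView {
    const unsigned char* data;  // nullptr if the frame is not IPv4/UDP or is truncated
    std::size_t size;           // number of payload bytes
};

// Assumes an untagged Ethernet II frame (14-byte header) carrying IPv4 + UDP.
inline PayloadView udpPayload(const unsigned char* frame, std::size_t caplen) {
    constexpr std::size_t kEthLen = 14;
    constexpr std::size_t kUdpLen = 8;
    if (caplen < kEthLen + 20 + kUdpLen) return {nullptr, 0};

    // EtherType 0x0800 = IPv4; it is big-endian on the wire, so compare byte by byte.
    if (frame[12] != 0x08 || frame[13] != 0x00) return {nullptr, 0};

    const unsigned char* ip = frame + kEthLen;
    if ((ip[0] >> 4) != 4) return {nullptr, 0};          // IP version must be 4
    const std::size_t ipLen = (ip[0] & 0x0F) * 4u;       // IHL counts 32-bit words
    if (ipLen < 20 || ip[9] != 17) return {nullptr, 0};  // protocol 17 = UDP
    if (caplen < kEthLen + ipLen + kUdpLen) return {nullptr, 0};

    const unsigned char* udp = ip + ipLen;
    // The UDP length field (big-endian) covers the UDP header plus the data.
    const std::size_t udpTotal = (std::size_t{udp[4]} << 8) | udp[5];
    if (udpTotal < kUdpLen) return {nullptr, 0};
    if (caplen < kEthLen + ipLen + udpTotal) return {nullptr, 0};  // truncated capture
    return {udp + kUdpLen, udpTotal - kUdpLen};
}
```

Returning the payload length along with the pointer makes it possible to bounds-check every later read.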

The method I have used so far is to define a struct for each message type. For example:

struct msg1 {
        uint16_t  msgSize; // 2 bytes
        uint16_t  msgType; // 2 bytes
        uint32_t  time; // 4 bytes
        uint32_t  timeNS; // 4 bytes
        uint8_t   productID; //1 byte
        uint8_t   channelID; // 1 byte 
};                      
        
struct msg34 {  
        uint16_t  msgSize;  // 2 byte   
        uint16_t  msgType;  // 2 byte 
        uint32_t  time;    // 4 byte
        uint32_t  timeNS;   // 4 byte
        uint32_t  symbolIndex; // 4 byte
        uint32_t  symbolSeqNum;  // 4 byte
        u_char    secStatus;    // 1 byte
        u_char    haltCondition;  // 1 byte
        uint32_t  reserved;     // 4 bytes
        uint32_t  price_1;  //  4 bytes
        uint32_t  price_2;  // 4 bytes 
        u_char    ssr;          // 1 byte
        uint32_t  ssrVol;       // 4 bytes
        uint32_t  ssrTime;      // 4 bytes
        u_char    ssrState;     // 1 byte
        u_char    mktState;     //1 byte
        u_char    sessionState;   // 1 byte          
};  
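One thing to check with this approach is padding. If the byte counts in the comments describe the wire layout, `reserved` in msg34 sits at byte offset 22 on the wire, but on typical ABIs an unpacked struct places it at offset 24, so every field after it would be misread. The sketch below assumes a compiler that supports `#pragma pack` (GCC, Clang and MSVC do; it is not standard C++), and `UnpackedExample`, `PackedExample` and `PackedMsg1` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>

// Without packing, the compiler may insert padding so that each member is
// naturally aligned; the struct then no longer matches the wire layout.
struct UnpackedExample {
    std::uint8_t  flag;   // 1 byte on the wire
    std::uint32_t value;  // 4 bytes on the wire; usually preceded by 3 padding bytes here
};

#pragma pack(push, 1)  // compiler extension: no padding between members
struct PackedExample {
    std::uint8_t  flag;
    std::uint32_t value;
};

struct PackedMsg1 {
    std::uint16_t msgSize;    // 2 bytes
    std::uint16_t msgType;    // 2 bytes
    std::uint32_t time;       // 4 bytes
    std::uint32_t timeNS;     // 4 bytes
    std::uint8_t  productID;  // 1 byte
    std::uint8_t  channelID;  // 1 byte
};
#pragma pack(pop)

// Fail the build if the in-memory layout drifts from the documented sizes.
static_assert(sizeof(PackedExample) == 5, "PackedExample must be 5 bytes");
static_assert(sizeof(PackedMsg1) == 14, "PackedMsg1 must be 14 bytes");
static_assert(offsetof(PackedMsg1, productID) == 12, "unexpected offset for productID");
```

Packed members can be misaligned, so it is safer to read them by value than to take pointers or references to them.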

Next, once I determine the message type from the 2-byte msgType field (bytes 2 and 3 of the payload, right after msgSize), I use reinterpret_cast to view the payload as the matching struct:

const uint16_t* msgType = reinterpret_cast<const uint16_t*>(packet+2);
switch(*msgType) {  
    case 34: {
         const msg34* msg34_ = reinterpret_cast<const msg34*>(packet);
         std::cout << *msg34_;
         break; 
    }
}

Here, packet is a const u_char* that points to the start of the binary payload.
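The header fields used for dispatch can also be copied out of the buffer rather than read through a cast pointer. This is a minimal sketch, assuming the payload length is known (the parameter `len` is hypothetical) and that the host byte order matches the byte order of the feed; `MsgHeader` and `readHeader` are illustrative names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>

// Hypothetical common header: the first four bytes of every message.
struct MsgHeader {
    std::uint16_t msgSize;
    std::uint16_t msgType;
};

// Copies the header out of the buffer instead of aliasing it.
// Returns false if fewer than 4 bytes are available.
// Assumes the host byte order matches the byte order of the feed.
inline bool readHeader(const unsigned char* packet, std::size_t len, MsgHeader& out) {
    if (len < 4) return false;
    std::memcpy(&out.msgSize, packet, sizeof out.msgSize);
    std::memcpy(&out.msgType, packet + 2, sizeof out.msgType);
    return true;
}
```

Compilers typically turn such small fixed-size `memcpy` calls into plain loads, so this should cost no more than the cast.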

Some people did not approve of using reinterpret_cast this way, saying that it is unsafe. I would like some feedback on an alternative way to initialize a struct directly from the binary data that the const u_char* pointer named packet points to.
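One alternative that sidesteps aliasing, alignment and padding is to decode each field from its bytes. The sketch below assumes C++17 and that the feed encodes integers little-endian (this should be checked against the feed specification); `Reader`, `Msg1` and `decodeMsg1` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <optional>

struct Msg1 {
    std::uint16_t msgSize;
    std::uint16_t msgType;
    std::uint32_t time;
    std::uint32_t timeNS;
    std::uint8_t  productID;
    std::uint8_t  channelID;
};

// Hypothetical cursor over the payload. It assembles integers byte by byte,
// so the result does not depend on host endianness, alignment or struct padding.
// Assumes little-endian integers on the wire. The caller checks the length first.
class Reader {
public:
    explicit Reader(const unsigned char* p) : p_(p) {}

    std::uint8_t u8() { return *p_++; }
    std::uint16_t u16() {
        const auto v = static_cast<std::uint16_t>(p_[0] | (p_[1] << 8));
        p_ += 2;
        return v;
    }
    std::uint32_t u32() {
        const std::uint32_t v = std::uint32_t{p_[0]} | (std::uint32_t{p_[1]} << 8) |
                                (std::uint32_t{p_[2]} << 16) | (std::uint32_t{p_[3]} << 24);
        p_ += 4;
        return v;
    }

private:
    const unsigned char* p_;
};

constexpr std::size_t kMsg1WireSize = 14;  // sum of the field sizes listed for msg1

inline std::optional<Msg1> decodeMsg1(const unsigned char* packet, std::size_t len) {
    if (len < kMsg1WireSize) return std::nullopt;  // single bounds check up front
    Reader r(packet);
    Msg1 m{};
    m.msgSize   = r.u16();
    m.msgType   = r.u16();
    m.time      = r.u32();
    m.timeNS    = r.u32();
    m.productID = r.u8();
    m.channelID = r.u8();
    return m;
}
```

The in-memory `Msg1` is then an ordinary struct whose layout no longer has to match the wire format, at the cost of writing one decode function per message type.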

I would also like some advice on how to structure the different message types for better code readability and runtime performance. Right now I define one struct per message type, so my struct definition file is already quite long. Would it be better to model the messages as a class hierarchy (an abstract base class with concrete derived classes)?
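One possible structure keeps the plain structs and puts the dispatch in a `std::variant` instead of a class hierarchy. A class with virtual functions is not trivially copyable, so it cannot be filled with `memcpy`. This is a sketch under several assumptions: C++17, a compiler with `#pragma pack`, host byte order equal to the feed's, msg1 having type code 1, and `ssrTime` as an assumed name for the second time field of msg34. `Unknown`, `Message`, `copyInto`, `parse` and `TypeName` are hypothetical names:

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <type_traits>
#include <variant>

#pragma pack(push, 1)  // compiler extension; keeps the layout equal to the wire format
struct Msg1 {
    static constexpr std::uint16_t kType = 1;  // static members do not affect the layout
    std::uint16_t msgSize, msgType;
    std::uint32_t time, timeNS;
    std::uint8_t  productID, channelID;
};
struct Msg34 {
    static constexpr std::uint16_t kType = 34;
    std::uint16_t msgSize, msgType;
    std::uint32_t time, timeNS, symbolIndex, symbolSeqNum;
    std::uint8_t  secStatus, haltCondition;
    std::uint32_t reserved, price_1, price_2;
    std::uint8_t  ssr;
    std::uint32_t ssrVol, ssrTime;  // ssrTime: assumed name for the second time field
    std::uint8_t  ssrState, mktState, sessionState;
};
#pragma pack(pop)
static_assert(sizeof(Msg1) == 14 && sizeof(Msg34) == 46, "layout must match the wire sizes");

struct Unknown { std::uint16_t msgType; };  // fallback for unhandled or truncated messages
using Message = std::variant<Unknown, Msg1, Msg34>;

// Copies len-checked bytes into a T and stores it in the variant.
template <class T>
bool copyInto(const unsigned char* p, std::size_t len, Message& out) {
    static_assert(std::is_trivially_copyable_v<T>, "memcpy needs a trivially copyable type");
    if (len < sizeof(T)) return false;
    T m;
    std::memcpy(&m, p, sizeof m);
    out = m;
    return true;
}

// Supporting a new message type means adding a struct and one case below.
inline Message parse(const unsigned char* p, std::size_t len) {
    std::uint16_t type = 0;
    if (len >= 4) std::memcpy(&type, p + 2, sizeof type);
    Message out = Unknown{type};
    switch (type) {
        case Msg1::kType:  copyInto<Msg1>(p, len, out);  break;
        case Msg34::kType: copyInto<Msg34>(p, len, out); break;
        default: break;
    }
    return out;
}

// Example visitor: one overload per message type, resolved at compile time.
struct TypeName {
    const char* operator()(const Unknown&) const { return "unknown"; }
    const char* operator()(const Msg1&) const { return "msg1"; }
    const char* operator()(const Msg34&) const { return "msg34"; }
};
```

A caller would write `std::visit(TypeName{}, parse(packet, len))`. If a visitor has no overload for one of the alternatives, the code does not compile, which is a check that a hand-written switch on the type code does not give.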

Thank you in advance for any and all help.

P.S.: If anyone needs any help getting started on a similar project, feel free to reach out and I will share whatever code I have.

  • I guess unions are used to do that. At least in C, but not in *standard* C++! Indeed, AFAIK using unions in C++ for this produces an *ill-formed* program (as for reinterpret casts here). The root of the problem lies in the [strict aliasing rule](https://stackoverflow.com/questions/98650/what-is-the-strict-aliasing-rule) (SAR). If the structure is *packed*, the standard way is to use a *memcpy* call (to escape the SAR). – Jérôme Richard Jul 05 '20 at 00:27
  • Did you read a good [C++ programming book](https://ptgmedia.pearsoncmg.com/images/9780321992789/samplepages/9780321992789.pdf)? And the documentation of your C++ compiler (perhaps [GCC](http://gcc.gnu.org))? – Basile Starynkevitch Jul 08 '20 at 13:01
