Parsing a binary message in C++. Any lib with examples?

Question

I am looking for a library, or an example, for parsing a binary message in C++. Most people ask about reading a binary file or data received on a socket, but I just have a set of binary messages I need to decode. Somebody mentioned boost::spirit, but I haven't been able to find a suitable example for my needs.

As an example: 9A690C12E077033811FFDFFEF07F042C1CE0B704381E00B1FEFFF78004A92440

where the first 8 bits are a preamble, the next 6 bits are the message ID (an integer from 0 to 63), the next 212 bits are data, and the final 24 bits are a CRC24.

So in this case (message 26), I have to extract these values from the 212 data bits:

4-bit integer value
4-bit integer value
A 9-bit float value from 0 to 63.875, where the LSB is 0.125
4-bit integer value

And so on.

EDIT: I need to operate at bit level, so memcpy is not a good solution, since it copies whole bytes. To get the first 4-bit integer value I have to take 2 bits from one byte and another 2 bits from the next byte, shift each pair, and combine them. What I am asking for is a more elegant way of extracting the values: I have about 20 different messages and want a common solution to parse them all at bit level.

Do you know of any library that can easily do this?

I also found other Q&As where static_cast is used. I googled it, and for every person recommending this approach there is another one warning about endianness. Since I already have my message in memory, I don't know whether that warning applies to me or only to socket communication.

EDIT: boost::dynamic_bitset looks promising. Any help using it?
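A minimal sketch of the bit-level extraction described above, using only the standard library rather than any particular parsing library. `hex_to_bytes` and `BitReader` are hypothetical helpers, not part of any library; the reader assumes fields are packed most-significant-bit first, which is consistent with the example message (its bits 8 to 13 decode to message ID 26).

```cpp
#include <cstddef>
#include <cstdint>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: "9A690C..." -> {0x9A, 0x69, 0x0C, ...}
std::vector<std::uint8_t> hex_to_bytes(const std::string& hex)
{
    std::vector<std::uint8_t> bytes;
    for (std::size_t i = 0; i + 1 < hex.size(); i += 2)
        bytes.push_back(static_cast<std::uint8_t>(std::stoul(hex.substr(i, 2), nullptr, 16)));
    return bytes;
}

// Hypothetical reader that walks the buffer MSB-first:
// bit 0 is the most significant bit of bytes[0].
class BitReader {
public:
    explicit BitReader(const std::vector<std::uint8_t>& bytes) : bytes_(bytes) {}

    // Returns the next `count` bits (1..32) as an unsigned integer.
    std::uint32_t read(int count)
    {
        if (count < 1 || count > 32 || pos_ + count > bytes_.size() * 8)
            throw std::out_of_range("BitReader::read: not enough bits");
        std::uint32_t value = 0;
        for (int i = 0; i < count; ++i, ++pos_) {
            std::uint32_t bit = (bytes_[pos_ / 8] >> (7 - pos_ % 8)) & 1u;
            value = (value << 1) | bit;   // append the bit at the low end
        }
        return value;
    }

private:
    std::vector<std::uint8_t> bytes_;
    std::size_t pos_ = 0;   // index of the next bit to read
};
```

With the example message, successive reads of 8, 6, 4, 4, 9 and 4 bits return 0x9A, 26, 4, 3, 9 and 7; the 9-bit value scaled by its 0.125 LSB is 1.125. Because every message reduces to a sequence of field widths, the same reader can serve all ~20 message types.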

You have 212 bits but you said that you need only 21 (4+4+9+4) bits. What is the meaning of the other 191 bits? — Denis Ermolin, Oct 22 '12 at 07:57
you can have a look at: https://github.com/iso8859-1/BufferHandler. It's not complete yet but it should do what you want. — Tobias Langner, Oct 22 '12 at 07:57
@DenisErmolin I said "And so on". If somebody helps me parse those first values, I can parse the other 191 bits on my own. Anyway, if you want to know, basically the 9-bit float and the last 4-bit integer are repeated 14 more times — Roman Rdgz, Oct 22 '12 at 07:59

score 6 · Accepted Answer · edited May 23 '17 at 12:18


If you can't find a generic library to parse your data, use bitfields to describe the layout and memcpy() the data into a variable of the struct type (see the documentation on bitfields). This will be better tailored to your application.

Don't forget to pack the structure.

Example:

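#pragma pack(push, 1)   // no padding between members (MSVC, GCC and Clang accept this form)

struct yourfields {
// Bit-field allocation order is implementation-defined. O32_HOST_ORDER from
// order32.h is a runtime value and cannot appear in #if, so the layout is
// selected here with the GCC/Clang predefined macro __BYTE_ORDER__.
#if defined(__BYTE_ORDER__) && __BYTE_ORDER__ == __ORDER_BIG_ENDIAN__
   unsigned int preamble:8;
   unsigned int msgid:6;
   /* the 212 data bits go here, split into fields of at most 32 bits each */
   unsigned int crc:24;
#else
   unsigned int crc:24;
   /* the 212 data bits go here, split into fields of at most 32 bits each */
   unsigned int msgid:6;
   unsigned int preamble:8;
#endif
};

#pragma pack(pop)       // restore the previous packing setting

To find out whether your machine is little-endian or big-endian you can use a small header such as order32.h below. Note that it produces a runtime value rather than a preprocessor symbol, so it cannot select code in #if; for a compile-time check, use a compiler-provided macro (GCC and Clang predefine __BYTE_ORDER__).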

//order32.h

#ifndef ORDER32_H
#define ORDER32_H

#include <limits.h>
#include <stdint.h>

#if CHAR_BIT != 8
#error "unsupported char size"
#endif

enum
{
    O32_LITTLE_ENDIAN = 0x03020100ul,
    O32_BIG_ENDIAN = 0x00010203ul,
    O32_PDP_ENDIAN = 0x01000302ul
};

static const union { unsigned char bytes[4]; uint32_t value; } o32_host_order =
    { { 0, 1, 2, 3 } };

#define O32_HOST_ORDER (o32_host_order.value)

#endif

The order32.h header above is based on code by Christoph.

An example program using bitfields, with its output:

#include <cstring>   // std::memcpy
#include <iostream>

// Packed so the whole struct occupies exactly one byte
// (__attribute__((packed)) is GCC/Clang syntax).
struct bitfields{
  unsigned opcode:5;
  unsigned info:3;
}__attribute__((packed));

struct bitfields opcodes;

/* info: 3 bits; opcode: 5 bits */
/* 001 10001  => 0x31 */
/* 010 10010  => 0x52 */

void set_data(unsigned char data)
{
  std::memcpy(&opcodes, &data, sizeof(data));  // copy the raw byte into the bit-fields
}

void print_data()
{
  std::cout << opcodes.opcode << ' ' << opcodes.info << '\n';
}

int main()
{
  set_data(0x31);
  print_data();                          // prints "17 1" with GCC on little-endian x86
  set_data(0x52);
  print_data();                          // prints "18 2"
  std::cout << sizeof(opcodes) << '\n';  // prints 1
  return 0;
}
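For comparison, a minimal sketch that extracts the same two fields with shifts and masks instead of bit-fields, so the result does not depend on how the compiler allocates bit-fields. `Opcode` and `decode_opcode` are hypothetical names; the field assignment (opcode in the low 5 bits, info in the high 3) follows the output shown above.

```cpp
#include <cstdint>

// Hypothetical portable counterpart of `struct bitfields`: the fields are
// extracted with shifts and masks, so the result does not depend on how a
// compiler chooses to allocate bit-fields.
struct Opcode {
    unsigned opcode;  // low 5 bits of the byte
    unsigned info;    // high 3 bits of the byte
};

Opcode decode_opcode(std::uint8_t byte)
{
    return Opcode{ byte & 0x1Fu, (byte >> 5) & 0x07u };
}
```

decode_opcode(0x31) yields opcode 17 and info 1, and decode_opcode(0x52) yields 18 and 2, matching the bit-field program on the machine it was run on, but without relying on that compiler's layout.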

answered Oct 22 '12 at 08:05

Aniket Inge


@RomanRdgz by default, compilers may insert padding between structure members to keep them aligned (typically to 4- or 8-byte boundaries). To stop the compiler from padding the members (and thereby breaking the layout) we use #pragma pack, which is compiler-dependent (on GCC you can also use __attribute__((packed)) instead). – Aniket Inge Oct 22 '12 at 08:34
I believe this approach breaks as soon as you try to compile your code on a platform with a different byte-order. – Frerich Raabe Oct 22 '12 at 08:40
@PrototypeStark sorry, but I don't understand the endian issue. When I was at college, I thought all the big/little-endian stuff was about socket transmissions, so if I already have a binary message stored in a char buffer, I would think the data is correctly ordered without having to check whether the system is big- or little-endian. Have I been wrong about that all this time, or am I just making this assumption because most architectures are little-endian? – Roman Rdgz Oct 22 '12 at 09:38
@RomanRdgz most architectures in common use today are little-endian, but never assume. A simple test for endianness in C/C++: int isLittleEndian(){ unsigned short x = 1; unsigned char first; memcpy(&first, &x, 1); return first == 1; } – Aniket Inge Oct 22 '12 at 09:41
@PrototypeStark so endianness is about how data is stored in memory, and not how it is transmitted from one system to another? Anyway, how can I parse my example message to fill this struct? – Roman Rdgz Oct 22 '12 at 10:17
@RomanRdgz if I understand correctly, your data is in an `unsigned char` (or `byte`) buffer. Simply use the `memcpy()` function to copy the data from `byteBuffer` to an object of type `yourfields`. – Aniket Inge Oct 22 '12 at 10:23
@PrototypeStark but if I use memcpy, I have to say how many bytes to copy, and I need to work at bit level. Look at the question: 8 bits of preamble, then 6 for the message type, and then 2 integer values of 4 bits each. So for the message ID I need to take one byte and shift it 2 places right. But for the first 4-bit value I need to take 2 bits from the second byte and 2 from the third, shift each pair, and then combine them. When I posted this question I was trying to avoid exactly this and find an elegant solution, and now I am doing what I would have done from the beginning. Is there any other way of doing it? – Roman Rdgz Oct 22 '12 at 11:06

You're already avoiding the shifting and composing, because you're copying the data straight into a buffer whose length is the `sizeof` the structure. See the explanation and images of bitfields: http://msdn.microsoft.com/en-us/library/ewwyfdbe(v=vs.71).aspx – Aniket Inge Oct 22 '12 at 11:18
@PrototypeStark Oh, so I just have to memcpy from a buffer to a struct, and that's it? (BTW, broken link) – Roman Rdgz Oct 22 '12 at 11:23
@RomanRdgz see the link I posted again, and the example I added to my post to clear things up – Aniket Inge Oct 22 '12 at 12:01
@PrototypeStark one more question: is it possible to use std::copy instead of memcpy when copying into a struct? I know it works with memcpy, but lately I'm trying to rely on std as much as possible – Roman Rdgz Oct 22 '12 at 13:02
@RomanRdgz I would stick with memcpy(); it's more direct and faster. – Aniket Inge Oct 22 '12 at 13:04
@RomanRdgz see this discussion http://stackoverflow.com/questions/4707012/c-memcpy-vs-stdcopy – Aniket Inge Oct 22 '12 at 13:07
@PrototypeStark I don't know what's wrong, but it doesn't work. Please take a look: http://ideone.com/t3gEts – Roman Rdgz Oct 22 '12 at 13:23
@RomanRdgz you mention that it's only 250 bits of data in your question, right? – Aniket Inge Oct 22 '12 at 14:03
@PrototypeStark That's right, 31.25 bytes. In that example I am entering 32 bytes, because the extra bits are zeros. Where is the problem? – Roman Rdgz Oct 22 '12 at 14:12
@RomanRdgz the problem firstly is you haven't packed the structure like I asked you to :). Without packing it will NOT work – Aniket Inge Oct 22 '12 at 14:33
@PrototypeStark I'm currently using the Visual C++ compiler, and I tried #pragma pack() without success. That's why I didn't include it – Roman Rdgz Oct 22 '12 at 15:33
@RomanRdgz I have VS2010 too. Let me quickly post the solution, then. – Aniket Inge Oct 22 '12 at 15:34
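Given the packing trouble discussed in these comments, a compile-time size check makes it obvious whether packing actually took effect. This is a minimal sketch: `PackedHeader` and its members are hypothetical, chosen so that any padding would change the size. The `#pragma pack(push, 1)` / `#pragma pack(pop)` form is accepted by MSVC, GCC and Clang.

```cpp
#include <cstdint>

#pragma pack(push, 1)   // pack members on 1-byte boundaries
struct PackedHeader {   // hypothetical layout, chosen so padding would be visible
    std::uint8_t  preamble;  // 1 byte
    std::uint32_t word;      // 4 bytes, normally aligned to a 4-byte boundary
};
#pragma pack(pop)       // restore the previous packing setting

// Compilation fails here if the compiler still inserted padding.
static_assert(sizeof(PackedHeader) == 5, "PackedHeader is padded: #pragma pack did not take effect");
```

A size check only catches padding between members; it does not reveal in which order the compiler allocates bit-fields inside a storage unit.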

Denis Ermolin · Answer 2 · 2012-10-22T10:14:12.570

1

You can manipulate the bits yourself. For example, to parse a 4-bit integer value:

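char byte_data[64];            // raw message bytes
size_t readPos = 3;            // byte in which the field starts
int startBit = 0;              // bit inside that byte (0 = most significant)
unsigned value = 0;
int bits_to_read = 4;
for (int i = 0; i < bits_to_read; ++i) {
    size_t bitPos = readPos * 8 + startBit + i;   // absolute bit index, MSB-first
    unsigned bit = (static_cast<unsigned char>(byte_data[bitPos / 8]) >> (7 - bitPos % 8)) & 1u;
    value = (value << 1) | bit;                   // append the bit to the result
}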

If a float is sent as text, convert it with std::stof:

std::string temp;                       // needs <string>
temp.assign(byte_data + readPos, 9);    // 9 characters of text
float value = std::stof(temp);          // std::stof requires C++11

If your data uses a custom float format, extract the bits and do the math yourself:

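char byte_data[64];            // raw message bytes
size_t readPos = 3;            // byte in which the field starts
float value = 0;
int i = 0;                     // bit index inside byte_data[readPos], 0 = MSB
int bits_to_read = 9;
while (bits_to_read) {
    if (i > 7) {               // past the last bit of this byte: move to the next one
      ++readPos;
      i = 0;
    }
    const int bit = (static_cast<unsigned char>(byte_data[readPos]) >> (7 - i)) & 1;
    // here your code: accumulate `bit` into the raw field value
    ++i;
    --bits_to_read;
}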
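One way to fill in the "here your code" step for the 9-bit value in the question: collect the bits most-significant first into an integer, then multiply by the LSB weight (0.125). `read_scaled_field` and its parameters are hypothetical; like the loop above, it assumes fields are stored MSB-first.

```cpp
#include <cstddef>

// Hypothetical completion of the loop above: the bits are collected MSB-first
// into an unsigned integer, then scaled by the weight of the least significant bit.
float read_scaled_field(const char* byte_data, std::size_t readPos, int i,
                        int bits_to_read, float lsb)
{
    unsigned raw = 0;
    while (bits_to_read) {
        if (i > 7) {             // past the last bit of this byte: move to the next one
            ++readPos;
            i = 0;
        }
        const unsigned bit = (static_cast<unsigned char>(byte_data[readPos]) >> (7 - i)) & 1u;
        raw = (raw << 1) | bit;  // append the bit
        ++i;
        --bits_to_read;
    }
    return raw * lsb;            // e.g. raw 257 with lsb 0.125 gives 32.125
}
```

For the example message this field starts at bit 22 (readPos = 2, i = 6) and decodes to 9 * 0.125 = 1.125.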

edited Oct 22 '12 at 10:14

answered Oct 22 '12 at 08:04

Denis Ermolin


`4bit_int` is an invalid variable name: identifiers cannot start with a digit. – Aniket Inge Oct 22 '12 at 08:11
you cannot create integers of < 8 bits(=1byte). Hence bitfields. – Aniket Inge Oct 22 '12 at 08:12
@DenisErmolin why use '_data' instead of just 'data'? Anyway, you are recommending static_cast, but how do I get the 9-bit float? It does not come as a string – Roman Rdgz Oct 22 '12 at 08:24
Do you know how the float value was packed into the bytes? – Denis Ermolin Oct 22 '12 at 08:47
@DenisErmolin The LSB is 0.125, so 0 0000 0001 is 0.125, 0 0000 0010 is 0.250, ... and 1 0000 0001 would be (0.125)*2^8 + 0.125 = 32.125. At least that is what I understand when I read that the LSB has that value – Roman Rdgz Oct 22 '12 at 09:41

score 0 · Answer 3 · answered Oct 22 '12 at 08:29

Here is a good article that describes several solutions to the problem.

It even references an ibstream class that the author created specifically for this purpose (the link seems dead, though). The only other mention of this class I could find is in the bit C++ library here; it might be what you need, though it's not popular and it's under the GPL.

Anyway, boost::dynamic_bitset might be the best choice, as it's time-tested and community-proven. But I have no personal experience with it.
