1

I have a set of .bin files containing data in a formally specified format. I know exactly how many bytes there are for each field e.g. name = 40 bytes, version number = 2 bytes etc. I also know the exact order they are stored in the file (e.g. name, then version number....).

So far I can load the data from a file into an std::vector<unsigned char> list, then step through that data and read the fields in as per the number of expected bytes.

The issue is that this method is very long-winded and error-prone should I get any of the fields wrong (there are a lot of different fields).

I've looked at and talked to people about struct packing, pointer casting and bit fields. I just can't seem to get them all to work together.

How can I read the data into my buffer, then 'overlay' my struct on the buffer? Then all the fields would populate according to the allocated bit fields I've given each value in the struct.

The issue with bit fields is that I can't take in strings.

Advice or example code would be highly appreciated. If you'd like just comment and I can give you code to show what I have so far and what I'm trying to achieve.

#include <string>
#include <vector>

struct dataFields
{
    int ID : 8;
    // Cannot use bit field for string type?
    std::string name;
    int versionNumber : 16;
    int someOtherValue : 8;
};

int main()
{
    //File data loaded by function call
    std::vector<unsigned char> fileData;

    //How do I cast fileData to be a dataFields type?
}

I cannot give the exact code I'm working on for work reasons, but I feel this summarises what I'm trying to do fairly well in a simple manner.

MaxA
  • Please provide some example. Do read https://stackoverflow.com/help/mcve Also, you have tagged both C and C++, please tag the exact language you are using - either C or C++. – kiner_shah Nov 03 '21 at 11:31
  • Make a packed struct that matches what you know about the data (non-packed structs have no specific memory layout you could rely on). Copy the data over it with memcpy (this is the one aliasing method that people can agree on as legal). Test extensively. That's the most advice you're going to get at this level of detail. –  Nov 03 '21 at 11:32
  • The simplest, non-portable solution is to read a buffer the size of a record and cast it to a struct pointer. – stark Nov 03 '21 at 11:34
  • Why would you want to use bit field with a string type? In this case you have to use a raw C array with explicitly stated size: `char name[40]`. – user3366592 Nov 03 '21 at 11:42
  • Then assuming you have an instance of `dataFields` called `d` and the `fileData` vector contains the binary data loaded from your file, you have to use `memcpy`: `memcpy(&d, &fileData[0], sizeof(dataFields))` – user3366592 Nov 03 '21 at 11:45
  • You can't populate an std::string by casting or memcpy, stick to [Plain Old Data](https://stackoverflow.com/questions/146452/what-are-pod-types-in-c) types. –  Nov 03 '21 at 11:50
  • I have done something similar in the past with C, not ++ but I think it should work just the same. I simply create the struct and allocate memory for it and then pass its pointer to the fread or whichever function is used to read the data. That is the fread puts the data directly into the struct with no intermediate buffer. One reason I did it this way is because it was a very small embedded system and couldn't afford to have a separate large buffer and this was the simplest way around that limitation I could come up with.. – Christian Nov 03 '21 at 12:06

2 Answers

0

No, you indeed cannot use bit fields for std::string, and you wouldn't want to anyway: the object itself contains little more than a few pointers, so its in-memory layout doesn't correspond to the 40 bytes in the file.

The usual approach I use in my projects is to have a POD struct for each record type. The lowest layer, responsible for {de}serialization, then converts only between PODs and bytes. Any C++ logic, such as std::string or variable-length std::vector, is dealt with at higher levels.

#include <array>
#include <cstddef>      // offsetof
#include <cstdint>
#include <cstring>
#include <type_traits>

struct Record{
    std::uint8_t ID;
    std::array<char,40> name;
    std::uint16_t versionNumber;
    std::uint8_t someOtherValue;
};

static_assert(sizeof(Record)==46);
static_assert(offsetof(Record,name)==1);
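
As a rough sketch of what "dealt with at higher levels" could mean for the name field: assuming the 40-byte field is NUL-padded (the question doesn't say, so this is only an assumption), a hypothetical helper such as name_to_string below could turn the raw array into a std::string without reading past the field when it is completely full.

```cpp
#include <algorithm>
#include <array>
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the code above): converts a fixed-size,
// NUL-padded character field into a std::string. If the field contains no
// NUL at all, every byte of it is used.
template <std::size_t N>
std::string name_to_string(const std::array<char, N>& field) {
    const auto end = std::find(field.begin(), field.end(), '\0');
    return std::string(field.begin(), end);
}
```

With this, something like name_to_string(record.name) would be called only after the Record itself has been copied out of the byte buffer.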

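In my projects, I try to keep every Record member naturally aligned, i.e. at an offset that is a multiple of its sizeof(E). On typical platforms that means one padding byte before versionNumber and one at the end here, hence 46 bytes rather than 44. You can add packed modifiers if needed. Prefer the fixed-width types from <cstdint> over bit fields.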
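
If the file format has no padding between fields (as the question's description suggests), a packed variant might look like the sketch below. It assumes a compiler that supports #pragma pack (GCC, Clang and MSVC do, but it is not standard C++), and PackedRecord is a hypothetical name.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>
#include <type_traits>

#pragma pack(push, 1)   // non-standard: request 1-byte alignment for members
struct PackedRecord {
    std::uint8_t ID;
    std::array<char, 40> name;
    std::uint16_t versionNumber;   // now at offset 41, i.e. misaligned
    std::uint8_t someOtherValue;
};
#pragma pack(pop)       // restore the previous packing

// 1 + 40 + 2 + 1 bytes, with no padding anywhere.
static_assert(sizeof(PackedRecord) == 44, "unexpected padding");
static_assert(offsetof(PackedRecord, name) == 1, "unexpected offset");
static_assert(offsetof(PackedRecord, versionNumber) == 41, "unexpected offset");
static_assert(offsetof(PackedRecord, someOtherValue) == 43, "unexpected offset");
static_assert(std::is_trivially_copyable<PackedRecord>::value,
              "must be safe to memcpy");
```

Because versionNumber is misaligned in the packed layout, it is safer to read it by value than to take a pointer or reference to it.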

I recommend putting a bunch of static_asserts after each Record, verifying its layout. Otherwise someone will one day come along and try to "clean up" the code, breaking everything. It also nicely documents the protocol for the reader.

One downside is that this does not easily support putting a variable-length member in the middle of a record, or having several of them, but I have never needed to do so; I keep packets simple.

Also, I just decide on a fixed endianness for the protocol. If someone needs something else, it's their responsibility to pass correctly encoded Records for serialization.
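
One way to pin the byte order down without depending on the host CPU is to assemble multi-byte fields from individual bytes. A minimal sketch, assuming the format stores versionNumber little-endian (the question does not say which order is used) and using hypothetical helper names:

```cpp
#include <cstdint>

// Hypothetical helpers: read/write a 16-bit little-endian value byte by byte.
// The result is the same on little- and big-endian hosts.
inline std::uint16_t load_u16_le(const unsigned char* p) {
    return static_cast<std::uint16_t>(p[0] | (p[1] << 8));
}

inline void store_u16_le(unsigned char* p, std::uint16_t v) {
    p[0] = static_cast<unsigned char>(v & 0xFF);         // low byte first
    p[1] = static_cast<unsigned char>((v >> 8) & 0xFF);  // then high byte
}
```

For a big-endian format the two byte positions would simply be swapped.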

Serialization helpers:

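template<typename T>
T read_value(const unsigned char*& ptr){
    // memcpy is only well-defined for trivially copyable types.
    static_assert(std::is_trivially_copyable_v<T>);
    static_assert(std::is_standard_layout_v<T>);

    T value;
    std::memcpy(&value,ptr,sizeof(T));
    ptr+=sizeof(T); // advance the caller's cursor past the value
    return value;
}

template<typename T>
void write_value(unsigned char*& ptr, const T& value){
    static_assert(std::is_trivially_copyable_v<T>);
    static_assert(std::is_standard_layout_v<T>);

    std::memcpy(ptr,&value,sizeof(T));
    ptr+=sizeof(T); // advance the caller's cursor past the value
}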

The lowest layer responsible for {de}serialization can look something like this:

void deserialize_stream(const unsigned char* bytes){
    // Output is a bunch of POD types.
    auto record1 = read_value<Record>(bytes);
    auto record2 = read_value<Record>(bytes);
}

void serialize_stream(unsigned char* bytes){
    // Input is a list of POD types to serialize.
    Record record1{1,"Foo",12,42};
    Record record2{2,"Bar",14,28};

    write_value(bytes,record1);
    write_value(bytes,record2);
}

Example

int main() { 
    // Just an example; CHECK THE SIZE in the real world.
    std::array<unsigned char,1024> buffer;

    serialize_stream(buffer.data());
    deserialize_stream(buffer.data());

}
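
The size check that the comment in main asks for could be sketched as follows, applied to the question's std::vector<unsigned char>. The name try_read and its offset and out parameters are hypothetical, not part of the helpers above.

```cpp
#include <cstddef>
#include <cstring>
#include <type_traits>
#include <vector>

// Hypothetical bounds-checked reader: copies a T out of `data` starting at
// `offset` into `out` and advances `offset`. Returns false (leaving `offset`
// and `out` untouched) when fewer than sizeof(T) bytes remain.
template <typename T>
bool try_read(const std::vector<unsigned char>& data, std::size_t& offset, T& out) {
    static_assert(std::is_trivially_copyable<T>::value, "T must be safe to memcpy");
    if (offset > data.size() || data.size() - offset < sizeof(T)) {
        return false;
    }
    std::memcpy(&out, data.data() + offset, sizeof(T));
    offset += sizeof(T);
    return true;
}
```

A caller would start with an offset of 0 and call try_read once per record, stopping at the first false.
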
Quimby
0

Consider using a serialization library for this if this part is not constrained by time or storage efficiency. Such libraries can serialize your objects to XML or JSON and deserialize them easily, so you do not need to worry about endianness or POD problems.

ABacker