2

I am trying to read chunks of data from a file directly into a struct but the padding is causing too much data to be read and the data to be misaligned.

Do I have to manually read each part into the struct or is there an easier way to do this?

My code:

The structs

typedef unsigned char byte;

struct Header
{
    char ID[10];
    int  version;
};

struct Vertex //cannot rearrange the order of the members
{
    byte    flags;
    float   vertex[3];
    char    bone;
    byte    referenceCount;
};

How I am reading in the data:

std::ifstream in(path.c_str(), std::ifstream::in | std::ifstream::binary);

Header header;
in.read((char*)&header.ID, sizeof(header.ID));
header.ID[9] = '\0';
in.read((char*)&header.version, sizeof(header.version));
std::cout << header.ID << " " << header.version << "\n";
in.read((char*)&NumVertices, sizeof(NumVertices));
std::cout << NumVertices << "\n";

std::vector<Vertex> Vertices(NumVertices);

for(std::vector<Vertex>::iterator it = Vertices.begin(); it != Vertices.end(); ++it)
{
    Vertex& v = (*it);
    in.read((char*)&v.flags, sizeof(v.flags));
    in.read((char*)&v.vertex, sizeof(v.vertex));
    in.read((char*)&v.bone, sizeof(v.bone));
    in.read((char*)&v.referenceCount, sizeof(v.referenceCount));
}

I tried doing: in.read((char*)&Vertices[0], sizeof(Vertices[0]) * NumVertices); but this produces incorrect results because of what I believe to be the padding.

Also, at the moment I am using C-style casts. What would be the correct C++ cast to use in this scenario, or is a C-style cast okay?

Rarge

6 Answers

3

If you're writing the entire structure out in binary, you don't need to read it back as if you had stored each variable separately. You would just read sizeof(Header) bytes from the file into the struct you have defined.

Header header;
in.read((char*)&header, sizeof(Header));

If you're always running on the same architecture or the same machine, you won't need to worry about endian issues as you'll be writing them out the same way your application needs to read them in. If you are creating the file on one architecture and expect it to be portable/usable on another, then you will need to swap bytes accordingly. The way I have done this in the past is to create a swap method of my own. (for example Swap.h)

Swap.h - This is the header you use within your code

void swap(unsigned char *x, int size);

------------------

SwapIntel.cpp - This is what you would compile and link against when building for Intel

void swap(unsigned char *x, int size)
{
    return;   // Do nothing: the file was written on Intel (little-endian), which is already the byte order of this build
}

------------------

SwapSolaris.cpp -  This is what you would compile and link against when building for Solaris

#include <algorithm>   // std::reverse

void swap(unsigned char *x, int size)
{
    // The file was written on Intel (little-endian) and this build is big-endian,
    // so reverse the bytes of the value in place. Call this once per multi-byte member.
    std::reverse(x, x + size);
}
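
As an alternative to linking a different swap implementation per platform (this is not part of the answer above, only a sketch), a value can be assembled from its bytes with shifts, which gives the same result on any host. The helper name `readLE32` is hypothetical, and the sketch assumes the file stores 32-bit values least-significant byte first. Byte order applies to each multi-byte member individually; the order of the members in the file does not change.

```cpp
#include <cstdint>

// Hypothetical helper: builds a 32-bit value from four bytes stored
// least-significant byte first (little-endian), whatever the host byte order is.
std::uint32_t readLE32(const unsigned char* p)
{
    return  static_cast<std::uint32_t>(p[0])
         | (static_cast<std::uint32_t>(p[1]) << 8)
         | (static_cast<std::uint32_t>(p[2]) << 16)
         | (static_cast<std::uint32_t>(p[3]) << 24);
}
```

Single-byte members such as `flags` or `bone` need no conversion.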
RC.
  • When you write the structure to file, it will be written with all the data, including your null terminators and padding. Therefore, when you read it back in, everything is put back into place the way it is expected. – RC. Apr 27 '11 at 13:18
  • Oh I understand (I think). The files weren't necessarily written on the same machine that's reading them so the files could have a different endianness too and this is why there's alignment issues too? – Rarge Apr 27 '11 at 13:24
  • Yes, you'll want to make sure you're not writing out the data the same way you're reading it in currently. (i.e. you shouldn't have code like write((char*)&header.ID, 10) ) You should be writing the structure as a whole as in write((char*)&header, sizeof(header)); then read it in as stated above. – RC. Apr 27 '11 at 13:32
  • Brilliant, I wasn't aware that these stem from the way the file is structured too. Thank you. I have a question about the endianness: will the whole structure be in the file "backwards", or will the members still be in the same order but their data will be backwards? – Rarge Apr 27 '11 at 13:37
2

No, you don't have to read each field separately. This is called alignment/packing. See http://en.wikipedia.org/wiki/Data_structure_alignment
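
To see how much padding the compiler added, one can compare `sizeof` with the sum of the member sizes. The following is a minimal sketch using the `Vertex` struct from the question; the helper names are made up for illustration, and the exact numbers depend on the compiler and target.

```cpp
#include <cstddef>   // offsetof, std::size_t

typedef unsigned char byte;

struct Vertex
{
    byte    flags;
    float   vertex[3];
    char    bone;
    byte    referenceCount;
};

// Sum of the member sizes: what the file stores per vertex if it has no padding.
const std::size_t kVertexFieldBytes =
    sizeof(byte) + sizeof(float[3]) + sizeof(char) + sizeof(byte);

// Padding bytes the compiler added to the in-memory layout (between members and at the end).
std::size_t vertexPaddingBytes()
{
    return sizeof(Vertex) - kVertexFieldBytes;
}

// Offset of the float array; larger than 1 when padding follows `flags`.
std::size_t vertexArrayOffset()
{
    return offsetof(Vertex, vertex);
}
```

If `vertexPaddingBytes()` is non-zero, a single `read` of `sizeof(Vertex) * NumVertices` bytes consumes more than the file holds per vertex, which matches the symptom described in the question.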

In this case the C-style cast behaves as a reinterpret_cast, and you are using it correctly. You may use the C++-specific syntax, but it is a lot more typing.

  • "but it is a lot more typing." that's a very sad point, with today's tooling. – jv42 Apr 27 '11 at 14:15
  • @jv42: Having two syntactically different options to describe the same expression I prefer the shortest one even if the editor auto-completes. –  Apr 27 '11 at 14:29
  • That's your choice and it's fine, I do use C-style casts a lot when coding in C++ myself. But the expressiveness of C++ style casts is much better, and conveys the intent without adding comments (i.e. am I breaking a const specifier, am I doing a type conversion or am I messing with pointers). – jv42 Apr 27 '11 at 14:51
2

You can change padding by explicitly asking your compiler to align structs on 1 byte instead of 4 or whatever its default is. Depending on the environment, this can be done in many different ways: sometimes file by file ('compilation unit'), sometimes struct by struct (with pragmas and such), and sometimes only for the whole project.

jv42
2

header.ID[10] = '\0';

header.ID[9] is the last element of the array.

Oswald
1

If you are using a Microsoft compiler then explore the align pragma. There are also the alignment include files:

#include <pshpack1.h>
// your code here
#include <poppack.h>

GNU gcc has a different system that allows you to add alignment/padding to the structure definition.
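
The GCC mechanism is presumably the `packed` type attribute (that reading of the answer is an assumption). A minimal sketch with the `Header` struct from the question, for GCC or Clang only:

```cpp
#include <cstddef>

struct Header
{
    char ID[10];
    int  version;
} __attribute__((packed));   // GCC/Clang extension: no padding between members

// Size of the packed header: 10 bytes of ID plus sizeof(int), with nothing in between.
std::size_t packedHeaderSize()
{
    return sizeof(Header);
}
```

Avoid binding references or taking pointers to members of a packed struct, since they may not be suitably aligned.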

J Evans
  • Microsoft compiler also supports pack pragmas. I prefer using them as code gets more portable / compiler independent. See http://msdn.microsoft.com/en-us/library/ms253935.aspx –  Apr 27 '11 at 13:14
  • Once compiled, if the program is run on a different machine, will the structs still have the padding given by either method here? (I feel this might be a dumb question) – Rarge Apr 27 '11 at 13:44
  • @Rarge - yes, padding affects the binary image, so is fixed at compile/link time. – J Evans Apr 28 '11 at 15:10
0

If you are reading and writing this file yourself, try the Google Protobuf library. It will handle all byte-order, alignment, padding and language interop issues.

http://code.google.com/p/protobuf/

blaze