4

Thank you in advance for your help!

I am in the process of learning C++. My first project is to write a parser for a binary-file format we use at my lab. I was able to get a parser working fairly easily in Matlab using "fread", and it looks like that may work for what I am trying to do in C++. But from what I've read, it seems that using an ifstream is the recommended way.

My question is two-fold. First, what, exactly, are the advantages of using ifstream over fread?

Second, how can I use ifstream to solve my problem? Here's what I'm trying to do. I have a binary file containing a structured set of ints, floats, and 64-bit ints. There are 8 data fields all told, and I'd like to read each into its own array.

The structure of the data is as follows, in repeated 288-byte blocks:

Bytes 0-3: int
Bytes 4-7: int
Bytes 8-11: float
Bytes 12-15: float
Bytes 16-19: float
Bytes 20-23: float
Bytes 24-31: int64
Bytes 32-287: 64x float

I am able to read the file into memory as a char * array, with the fstream read command:

char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes 

So, from what I understand, I now have a pointer to an array called "buffer". If I were to call buffer[0], I should get a 1-byte memory address, right? (Instead, I'm getting a seg fault.)

What I now need to do really ought to be very simple. After executing the above ifstream code, I should have a fairly long buffer populated with a number of 1's and 0's. I just want to be able to read this stuff from memory, 32-bits at a time, casting as integers or floats depending on which 4-byte block I'm currently working on.

For example, if the binary file contained N 288-byte blocks of data, each array I extract should have N members each. (With the exception of the last array, which will have 64N members.)

Since I have the binary data in memory, I basically just want to read from buffer, one 32-bit number at a time, and place the resulting value in the appropriate array.

Lastly - can I access multiple array positions at a time, a la Matlab? (e.g. array(3:5) -> [1,2,1] for array = [3,4,1,2,1])

goto10
user1010250
    Is there any reason to read the whole file at once as opposed to just the blocks you were expecting? – Lionel Oct 24 '11 at 05:12

4 Answers

3

Firstly, the advantage of using iostreams, and in particular file streams, relates to resource management. Automatic file stream variables will be closed and cleaned up when they go out of scope, rather than having to manually clean them up with fclose. This is important if other code in the same scope can throw exceptions.
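
For illustration, here is a minimal sketch of that difference (mightThrow is a hypothetical stand-in for any code that can throw):

#include <cstdio>
#include <fstream>
#include <stdexcept>

void mightThrow() { throw std::runtime_error("something went wrong"); } // hypothetical

void with_ifstream(const char* path) {
    std::ifstream in(path, std::ios::binary);
    mightThrow();
}   // 'in' is closed automatically here, even though mightThrow() threw

void with_fread(const char* path) {
    std::FILE* f = std::fopen(path, "rb");
    mightThrow();        // if this throws, the fclose below is never reached
    std::fclose(f);
}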

Secondly, one possible way to address this type of problem is to simply define the stream insertion and extraction operators in an appropriate manner. In this case, because you have a composite type, you need to help the compiler by telling it not to add padding bytes inside the type. The following code should work on gcc and microsoft compilers.

#include <cstdint>
#include <istream>
#include <ostream>

#pragma pack(push, 1)  // no padding, so the struct layout matches the 288-byte record exactly
struct MyData
{
    int i0;
    int i1;
    float f0;
    float f1;
    float f2;
    float f3;
    uint64_t ui0;
    float f4[64];
};
#pragma pack(pop)

// Read one 288-byte record directly into the struct.
std::istream& operator>>( std::istream& is, MyData& data ) {
    is.read( reinterpret_cast<char*>(&data), sizeof(data) );
    return is;
}

// Write one record back out in the same binary layout.
std::ostream& operator<<( std::ostream& os, const MyData& data ) {
    os.write( reinterpret_cast<const char*>(&data), sizeof(data) );
    return os;
}
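
A possible usage sketch (the readAll name is just illustrative), building on the operator>> above:

#include <fstream>
#include <vector>

std::vector<MyData> readAll(const char* filename) {
    std::ifstream in(filename, std::ios::in | std::ios::binary);
    std::vector<MyData> records;
    MyData rec;
    while (in >> rec)              // each extraction reads one 288-byte record
        records.push_back(rec);
    return records;
}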
Andrew Walker
    What about endianness? This code assumes that the endianness of the ints in the file is the same as the native endianness of the machine. – BarbaraKwarc Jan 13 '17 at 14:03
0
char * buffer;
ifstream datafile (filename,ios::in|ios::binary|ios::ate);
datafile.read (buffer, filesize); // Filesize in bytes 

You need to allocate the buffer before you read into it:

buffer = new char[filesize];
datafile.read (buffer, filesize);

As to the advantages of ifstream, it is a matter of abstraction. You can represent the contents of your file in a more convenient way: rather than working with raw buffers, you can model the record with a class and hide the details of how it is stored in the file, for instance by overloading the << operator.
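
Putting those two points together, here is a minimal sketch (the function name is just illustrative) that uses the ios::ate open mode from the question to get the file size, then reads the whole file into an automatically managed buffer:

#include <cstddef>
#include <fstream>
#include <string>
#include <vector>

std::vector<char> readWholeFile(const std::string& filename) {
    std::ifstream datafile(filename, std::ios::in | std::ios::binary | std::ios::ate);
    std::streamsize filesize = datafile.tellg();   // ios::ate: the stream starts at the end
    datafile.seekg(0, std::ios::beg);              // rewind to the beginning

    std::vector<char> buffer(static_cast<std::size_t>(filesize));
    datafile.read(buffer.data(), filesize);        // read everything in one go
    return buffer;
}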

AndersK
0

You might look into serialization libraries for C++; perhaps s11n would be useful.

Basile Starynkevitch
0

This question shows how you can convert data from a buffer to a certain type. In general, you should prefer a std::vector<char> as your buffer. Reading the whole file then looks like this:

#include <fstream>
#include <vector>
#include <algorithm>
#include <iterator>

int main() {
    // Open in binary mode so no newline translation is performed.
    std::ifstream input("your_file.dat", std::ios::binary);
    std::vector<char> buffer;
    std::copy(std::istreambuf_iterator<char>(input),
              std::istreambuf_iterator<char>(),
              std::back_inserter(buffer));
}

This code will read the entire file into your buffer. The next thing you'd want to do is write your data into valarrays (for the selection you want). A valarray is constructed with a fixed size, so you have to be able to calculate the required size of your array up-front. This should do it for your format:

std::valarray<int> array1(buffer.size() / 288); // one entry per 288-byte block (requires <valarray>)

Then you'd use a normal for-loop to insert the elements into your arrays:

for(std::size_t i = 0; i < buffer.size() / 288; i++) {
    array1[i] = *reinterpret_cast<int *>(&buffer[i*288]);     // first field of the block
    array2[i] = *reinterpret_cast<int *>(&buffer[i*288 + 4]); // second field (array2 is declared like array1)
}

Note that the C++ standard does not fix the size of int; if your platform uses something other than 4 bytes for int, this will not read the fields correctly (the fixed-width types from <cstdint>, such as std::int32_t, avoid that problem). This question explains a bit about C++ and sizes of types.

The Matlab-style selection you describe can be achieved with std::slice on a valarray.
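
As a minimal sketch of that kind of slicing (mirroring the array(3:5) example from the question, but with C++'s 0-based indexing):

#include <iostream>
#include <valarray>

int main() {
    std::valarray<int> array = {3, 4, 1, 2, 1};
    // Matlab's array(3:5) is 1-based and inclusive; the equivalent here is
    // std::slice(2, 3, 1), i.e. start 2, length 3, stride 1.
    std::valarray<int> sub = array[std::slice(2, 3, 1)];
    for (int v : sub)
        std::cout << v << ' ';   // prints: 1 2 1
    std::cout << '\n';
}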

Björn Pollex