
I have a binary file format with a bunch of headers followed by floating-point data, and I am writing code that parses it. Reading the headers was not hard, but I ran into difficulties when I tried to read the data.

I opened the file and read the headers as follows:

ifs.open(fileName, std::ifstream::in | std::ifstream::binary);
char textHeader[3200];
BinaryHeader binaryHeader;
ifs.read(textHeader,sizeof(textHeader));
ifs.read(reinterpret_cast<char *>(&binaryHeader), sizeof(binaryHeader));

The documentation says the data is stored as 4-byte IBM floating-point, so I tried something similar for the samples:

std::vector<float> readData(int sampleSize){
    float tmp;
    std::vector<float> tmpVector;
    for (int i = 0; i<sampleSize; i++){
        ifs.read(reinterpret_cast<char *>(&tmp), sizeof(tmp));
        std::cout << tmp << std::endl;
        tmpVector.push_back(tmp);
    }
    return tmpVector;
}

Sadly, the result does not seem correct. What am I doing wrong?

EDIT: I forgot to mention that the binary data is big-endian, but when I print the tmp values the data does not seem correct with or without byte swapping.

Conclusion: The 4-byte IBM floating-point format is not the same as float, so the bytes cannot simply be read into a float.
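
For illustration, here is a minimal sketch of the conversion. It assumes the format is IBM hexadecimal floating point as described on the pages linked in the comments: 1 sign bit, a 7-bit excess-64 exponent to base 16, and a 24-bit fraction. The name ibm32ToDouble is hypothetical, and the function expects the four file bytes to be already assembled into a 32-bit word in host byte order:

```cpp
#include <cmath>
#include <cstdint>

// Hypothetical helper: converts one 32-bit IBM hexadecimal floating-point
// word (already in host byte order) to a native double.
// Assumed layout: bit 31 = sign, bits 30..24 = exponent (excess-64, base 16),
// bits 23..0 = fraction with the radix point to the left of the fraction.
double ibm32ToDouble(std::uint32_t word) {
    const std::uint32_t fraction = word & 0x00FFFFFFu;
    if (fraction == 0) {
        return 0.0;  // a zero fraction is treated as zero
    }
    const int exponent = static_cast<int>((word >> 24) & 0x7Fu) - 64;
    const double sign = (word & 0x80000000u) ? -1.0 : 1.0;
    // value = sign * (fraction / 2^24) * 16^exponent
    //       = sign * fraction * 2^(4 * exponent - 24)
    return sign * std::ldexp(static_cast<double>(fraction), 4 * exponent - 24);
}
```

Under these assumptions the file bytes 41 13 33 33 (the big-endian word 0x41133333) decode to roughly 1.2. That appears to be the first value in the hex dump posted in the comments, assuming the dump was printed on a little-endian machine.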

timko.mate
  • It's hard to tell exactly, but at first glance I can see two possible issues. First, you have an array of char for your header; this might be okay, but have you tried using unsigned char instead? Second, in your `readData()` function you create a temporary vector on that function's stack frame and then return it. Maybe try changing the signature of this function to accept a `std::vector` by reference and pass it in, instead of returning a copy of a temporary. – Francis Cugler Feb 17 '19 at 21:32
  • Is the binary data big or little endian? – Retired Ninja Feb 17 '19 at 21:36
  • @RetiredNinja That's a good question to the OP, that would make a huge difference too! – Francis Cugler Feb 17 '19 at 21:37
  • Yeah, thanks. I forgot to mention that all the values in the header are big-endian. – timko.mate Feb 17 '19 at 21:40
  • https://stackoverflow.com/questions/2782725/converting-float-values-from-big-endian-to-little-endian – Retired Ninja Feb 17 '19 at 21:40
  • “4-byte IBM floating-point” is probably not the same as `float`. If that’s the case, you’ll have to do some work to translate the input into something your hardware can work with. – Pete Becker Feb 17 '19 at 21:41
  • Yes, that is what I was afraid of. Thank you! – timko.mate Feb 17 '19 at 21:42
  • This may be the format: https://nssdc.gsfc.nasa.gov/nssdc/formats/IBM_32-Bit.html https://en.wikipedia.org/wiki/IBM_hexadecimal_floating_point – Retired Ninja Feb 17 '19 at 21:47
  • Google “4-byte IBM floating-point”. There’s lots of information out there. And, as I guessed earlier, it’s not the same layout as an IEEE float. – Pete Becker Feb 17 '19 at 21:51
  • Can you post one example output of `std::cout << std::hex << (int&)(tmp) << std::endl;` – rustyx Feb 17 '19 at 22:09
  • Yes, I got this: 33331341 3d331341 48331341 52331341 5d331341 67331341 72331341 7c331341 87331341 91331341 9c331341 a6331341 b1331341 bb331341 c6331341 d0331341 db331341 e5331341 f0331341 fa331341 4341341 f341341 19341341 24341341 2e341341 39341341 43341341 4e341341 58341341 63341341 6d341341 78341341 82341341 8d341341 97341341 a2341341 ac341341 b7341341 c1341341 cc341341 d6341341 e1341341 eb341341 f6341341 – timko.mate Feb 17 '19 at 22:15

1 Answer


There are a few things to consider:

  • First, and I'm not 100% sure this makes a difference, you are using an array of char for your header: char textHeader[3200];. You could try changing it to an array of unsigned char instead, although the headers already read correctly, so this is unlikely to be the cause.

  • The second is more a matter of style and performance than of correctness, and it concerns your readData function itself. You build a local std::vector of floats and return it by value. That is valid C++: the vector is moved or the copy is elided, so the caller receives an intact vector. For large containers I still prefer to pass the vector in by reference and fill it in place, so I would probably change the declaration and definition of this function.

    I would change it from what you currently have:

    vector<float> readData(int sampleSize)

    to this:

    void readData( int sampleSize, std::vector<float>& data )

  • The third, and probably the most important of the three, is the one Retired Ninja raised in the comments while I was writing this: the endianness of the stored data. The byte order, and more generally how each value is physically represented in the file, is the biggest concern here.

Since your documentation states that the data is stored as 4-byte IBM floating-point and that it is big-endian, I have found this specification by IBM that may be of help to you.

Francis Cugler
  • Thank you! I think the biggest problem is, as Pete Becker mentioned, that float and 4-byte IBM floating-point are not the same. The data is big-endian, and I have a function that converts it to little-endian, but the result is not correct either way. – timko.mate Feb 17 '19 at 21:45
  • There’s nothing inherently wrong with returning a vector by value. And the problem doesn’t occur when reading the header, so making it unsigned won’t help. – Pete Becker Feb 17 '19 at 21:46
  • @PeteBecker Yeah, I wasn't 100% sure, but mentioned it just in case. However, I still think returning a local vector by value is not a favorable design. With large containers that are being filled with data or having their contents manipulated, I tend to prefer passing them into functions by reference. The biggest issue, though, I think is the actual representation of the data in memory itself. – Francis Cugler Feb 17 '19 at 21:50
  • Well, that may be reasonable style advice, but it won’t fix the problem. – Pete Becker Feb 17 '19 at 21:52
  • @PeteBecker I reworded my answer to reflect on that more. – Francis Cugler Feb 17 '19 at 21:55