
I have a binary file which has the following format:

# vtk DataFile Version 4.0
vtk output
BINARY
DATASET POLYDATA
POINTS 10000 double
<binary data: 10000 double values follow, separated by spaces, with a newline after every 9 values>

I want to read this file byte by byte so that I can store these double values in an array. The following code loads the whole file into a char * buffer. How should I proceed from here?

#include <iostream>
#include <fstream>
using namespace std;

int main () {

  // open in binary mode so no newline translation is applied to the data
  ifstream is ("Data_binary.vtk", ifstream::binary);
  if (is) {
    // get length of file:
    is.seekg (0, is.end);
    streamoff length = is.tellg();
    is.seekg (0, is.beg);

    // one extra byte for a terminating '\0' so the text header can be printed
    char * buffer = new char [length+1];
    buffer[length] = '\0';

    cout << "Reading " << length << " characters... ";
    // read data as a block:
    is.read (buffer,length);

    if (is)
      cout << "all characters read successfully." << endl;
    else
      cout << "error: only " << is.gcount() << " characters could be read" << endl;
    is.close();

    delete[] buffer;  // free the memory allocated with new[]
  }

  return 0;
}

In ASCII format, an example file would look like the following:

# vtk DataFile Version 4.0
vtk output
ASCII
DATASET POLYDATA
POINTS 18 double
.1 .2 .3 1.4 11.55 1 0 8e-03 5.6
1.02 2.2 3.3 .1 .5 0.001 4e-07 4.2 1.55

In the binary file, the double values are stored in raw binary form, and I want to decode them back into doubles.

Jaipreet
  • In particular [@sehe's answer](http://stackoverflow.com/a/7828841/1413395) looks like the way to go. It needs to use `boost::spirit` though. – πάντα ῥεῖ Aug 30 '16 at 00:12
  • @πάντα ῥεῖ I have data in a binary file. I want to learn how to read binary data byte by byte to get characters, or to read a sequence of bytes to get double/integer values. – Jaipreet Aug 30 '16 at 00:15
  • That's not a binary file. What you describe is exactly the `.vtk` file format (please see [here](http://dunne.uni-hd.de/VisuSimple/documents/vtkfileformat.html)). – πάντα ῥεῖ Aug 30 '16 at 00:18
  • In the vtk file format, the first 5 lines contain text and the rest of the file is either ASCII or BINARY, depending on the format specified in the 3rd line. The rest of my file is binary data and I want to read that. – Jaipreet Aug 30 '16 at 00:20
  • I reopened your question, though I think there's some valuable information in the linked answer (question). Try to improve your question; it looks too broad to me now. – πάντα ῥεῖ Aug 30 '16 at 00:22
  • If you can provide a sample file and the output that you expect to get, that will help make your question a lot clearer. – bot1131357 Aug 30 '16 at 00:28
  • @bot1131357 I have edited the question. I hope it's clearer now. – Jaipreet Aug 30 '16 at 00:40
  • @Jaipreet "How do I proceed further?" unfortunately really is too broad, mostly because you haven't really *done* anything yet. You've successfully moved the file data from disk to a buffer in memory, but now there are a thousand options for how to proceed. You're going to have to get a bit farther on your own first. Try something, then if you're stumped at a specific point, ask about that in its own question here. Your question is clear, it's about the concept of reading from a file with mixed text / non-text data, but you're still essentially asking somebody to write the whole program for you. – Jason C Aug 30 '16 at 00:50
  • Btw check out e.g. http://stackoverflow.com/questions/2063606/c-c-library-for-vtk-io or https://www.google.com/search?q=vtk+file+c%2B%2B, there appear to be some pre-existing libraries for this that you can use or at least look at the source of. I am not familiar with VTK, but even the description of the [tag:vtk] tag here mentions that it's got a library specifically designed to handle these files, and points to their web site which has all of the source: http://www.vtk.org/download/ – Jason C Aug 30 '16 at 01:01
  • If I understand you correctly, you can easily read the number of bytes required and then cast them into the expected types, since you know the data structure of the file. Can you post a link to a sample binary file, and the expected output (what you expect to see on the console/file when you read the file.) – bot1131357 Aug 30 '16 at 01:46

2 Answers

Use this function.


#include <stdio.h>
#include <math.h>

/*
* read a double from a stream in ieee754 format regardless of host
*  encoding.
*  fp - the stream
*  bigendian - set if big bytes first, clear for little bytes
*              first
*
*/
double freadieee754(FILE *fp, int bigendian)
{
    unsigned char buff[8];
    int i;
    double fnorm = 0.0;
    unsigned char temp;
    int sign;
    int exponent;
    double bitval;
    int maski, mask;
    int expbits = 11;
    int significandbits = 52;
    int shift;
    double answer;

    /* read the data */
    for (i = 0; i < 8; i++)
        buff[i] = fgetc(fp);
    /* just reverse if not big-endian*/
    if (!bigendian)
    {
        for (i = 0; i < 4; i++)
        {
            temp = buff[i];
            buff[i] = buff[8 - i - 1];
            buff[8 - i - 1] = temp;
        }
    }
    sign = buff[0] & 0x80 ? -1 : 1;
    /* exponent in raw (biased) format */
    exponent = ((buff[0] & 0x7F) << 4) | ((buff[1] & 0xF0) >> 4);

    /* read in the mantissa. Top bit is worth 0.5, each successive bit half the previous one */
    bitval = 0.5;
    maski = 1;
    mask = 0x08;
    for (i = 0; i < significandbits; i++)
    {
        if (buff[maski] & mask)
            fnorm += bitval;

        bitval /= 2.0;
        mask >>= 1;
        if (mask == 0)
        {
            mask = 0x80;
            maski++;
        }
    }
    /* handle zero specially, keeping the sign so -0.0 is preserved */
    if (exponent == 0 && fnorm == 0)
        return sign * 0.0;

    shift = exponent - ((1 << (expbits - 1)) - 1); /* exponent = shift + bias */
    /* nans have exp 1024 and non-zero mantissa */
    if (shift == 1024 && fnorm != 0)
        return sqrt(-1.0);
    /*infinity*/
    if (shift == 1024 && fnorm == 0)
    {

#ifdef INFINITY
        return sign == 1 ? INFINITY : -INFINITY;
#endif
        return  (sign * 1.0) / 0.0;
    }
    if (shift > -1023)
    {
        answer = ldexp(fnorm + 1.0, shift);
        return answer * sign;
    }
    else
    {
        /* denormalised numbers */
        if (fnorm == 0.0)
            return 0.0;
        shift = -1022;
        while (fnorm < 1.0)
        {
            fnorm *= 2;
            shift--;
        }
        answer = ldexp(fnorm, shift);
        return answer * sign;
    }
}

It's a lot, but it's just a snippet to cut and paste, and you never need to worry about binary floating point formats again. It simply reads an IEEE 754 double, regardless of the host floating point format. There's a twin function that writes doubles in the same format.
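If you can assume the host's double is itself IEEE 754 binary64 (true on essentially all current desktop and server platforms), a much shorter approach is possible: combine the 8 file bytes into a 64-bit integer with shifts, so the host's byte order doesn't affect the result, then copy those bits into a double with memcpy. This is a swapped-in technique rather than the function above, and `decodeBigEndianDouble` is a hypothetical name. It also assumes integers and doubles use the same byte order in memory, which holds on mainstream platforms.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>
#include <limits>

// Hypothetical helper: decodes one IEEE 754 double stored as 8 big-endian
// bytes. Requires the host double to be IEEE 754 binary64 (checked at compile
// time) and assumes integer and double byte orders match on this platform.
double decodeBigEndianDouble(const unsigned char bytes[8])
{
    static_assert(std::numeric_limits<double>::is_iec559 && sizeof(double) == 8,
                  "requires IEEE 754 binary64 doubles");
    std::uint64_t bits = 0;
    for (int i = 0; i < 8; ++i)
        bits = (bits << 8) | bytes[i];   // first byte ends up most significant
    double result;
    std::memcpy(&result, &bits, sizeof result);   // reinterpret the bit pattern
    return result;
}
```

For little-endian input, feed the bytes in reverse order (or shift from the last byte to the first) instead.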

Malcolm McLean

Instead of reading into a char * buffer, read into a double * buffer. Casting to/from char * is allowed just for this purpose.

vector<double> buffer;
buffer.resize(n);
is.read(reinterpret_cast<char *>(&buffer[0]), n * sizeof(buffer[0]));
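If you'd rather keep the whole file in memory, as the question's code does, the same idea works on the loaded buffer: find the end of the text header, then std::memcpy the following bytes into a vector<double>. This is a minimal sketch that assumes, as the question describes, the header is exactly 5 newline-terminated lines and that the caller already knows how many doubles follow; `doublesAfterHeader` is a hypothetical helper name. The bytes are copied in file order, so byte-order conversion is still a separate step.

```cpp
#include <cassert>
#include <cstddef>
#include <cstring>
#include <stdexcept>
#include <vector>

// Hypothetical helper: skips `headerLines` newline-terminated text lines at the
// start of `buffer`, then copies `count` doubles that follow, in file byte order.
std::vector<double> doublesAfterHeader(const char* buffer, std::size_t length,
                                       std::size_t headerLines, std::size_t count)
{
    std::size_t pos = 0;
    for (std::size_t seen = 0; seen < headerLines; ++pos) {
        if (pos >= length)
            throw std::runtime_error("header is incomplete");
        if (buffer[pos] == '\n')
            ++seen;
    }
    // pos now indexes the first byte after the last header newline.
    if (length - pos < count * sizeof(double))
        throw std::runtime_error("not enough binary data");
    std::vector<double> values(count);
    if (count != 0)
        std::memcpy(values.data(), buffer + pos, count * sizeof(double));
    return values;
}
```

For the file in the question this would be called as doublesAfterHeader(buffer, length, 5, n), with n taken from the POINTS line.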

You'll need to read the non-binary data first so that the file pointer is located at the start of the binary data. This is defined as coming immediately after the newline character of the last field in the header.
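Here is a minimal sketch of both steps with a stream. It assumes the POINTS line declares double data and that each point has 3 coordinates (x, y, z), as in the legacy VTK format; if your file really stores one value per count, as the question suggests, drop the `* 3`. `readPointsBlock` is a hypothetical name. std::getline consumes each header line including its newline, so after the POINTS line the stream is positioned at the first binary byte.

```cpp
#include <cassert>
#include <cstddef>
#include <istream>
#include <sstream>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: skips the text header of a legacy VTK file and reads the
// binary coordinates after the POINTS line, in file byte order.
// Assumes "double" data and 3 coordinates per point.
std::vector<double> readPointsBlock(std::istream& in)
{
    std::string line;
    std::size_t nPoints = 0;

    // Consume header lines up to and including the POINTS line.
    while (std::getline(in, line)) {
        if (line.compare(0, 6, "POINTS") == 0) {
            std::istringstream fields(line);
            std::string keyword, type;
            fields >> keyword >> nPoints >> type;
            if (!fields || type != "double")
                throw std::runtime_error("unsupported POINTS line: " + line);
            break;
        }
    }
    if (nPoints == 0)
        throw std::runtime_error("no POINTS line found");

    // Read all coordinates in one block, straight into the vector's storage.
    std::vector<double> coords(nPoints * 3);
    const std::streamsize bytes =
        static_cast<std::streamsize>(coords.size() * sizeof(double));
    in.read(reinterpret_cast<char*>(coords.data()), bytes);
    if (in.gcount() != bytes)
        throw std::runtime_error("file ended before all coordinates were read");
    return coords;
}
```

To use it, open the file with ifstream::binary and pass the stream in; apply a byte swap to the result if the file's byte order differs from the machine's.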

In practice, legacy VTK binary files are written in big-endian byte order. If that matches the machine you're using to read the file, no conversion is necessary; on a little-endian machine such as x86 you'll need to do a byte swap:

#include <utility>  // std::swap

// Reverse the 8 bytes of a double in place
void ByteSwap(double * p)
{
    char * pc = reinterpret_cast<char *>(p);
    std::swap(pc[0], pc[7]);
    std::swap(pc[1], pc[6]);
    std::swap(pc[2], pc[5]);
    std::swap(pc[3], pc[4]);
}
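To apply the swap only when it's actually needed, you can check the host's byte order at runtime. This sketch assumes the file data is big-endian and that doubles are 8 bytes; `hostIsLittleEndian` and `bigEndianToHost` are hypothetical names, and std::reverse on the 8 bytes does the same job as ByteSwap. With C++20 you could test `std::endian::native == std::endian::little` from `<bit>` instead of the runtime probe.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>
#include <cstring>
#include <vector>

// Returns true when the machine stores the least significant byte first.
bool hostIsLittleEndian()
{
    const std::uint16_t probe = 1;
    unsigned char first;
    std::memcpy(&first, &probe, 1);   // look at the lowest-addressed byte
    return first == 1;
}

// Converts doubles read verbatim from a big-endian file into host order.
// On a big-endian host this does nothing.
void bigEndianToHost(std::vector<double>& values)
{
    static_assert(sizeof(double) == 8, "expects 8-byte doubles");
    if (!hostIsLittleEndian())
        return;
    for (double& v : values) {
        unsigned char* bytes = reinterpret_cast<unsigned char*>(&v);
        std::reverse(bytes, bytes + sizeof(double));
    }
}
```

Call it once on the whole vector right after the block read, rather than swapping values one at a time as you use them.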
Mark Ransom