
Does anyone know how to read in a file with raw encoding? So stumped.... I am trying to read in floats or doubles (I think). I have been stuck on this for a few weeks. Thank you!

File that I am trying to read from: http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.raw

Description of raw encoding: http://teem.sourceforge.net/nrrd/format.html#encoding - "raw" - The data appears on disk exactly the same as in memory, in terms of byte values and byte ordering. Produced by write() and fwrite(), suitable for read() or fread().

Info of file: http://www.sci.utah.edu/~gk/DTI-data/gk2/gk2-rcc-mask.nhdr - I think the only things that matter here are the big endian (still trying to understand what that means from google) and raw encoding.
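For anyone else puzzling over "big endian": it means the most significant byte of each multi-byte value is stored first. Most desktop CPUs (x86, for example) are little endian, so the bytes of each value have to be reordered after reading. Below is a minimal sketch, assuming the host uses 32-bit IEEE 754 floats; `floatFromBigEndian` is a made-up helper name, not part of any library.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Assemble four bytes stored most-significant-first (big endian) into a
// 32-bit float. The integer is built with shifts, so the result is the same
// on little- and big-endian hosts.
float floatFromBigEndian(const unsigned char* bytes) {
    static_assert(sizeof(float) == sizeof(std::uint32_t), "expects a 32-bit float");
    assert(bytes != nullptr);

    const std::uint32_t bits = (std::uint32_t(bytes[0]) << 24) |
                               (std::uint32_t(bytes[1]) << 16) |
                               (std::uint32_t(bytes[2]) << 8)  |
                                std::uint32_t(bytes[3]);

    float value;
    std::memcpy(&value, &bits, sizeof value);  // copy the bit pattern instead of casting pointers
    return value;
}
```

For example, the bytes 3F 80 00 00 decode to 1.0f. Reading the same four bytes on a little-endian machine without reordering them gives a completely different number.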

My current approach, uncertain if it's correct:

 //Function ripped off from example of c++ ifstream::read reference page

void scantensor(string filename){
    ifstream tdata(filename, ifstream::binary); // not sure if I should put ifstream::binary here

    // other things I tried
    // ifstream tdata(filename)  ifstream tdata(filename, ios::in)

    if(tdata){
            tdata.seekg(0, tdata.end);
            int length = tdata.tellg();
            tdata.seekg(0, tdata.beg);

            char* buffer = new char[length];

            tdata.read(buffer, length);

            tdata.close();

            double* d;
            d = (double*) buffer;

    } else cerr << "failed" << endl;
}

/*  P.S. I attempted to print the first 100 elements of the array.

    Then I printed 100 other elements at some arbitrary array indices (e.g. 9,900 - 10,000).  I actually kept increasing the number of 0's until I ran out of bounds at 100,000,000 (I don't think that's how it works lol but I was just playing around to see what happens)

    Here's the part that makes me suspicious: ifstream can be constructed with different open modes, like the ones I tried above.

    the first 100 values are always the same.

    if I use ifstream::binary, then I get some values for the 100 arbitrary printing
    if I use the other two options, then I get -6.27744e+066 for all 100 of them

    So for now I am going to assume that ifstream::binary is the correct one.  The thing is, I am not sure if the file I provided is what binary files actually look like.  I am also unsure if these are the actual numbers that I am supposed to read in or just casting gone wrong.  I do realize that my casting from char* to double* can be unsafe, and I got that from one of the threads.

*/
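A plausible explanation for the difference between open modes (an assumption, not verified against this file): on Windows, a text-mode stream translates line endings and may treat byte 0x1A as end of file, so read() can deliver fewer bytes than requested and the rest of the buffer is never filled. A repeated value like -6.27744e+066 is consistent with the 0xCD fill pattern that some debug runtimes use for uninitialized heap memory. The sketch below opens the file in binary mode and keeps only the bytes that were actually read; `readAllBytes` is a hypothetical helper name.

```cpp
#include <cassert>
#include <cstddef>
#include <fstream>
#include <ios>
#include <string>
#include <vector>

// Read an entire file as raw bytes. Returns an empty vector on failure.
std::vector<char> readAllBytes(const std::string& filename) {
    std::ifstream in(filename, std::ios::binary);  // binary: no byte translation
    if (!in) return {};

    in.seekg(0, std::ios::end);
    const std::streamoff length = in.tellg();
    in.seekg(0, std::ios::beg);
    if (length <= 0) return {};

    std::vector<char> buffer(static_cast<std::size_t>(length));
    in.read(buffer.data(), length);

    // gcount() reports how many bytes read() actually delivered. Shrink the
    // buffer to that size so an unread tail is never interpreted as data.
    assert(in.gcount() <= length);
    buffer.resize(static_cast<std::size_t>(in.gcount()));
    return buffer;
}
```

Comparing the returned size with the file size on disk is a quick way to see whether the whole file was read.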

I really appreciate it!

Edit 1: Right now the data being read in using the above method is apparently "incorrect", since in ParaView the value ranges are:

Dxx,Dxy,Dxz,Dyy,Dyz,Dzz
[0, 1], [-15.4006, 13.2248], [-5.32436, 5.39517], [-5.32915, 5.96026], [-17.87, 19.0954], [-6.02961, 5.24771], [-13.9861, 14.0524]

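It's a 3 x 3 symmetric matrix, which has 6 distinct values (Dxx, Dxy, Dxz, Dyy, Dyz, Dzz). The listing above has 7 ranges; the first one, [0, 1], presumably belongs to the mask value stored alongside the tensor, so there are 7 values per sample.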

The values that I am currently parsing from the file are far outside these ranges (e.g. -4.68855e-229, -1.32351e+120).
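One way to check a candidate interpretation against the ParaView ranges is to decode the bytes and compute the minimum and maximum of each component. The sketch below assumes things that should be confirmed in the .nhdr header: that the samples are 32-bit floats (not doubles), stored big endian, with the 7 components of each voxel stored next to each other. The names `decodeBigEndianFloat` and `componentRanges` are invented for this illustration.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <utility>
#include <vector>

// Decode one big-endian 32-bit float starting at p.
static float decodeBigEndianFloat(const unsigned char* p) {
    static_assert(sizeof(float) == 4, "expects a 32-bit float");
    const std::uint32_t bits = (std::uint32_t(p[0]) << 24) | (std::uint32_t(p[1]) << 16) |
                               (std::uint32_t(p[2]) << 8)  |  std::uint32_t(p[3]);
    float v;
    std::memcpy(&v, &bits, sizeof v);
    return v;
}

// For data laid out voxel by voxel as [c0 c1 ... c(n-1)], return the
// (min, max) pair of every component.
std::vector<std::pair<float, float>> componentRanges(
        const std::vector<unsigned char>& raw, std::size_t components) {
    assert(components > 0);
    const std::size_t voxelBytes = components * sizeof(float);
    const std::size_t voxels = raw.size() / voxelBytes;  // a partial trailing voxel is ignored

    std::vector<std::pair<float, float>> ranges;
    for (std::size_t v = 0; v < voxels; ++v) {
        for (std::size_t c = 0; c < components; ++c) {
            const float x = decodeBigEndianFloat(&raw[v * voxelBytes + c * sizeof(float)]);
            if (v == 0) {
                ranges.emplace_back(x, x);  // the first voxel seeds each range
            } else {
                ranges[c].first = std::min(ranges[c].first, x);
                ranges[c].second = std::max(ranges[c].second, x);
            }
        }
    }
    return ranges;
}
```

If the assumptions hold, calling `componentRanges(bytes, 7)` on the file contents should produce ranges comparable to the ones ParaView reports; wildly different magnitudes would suggest the element type or byte order is still wrong.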

Perhaps somebody knows how to extract the floats from Paraview?

1 Answer


Since you want to work with doubles, I recommend reading the data from the file into a buffer of doubles:

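const long machineMemory = 0x40000000; // 1 GB buffer budget; adjust to the memory available

FILE* file = fopen("c:\\data.bin", "rb"); // "rb" = read in binary mode (fopen needs <cstdio>)

if (file)
{
    // Number of doubles that fit into the buffer budget
    const size_t size = machineMemory / sizeof(double);

    if (size > 0)
    {
        double* data = new double[size];

        size_t read = 0;
        // fread returns the number of complete doubles read; 0 means end of file or error
        while ((read = fread(data, sizeof(double), size, file)) > 0)
        {
            // Process data here (read = number of doubles in this chunk)
        }

        delete [] data;
    }

    fclose(file);
}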
Evgeny Sobolev
  • I tried it and it produced the same result as using ifstream with ifstream::binary. Thanks. Still need more answers just to verify, but as of now it seems correct. – user3298879 Feb 26 '14 at 21:37
  • The size of char is always 1 byte. The size of double is usually 8 bytes; this depends on the platform, but you can easily check it by calling sizeof(double). So, when you deal with doubles you need to consider the byte ordering within each value (big-endian or little-endian), not the ordering of bits within a byte. If you provide more specific questions, I will be able to show you specific examples. – Evgeny Sobolev Feb 26 '14 at 22:18
  • So I asked my professor and it seems that it's incorrect, since the range of the values is very large compared to what ParaView shows. And I am not sure how to make my question more specific, since I am not very used to this type of file. – user3298879 Feb 26 '14 at 22:34
  • If the file is very large then yes, reading it all at once will run out of memory and the program will crash. Let's say the file is 1 TB and your computer has 4 GB of memory. Obviously 1 TB will not fit into 4 GB. To resolve this, we need to read a chunk of data, process it, and then read another chunk until we reach the end of the file. – Evgeny Sobolev Feb 26 '14 at 22:43
  • I modified the answer. Now it will work for files of any size. You just need to determine the available memory on the machine and assign it to machineMemory. – Evgeny Sobolev Feb 26 '14 at 22:50
  • By too large I meant the values being read in (e.g. -4.68855e-229, -1.32351e+120). The range when I fed the file into ParaView is approximately [-20, 20] for all 7 values. – user3298879 Feb 26 '14 at 23:16