
I recently needed to convert the MNIST data set to images and labels. The files are binary, with the structure described in the previous link, so I did a little research and, as I'm a fan of C++, read up on binary I/O in C++. After that I found this link on Stack Overflow. The code there works well, but it has no comments and no explanation of the algorithm, so I got confused, and that raised some questions I would like to ask a professional C++ programmer.

1- What is the algorithm to convert the data set in C++ with the help of ifstream?

I understand that the file is read as binary with file.read, which then advances to the next record. In C, however, we would define a struct and read the file into it record by record, and I can't see any struct in the C++ program, for example to read this:

[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel

How can we go to a specific offset, for example 0004, read a 32-bit integer there, and put it into an integer variable?

2- What is the function ReverseInt doing? (It is obviously not simply reversing an integer.)

int ReverseInt (int i)
{
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = i & 255;
    ch2 = (i >> 8) & 255;
    ch3 = (i >> 16) & 255;
    ch4 = (i >> 24) & 255;
    return((int) ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}

I did a little debugging with cout: when it is given, for example, 270991360, it returns 10000, and I cannot find any relation between the two. I can see that it shifts the number by multiples of 8 bits and ANDs it with 255, but why?

PS :

1- I already have the converted MNIST images, but I want to understand the algorithm.

2- I've already unzipped the gz files, so the files are pure binary.

frdf
  • 1. Linked C code shows how to read a file, but not how to create a data structure. That part you need to do yourself. 2. It converts between big-endian and little-endian integers [read more](https://en.wikipedia.org/wiki/Endianness). – n. m. could be an AI Mar 22 '17 at 07:44

1 Answer

1- What is the algorithm to convert the data set in C++ with the help of ifstream?

This function reads a file (t10k-images-idx3-ubyte.gz) as follows:

  • Read the magic number and adjust its endianness
  • Read the number of images and adjust its endianness
  • Read the number of rows and adjust its endianness
  • Read the number of columns and adjust its endianness
  • Read all of the images x rows x columns pixel bytes (but discard them).

The function uses a plain int and always swaps the byte order, which means it targets a very specific architecture (a little-endian host with a 32-bit int) and is not portable.

How can we go to a specific offset, for example 0004, read a 32-bit integer there, and put it into an integer variable?

ifstream provides a function to seek to a given position:

file.seekg( posInBytes, std::ios_base::beg);

At the given position, you could read the 32-bit integer:

int32_t val;
file.read ((char*)&val,sizeof(int32_t));

2- What is the function ReverseInt doing?

This function reverses the order of the bytes of an int value:

Given a 32-bit integer such as aaaaaaaabbbbbbbbccccccccdddddddd, it returns the integer ddddddddccccccccbbbbbbbbaaaaaaaa. For example, 270991360 is 0x10270000, and reversing its bytes gives 0x00002710, which is 10000.

This is useful for normalizing endianness. However, it is probably not very portable, as int might not be 32 bits wide (it could be, e.g., 16 or 64 bits).
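One way to avoid the dependence on the width of int is to use a fixed-width unsigned type. The following is a minimal sketch, not the code from the linked answer, and the name reverse_bytes is hypothetical. The static_assert reproduces the value from the question at compile time.

```cpp
#include <cstdint>

// Sketch: reverse the four bytes of a 32-bit value. std::uint32_t is
// exactly 32 bits wide, and unsigned shifts avoid sign-related surprises.
constexpr std::uint32_t reverse_bytes(std::uint32_t v) {
    return ((v & 0x000000FFu) << 24) |  // lowest byte moves to the top
           ((v & 0x0000FF00u) << 8) |
           ((v & 0x00FF0000u) >> 8) |
           ((v & 0xFF000000u) >> 24);   // highest byte moves to the bottom
}

// 270991360 is 0x10270000; its bytes reversed are 0x00002710, i.e. 10000.
static_assert(reverse_bytes(270991360u) == 10000u, "byte order is reversed");
```

Applying the function twice returns the original value, which is why the same routine converts in both directions between big-endian and little-endian.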

Adrian Maire
  • Thank you, very helpful answer. Could you tell me how to save those discarded images in the loop to a vector or something? And how can we tell which endianness (little or big) the binary file uses? – frdf Mar 22 '17 at 08:09
  • The specification of the file format probably tells you which endianness is used. I guess big-endian, considering the code. In the loop, well, just push the data into any data structure you defined outside the loop. That really depends on what your software will do with it. Don't forget to accept/vote the answer if it was useful ;-) thanks. – Adrian Maire Mar 22 '17 at 08:31