I recently needed to convert the MNIST data set to images and labels. The files are binary, and their structure is described in the previous link. Since I'm a fan of C++, I did a little research and read up on binary I/O in C++, and then I found this link on Stack Overflow. The code from that link works well, but it has no comments and no explanation of the algorithm, so I got confused. That raised some questions which I would like to ask a professional C++ programmer.
1. What is the algorithm for converting the data set in C++ with the help of ifstream?
I've realized that I can read a file as binary with file.read and then move on to the next record. In C we would define a struct and read the file into it, but I can't see any struct in the C++ program, for example to read this:
[offset] [type]          [value]          [description]
0000     32 bit integer  0x00000803(2051) magic number
0004     32 bit integer  60000            number of images
0008     32 bit integer  28               number of rows
0012     32 bit integer  28               number of columns
0016     unsigned byte   ??               pixel
How can we go to a specific offset, for example 0004, read a 32-bit integer there, and put it into an integer variable?
2. What is the function ReverseInt doing? (It is obviously not simply reversing the digits of an integer.)
int ReverseInt(int i)
{
    unsigned char ch1, ch2, ch3, ch4;
    ch1 = i & 255;
    ch2 = (i >> 8) & 255;
    ch3 = (i >> 16) & 255;
    ch4 = (i >> 24) & 255;
    return ((int)ch1 << 24) + ((int)ch2 << 16) + ((int)ch3 << 8) + ch4;
}
I did a little debugging with cout: when the function is given, for example, 270991360, it returns 10000, and I cannot find any relation between the two numbers. I understand that it shifts the number (which multiplies or divides it by powers of two) and ANDs it with 255, but why?
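For illustration, the relation becomes visible in hexadecimal: 270991360 is 0x10270000, i.e. the bytes 10 27 00 00, and 10000 is 0x00002710, i.e. the bytes 00 00 27 10. The four bytes are the same, only in the opposite order. The sketch below restates that byte reversal with unsigned arithmetic; the name swapBytes is hypothetical and the function is not taken from the linked code.

```cpp
#include <cstdint>

// Hypothetical helper with the same effect as ReverseInt, written with
// unsigned arithmetic: it reverses the order of the four bytes.
std::uint32_t swapBytes(std::uint32_t x)
{
    return ((x & 0x000000FFu) << 24) |   // lowest byte becomes the highest
           ((x & 0x0000FF00u) << 8)  |   // second byte moves up one place
           ((x & 0x00FF0000u) >> 8)  |   // third byte moves down one place
           ((x & 0xFF000000u) >> 24);    // highest byte becomes the lowest
}
```

Under this reading, swapBytes(270991360) gives 10000. A swap like this is typically needed when a file stores integers most significant byte first and they are read directly into an int on a machine that stores the least significant byte first, which appears to be the situation in the linked code.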
PS:
1. I already have the MNIST images converted, but I want to understand the algorithm.
2. I've already unzipped the .gz files, so the files are raw binary.