
I'm trying to read bytes from a binary file, without success. I've tried many solutions, but none of them gives the expected result. The structure of the file:

[offset] [type]          [value]          [description] 
0000     32 bit integer  0x00000803(2051) magic number 
0004     32 bit integer  60000            number of images 
0008     32 bit integer  28               number of rows 
0012     32 bit integer  28               number of columns 
0016     unsigned byte   ??               pixel 
0017     unsigned byte   ??               pixel 
........ 
xxxx     unsigned byte   ??               pixel

What I tried (doesn't work):

auto myfile = fopen("t10k-images.idx3-ubyte", "r");
char buf[30];
auto x = fread(buf, 1, sizeof(int), myfile);
David G
SevenDays
  • The part where it said "MSB first" was kinda important. – Nicol Bolas Oct 13 '12 at 19:50
  • What does "no success" mean? I believe this should read sizeof(int) bytes into the buffer. You should check x after the read to make sure that x == sizeof(int). Try printing out the buffer as hex to see if it was read correctly. – Chris Mansley Oct 13 '12 at 19:51
  • I'm guessing the issue is endianness. If the `int`s on disk are big endian, and the system is little endian, then the numbers won't match up. – Geoff Montee Oct 13 '12 at 20:05
  • Ironically, I got here after failing to read MNIST handwritten digit database just like you. – Nahiyan May 12 '18 at 12:26

4 Answers


Read the bytes as unsigned char:

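// "if" is a keyword, so the stream needs a different name.
std::ifstream in;

in.open("filename", std::ios::binary);

if (in.fail())
{
    //error
}

std::vector<unsigned char> bytes;

// get() is an unformatted read: unlike operator>>, it does not skip
// bytes that happen to look like whitespace. The loop ends at end of
// file or on a read error.
char c;

while (in.get(c))
{
    bytes.push_back(static_cast<unsigned char>(c));
}

in.close();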

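Then, to turn four bytes into a 32-bit integer, for example:

uint32_t number;

// byte3 is the most significant byte and byte0 the least significant.
// Which byte of the file goes where depends on the file's byte order.
number = ((static_cast<uint32_t>(byte3) << 24)
    | (static_cast<uint32_t>(byte2) << 16)
    | (static_cast<uint32_t>(byte1) << 8)
    | (static_cast<uint32_t>(byte0)));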

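This should cover endian issues. It doesn't matter whether the system stores an int as B0B1B2B3 or B3B2B1B0, since the value is assembled with bit shifts; the code doesn't assume any particular order in memory. You do still need to know the byte order of the file to decide which byte is byte3 and which is byte0.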
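As a minimal sketch of that assembly step, assuming the file is big-endian (as the "MSB first" comment on the question suggests) and that its contents are already in a vector of bytes. The helper name `be32_at` is made up for this illustration:

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical helper: assemble a big-endian 32-bit value from the four
// consecutive bytes starting at `offset`.
std::uint32_t be32_at(const std::vector<unsigned char>& bytes, std::size_t offset)
{
    assert(offset + 4 <= bytes.size()); // precondition: four bytes available

    return (static_cast<std::uint32_t>(bytes[offset])     << 24) // most significant byte first
         | (static_cast<std::uint32_t>(bytes[offset + 1]) << 16)
         | (static_cast<std::uint32_t>(bytes[offset + 2]) << 8)
         |  static_cast<std::uint32_t>(bytes[offset + 3]);
}
```

With the layout from the question, `be32_at(bytes, 0)` would be the magic number and `be32_at(bytes, 4)` the number of images.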

Geoff Montee
  • It might be the order that you are assigning the bytes. I don't set `byte0`, `byte1`, etc. here for you. That's something that you have to do. – Geoff Montee Oct 13 '12 at 20:19
  • Yes, I've replaced bytes in order and it works. Sorry, I can not mark both answers as correct. – SevenDays Oct 13 '12 at 20:22

The C++ stream library function read() can be used for binary file I/O. Given the code example from the link, I would start like this:

std::ifstream myfile("t10k-images.idx3-ubyte", std::ios::binary);
std::uint32_t magic, numim, numro, numco;

myfile.read(reinterpret_cast<char*>(&magic), 4);
myfile.read(reinterpret_cast<char*>(&numim), 4);
myfile.read(reinterpret_cast<char*>(&numro), 4);
myfile.read(reinterpret_cast<char*>(&numco), 4);

// Changing byte order if necessary
//endswap(&magic);
//endswap(&numim);
//endswap(&numro);
//endswap(&numco);

if (myfile) {
    std::cout << "Magic = "  << magic << std::endl
              << "Images = " << numim << std::endl
              << "Rows = "   << numro << std::endl
              << "Cols = "   << numco << std::endl;
}

If the byte order (endianness) needs to be reversed, you could write a simple reverse function like this one: endswap()
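The linked function is not reproduced here, so the following is only one possible way to write such a helper, not necessarily the linked code. It reverses the bytes of the object in place:

```cpp
#include <algorithm>
#include <cstdint>

// One possible endswap (an assumption, not the linked implementation):
// reverse the bytes of *value in place.
template <typename T>
void endswap(T* value)
{
    unsigned char* first = reinterpret_cast<unsigned char*>(value);
    std::reverse(first, first + sizeof(T));
}
```

Applied to a `std::uint32_t`, this turns 50855936 (0x03080000) into 2051 (0x00000803), 270991360 into 10000, and 469762048 into 28.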

Christian Ammer
  • I get something like 50855936, 270991360, 469762048, 469762048. So this method doesn't work. – SevenDays Oct 13 '12 at 20:45
  • @wsevendays: This is the same issue as with the answer from Geoff_Montee, here you also got 50855936 (other byte order). Try the `endswap` function given in the link! – Christian Ammer Oct 13 '12 at 21:32

This is how you read a uint32_t from a file:

auto f = fopen("", "rb"); // note the 'b': binary files must be opened in binary mode

std::uint32_t magic = 0;
fread (&magic, sizeof(std::uint32_t), 1, f);

Hope this helps.
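A small sketch of the same read with the return value checked. The helper name `read_u32_raw` is hypothetical. Note that the value comes back in the host's byte order, so a big-endian file read on a little-endian machine still needs its bytes swapped afterwards:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical helper: read one 32-bit value exactly as stored on disk,
// i.e. interpreted in the host's byte order.
// Returns false on a short read or an I/O error.
bool read_u32_raw(std::FILE* f, std::uint32_t& out)
{
    return std::fread(&out, sizeof out, 1, f) == 1;
}
```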

mauve

Knowing the endianness of your file layout when reading multi-byte numerics is important. Assuming big-endian is always the written format, and assuming the value is indeed a 32-bit unsigned value:

uint32_t magic = 0;
unsigned char bytes[4];
if (1 == fread(bytes, sizeof(bytes), 1, f))
{
   // Cast each byte before shifting: an unsigned char would otherwise be
   // promoted to a signed int, and shifting a value >= 0x80 left by 24
   // would overflow it.
   magic = ((uint32_t)bytes[0] << 24) |
           ((uint32_t)bytes[1] << 16) |
           ((uint32_t)bytes[2] << 8) |
           (uint32_t)bytes[3];
}

Note: this will work regardless of whether the reader (your program) is little-endian or big-endian. The only safe and portable way of reading multi-byte numerics is to (a) know the endianness they were written with, and (b) read and assemble them byte by byte.
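Putting this together for the four header fields from the question, a sketch under the assumption that all four are big-endian 32-bit values. The names `IdxHeader`, `read_be32` and `read_idx_header` are made up for this example:

```cpp
#include <cstdint>
#include <cstdio>

// Hypothetical struct mirroring the four header fields from the question.
struct IdxHeader
{
    std::uint32_t magic, images, rows, cols;
};

// Read one big-endian 32-bit value byte by byte; false on a short read.
bool read_be32(std::FILE* f, std::uint32_t& out)
{
    unsigned char b[4];
    if (std::fread(b, 1, sizeof b, f) != sizeof b)
        return false;

    out = (static_cast<std::uint32_t>(b[0]) << 24)
        | (static_cast<std::uint32_t>(b[1]) << 16)
        | (static_cast<std::uint32_t>(b[2]) << 8)
        |  static_cast<std::uint32_t>(b[3]);
    return true;
}

// Read the four fields in file order; stops at the first failed read.
bool read_idx_header(std::FILE* f, IdxHeader& h)
{
    return read_be32(f, h.magic) && read_be32(f, h.images)
        && read_be32(f, h.rows)  && read_be32(f, h.cols);
}
```

The pixel data that follows the header is single bytes, so it can be read with plain `fread` calls and needs no byte-order handling.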

WhozCraig
  • I cast each byte to `uint32_t` before shifting each byte in my answer. Not sure if the compiler would automatically promote them for each shift. – Geoff Montee Oct 13 '12 at 20:08
  • You're not the only one. The language guys would know more than I, but I usually do as you did (each value promoted before the shift). Makes for a lot of typing, but works. I've seen both. See [this example](http://stackoverflow.com/questions/12765488/casting-a-char-array-to-an-integer/12765536#12765536) for a related conversion. (and i up-voted your answer, as I agree with it). – WhozCraig Oct 13 '12 at 20:11
  • @WhozCraig your method works! But as shown above "magic" is 0. Now I get "2051" which is the result I need. – SevenDays Oct 13 '12 at 20:12
  • @wsevendays so should Geoff's – WhozCraig Oct 13 '12 at 20:13
  • @wsevendays magic is *initialized* to zero above, *before* the read. get into that habit, btw. It will serve you well later. – WhozCraig Oct 13 '12 at 20:22