
I am parsing a binary file. My code (edited for brevity) looks like:

std::ifstream  ifs;

...

        ifs.open(argv[i], std::ios::binary);
        if (ifs.is_open()) {

...

std::string                  header;
std::string                  data;

std::copy_n(std::istream_iterator<char>(ifs), HEADER_SIZE, std::back_inserter(header));
while (!ifs.eof()) {
    if (!ifs.good()) {
        std::cerr << "Error reading message header." << std::endl;
        size = 0u;
    } else {
        data.clear();
        std::copy_n(std::istream_iterator<char>(ifs), size, std::back_inserter(data));
        if (!ifs.good()) {
            std::cerr << "Error reading message data." << std::endl;
            size = 0u;
        } else {

...

    header.clear();
    std::copy_n(std::istream_iterator<char>(ifs), HEADER_SIZE, std::back_inserter(header));
}

The binary data contains byte 0x20 as one of the bytes in a header part way through the file. The hexdump of the relevant part of the file looks like:

00000080  01 00 00 00 99 7a 2b 50  dd 00 04 05 00 00 00 00  |.....z+P........|
00000090  99 7c 20 50 dd 00 04 05  01 00 00 00 99 7c 21 50  |.| P.........|!P|

Adding debug to my code I see that the bytes are read as:

Header: 00 99 7a 2b 50 dd 00 04 
Data:   05 00 00 00 
Header: 00 99 7c 50 dd 00 04 05 
Error reading message data.

The hexdump lines up quite nicely. You can clearly see that 00 99 7c 20 50 dd 00 04 is read as 00 99 7c 50 dd 00 04 and then the subsequent byte 05 is read too.

Why is the 0x20 byte (space character) not read?

As a side question (maybe needing a separate Stack Overflow question): if I create a function-scope variable for std::istream_iterator<char>(ifs) and reuse it, to avoid the overhead of constructing such an object twice each time around the loop, I get some very odd behaviour. The first read is fine, but the second has a single null byte prepended to the read data. The third read gets two null bytes, at which point the code fails. I guess that the nth read gets n-1 null bytes prepended to the read data. Why can't I reuse the object?

Also, if I use std::istreambuf_iterator<char>(ifs), then even without keeping it as a function-scope variable, I get null bytes prepended to the data.
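A possible explanation for the repeated bytes, offered as a sketch rather than a confirmed diagnosis: dereferencing a std::istreambuf_iterator only peeks at the current character, and the character is consumed when the iterator is incremented. The standard does not pin down how many times std::copy_n increments its input iterator, and the assumption here is that the library in use increments it one time fewer than the count, which would leave the last copied character in the stream to be seen again by the next read. A reused std::istream_iterator would behave similarly, because it still holds the last value it extracted. `read_exactly` below is a hypothetical helper that sidesteps the question by incrementing once per character.

```cpp
#include <cstddef>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper (not from the original post): copies up to `count`
// characters from the stream buffer and consumes every character it copies.
std::string read_exactly(std::istream& in, std::size_t count) {
    std::string out;
    std::istreambuf_iterator<char> it(in);
    const std::istreambuf_iterator<char> end;  // end-of-stream iterator
    while (count > 0 && it != end) {
        out.push_back(*it);  // peek at the current character (not consumed yet)
        ++it;                // consume it, including the final one
        --count;
    }
    return out;
}
```

Because this works on the stream buffer directly, it does not skip whitespace. It also does not set eofbit or failbit on the stream, so a short read has to be detected by checking the size of the returned string.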

Frankly I am very disappointed in C++ when it comes to file IO. Trying to do things in a "proper" C++ way can lead to some very awkward code, and I have read several articles which show that, without some really awkward C++ code, reading files is just quicker using basic C functions instead of using C++.

AlastairG
  • Mandatory read: [Why is iostream::eof inside a loop condition (i.e. `while (!stream.eof())`) considered wrong?](https://stackoverflow.com/questions/5605125/why-is-iostreameof-inside-a-loop-condition-i-e-while-stream-eof-cons) – molbdnilo Oct 29 '21 at 12:19
  • From the [documentation](https://en.cppreference.com/w/cpp/iterator/istream_iterator#Notes): `"When reading characters, std::istream_iterator skips whitespace by default..."`. – G.M. Oct 29 '21 at 12:21
  • If you are dealing with binary data, you can use the read function instead to read in a blob of bytes, i.e., `data.resize(size); ifs.read(data.data(), size);` – NathanOliver Oct 29 '21 at 12:34
  • @G.M. Thanks. Could you put that as a proper answer so I can accept it, please. I have to ask, what bloody use is an iterator that skips bytes? White space matters! – AlastairG Nov 01 '21 at 08:41
  • @NathanOliver, that doesn't work for std::string. I tried it. You can assign() the right number of null bytes; that works and is what I ended up doing. However, I wasn't looking for a solution to use in my code with this question. I just wanted an explanation for this seemingly bizarre behaviour. – AlastairG Nov 01 '21 at 08:43
  • @molbdnilo I suggest you read my code more carefully. I ONLY check for EOF directly after a read. The mistake in the link you posted is that EOF is checked before reading the data. I.e. instead of doing `while(!eof){...read...do stuff}` as per the link you posted, my code does `read...while(!eof){...do_stuff...read more}`. This does mean I have a duplicate line of code to read the header, though. – AlastairG Nov 01 '21 at 08:48
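
The whitespace skipping described in the comments can be reproduced in isolation. What follows is a minimal sketch, not the asker's code: `read_formatted` and `read_raw` are hypothetical helper names, and a std::istringstream stands in for the file. std::istream_iterator<char> extracts with operator>>, which skips whitespace while the stream's skipws flag is set (the default), whereas std::istream::read is an unformatted read that returns every byte.

```cpp
#include <algorithm>
#include <cstddef>
#include <ios>
#include <istream>
#include <iterator>
#include <sstream>
#include <string>

// Hypothetical helper: formatted extraction through std::istream_iterator.
// Whitespace bytes such as 0x20 are skipped while skipws is set.
// Assumes the stream holds at least `count` non-whitespace characters.
std::string read_formatted(std::istream& in, std::size_t count) {
    std::string out;
    std::copy_n(std::istream_iterator<char>(in), count, std::back_inserter(out));
    return out;
}

// Hypothetical helper: unformatted read of up to `count` bytes.
// &out[0] is used instead of out.data() because data() only returns a
// non-const pointer for std::string from C++17 onwards.
std::string read_raw(std::istream& in, std::size_t count) {
    std::string out(count, '\0');
    in.read(&out[0], static_cast<std::streamsize>(count));
    out.resize(static_cast<std::size_t>(in.gcount()));  // keep only the bytes actually read
    return out;
}
```

With the bytes 7c 20 50 51 in the stream, `read_formatted(in, 3)` yields 7c 50 51, while `read_raw(in, 3)` yields 7c 20 50. Opening the file with std::ios::binary does not change this, since that flag only suppresses text-mode translations such as newline conversion and leaves skipws untouched. Clearing the flag with `in >> std::noskipws` is another way to stop operator>> from dropping whitespace, though `read` is arguably the more direct fit for fixed-size binary records.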
