
Background: This question is a follow-up to this one.
The answer given there, which suggested accessing the data through unsigned char* instead of char*, worked.

Main question: But what can we do if we have no choice (i.e. if char* is imposed by a function prototype)?


Context:

Let's assume that we have written an int array in binary format into a file.
It may look like this (without error checking):

const std::string bin_file("binary_file.bin");

const std::size_t len(10);
int test_data[len] {-4000, -3000, -2000, -1000, 0, 1000, 2000, 3000, 4000, 5000};

std::ofstream ofs(bin_file, std::ios::trunc | std::ios::binary);
for(std::size_t i = 0; i < len; ++i)
{
    ofs.write(reinterpret_cast<char*>(&test_data[i]), sizeof test_data[i]);
}
ofs.close();

Now I want to open the file, read it and print the previously written data one by one.

The file is opened as follows (without error checking):

std::ifstream ifs(bin_file, std::ios::binary); // open in binary mode

// get the length
ifs.seekg(0, ifs.end);
std::size_t byte_size = static_cast<std::size_t>(ifs.tellg());
ifs.seekg(0, ifs.beg);

At this point, byte_size == len*sizeof(int).


Possible solutions:

I know that I can do it either by:

int val;
for(std::size_t i = 0; i < len; ++i)
{
    ifs.read(reinterpret_cast<char*>(&val), sizeof val);
    std::cout << val << '\n';
}

or by:

int vals[len];
ifs.read(reinterpret_cast<char*>(vals), static_cast<std::streamsize>(byte_size));

for(std::size_t i = 0; i < len; ++i)
    std::cout << vals[i] << '\n';

Both of these solutions work fine, but neither of them is the point of this question.


Problem description:

Here I consider the case where I want to read the full contents of the binary file into a char* buffer and process it afterwards.
I cannot use an unsigned char* since std::ifstream::read() expects a char*.

I tried:

char * buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
for(std::size_t i = 0; i < len; ++i)
{
    // Get the value via std::memcpy works fine
    //std::memcpy(&val, &buff[i*sizeof val], sizeof val);

    // Get the value via bit-wise shifts fails (guess: signedness issues)
    for(std::size_t j = 0; j < sizeof val; ++j)
        val |= reinterpret_cast<unsigned char *>(buff)[i*sizeof val + j] << CHAR_BIT*j; // For little-endian

    std::cout << val << '\n';
}

delete[] buff;

ifs.close();

Using std::memcpy to copy the 4 bytes into the int, I got the expected results (the printed values are the same as the original ones).

With bit-wise shifting, even after casting the buffer with reinterpret_cast<unsigned char*>, I got garbage: the printed values are not the same as the original ones.

My question is: what does std::memcpy do that lets it recover the right values from a char* instead of an unsigned char*, while my bit-wise shifting cannot?
And how could I solve this without using std::memcpy (for general interest)? I could not figure it out.

Fareanor
  • Q: on your arch, what does the following program print? https://coliru.stacked-crooked.com/a/e855aab8de264457 – YSC Oct 17 '19 at 14:46
  • @YSC It prints exactly the same output as yours on coliru: `-2122285186`. – Fareanor Oct 17 '19 at 14:49
  • Basically this is a big unreadable hack. Only hackers write stuff like that, computer scientists look at this and are ashamed. I'm one of them. – fonZ Oct 18 '19 at 13:20
  • There is no hack here. One may want to load all the data once and handle it afterwards, which avoids keeping files open for the whole lifetime of the application. I bet any decent software uses the same approach. Computer scientists would easily understand it; if that is not the case for you, you can ask for clarification (I would be glad to provide it) and keep your condescension to yourself :) (no offense) – Fareanor Nov 04 '19 at 14:45

1 Answer


Ok, this was a really stupid error, shame on me.

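Actually, I forgot to reset val to zero before each iteration. Since |= can only set bits, the bits of every previously decoded value stayed in val and were OR-ed into the next one.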

The problem was not related to the bit-wise shifting, and the reinterpret_cast<unsigned char *> worked successfully.

The corrected version should be:

char * buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
for(std::size_t i = 0; i < len; ++i)
{
    for(std::size_t j = 0; j < sizeof val; ++j)
        val |= reinterpret_cast<unsigned char *>(buff)[i*sizeof val + j] << CHAR_BIT*j; // For little-endian

    std::cout << val << '\n';
    val = 0; // Reset the val
}

delete[] buff;

ifs.close();

For those who don't like casting, we can replace it with a mask as follows:

char * buff = new char[byte_size];
ifs.read(buff, static_cast<std::streamsize>(byte_size));

int val = 0;
for(std::size_t i = 0; i < len; ++i)
{
    int mask = 0x000000FF;
    for(std::size_t j = 0; j < sizeof val; ++j)
    {
        val |= (buff[i*sizeof val + j] << CHAR_BIT*j) & mask; // For little-endian
        mask = mask << CHAR_BIT;
    }

    std::cout << val << '\n';
    val = 0; // Reset the val
}

delete[] buff;

ifs.close();

A perfect example of an issue located between the keyboard and the chair :)

Fareanor