2

I am trying to read a binary file and add a sequence of four little-endian bytes into an int, but for some reason one of the bytes is being read with some kind of error. The program is a save file editor, and the error only happens after I edit one of the files playtime (seconds*60) to some big number (I used 7075920 for the example below).

Here's the code of the reading part:

char buffer[4];
std::ifstream saveFile (ui->directoryLine->text().toStdString(), std::ios::binary | std::ios::in);

saveFile.seekg(saveSlot*2416+44); //Read playtime and convert
saveFile.read(buffer, 4);
playtime = (buffer[0] | (buffer[1]<<8) | (buffer[2]<<16) | (buffer[3]<<24))/60;
ui->hSpinBox->setValue(playtime/3600);
ui->mSpinBox->setValue(playtime/60 % 60);
ui->sSpinBox->setValue(playtime % 60);

When I write the playtime as 7075920 on the file (hex 6BF850) and I view the file using a hex editor, the bytes are set correctly to 50 F8 6B 00 (little-endian)

But using printf with %X to print the buffer after reading the modified file returns this: 50 FFFFFFF8 6B 0, and playtime is then calculated as -32

I don't understand how the program is working with unmodified small int files, but returning broken bytes for modified ones if they have no errors and the diff tool of the hex editor shows that the only differences between the modified file and the original one (that works) are on those four modified bytes.

  • 4
    You have run afoul of [Sign extension](https://en.wikipedia.org/wiki/Sign_extension). F8 is a negative number. Use an unsigned type to avoid this. – user4581301 Jul 12 '18 at 17:50
  • 1
    Another contributor: Everything fed into a variable argument function like `printf` undergoes promotions to limit the number of different -but-similar variable types than need to be handled. a signed `char` printed as an integer becomes an `int` if I'm remembering correctly. – user4581301 Jul 12 '18 at 17:56
  • 1
    Just change the type of `buffer` from array of `char` to array of `unsigned char`. The signedness of `char` is unspecified, but you'd get this behavior if it's signed. – Pete Becker Jul 12 '18 at 18:26
  • Too too damn long to find this. My google foo is weak. Maybe I need some sleep. [Convert four bytes to Integer using C++](https://stackoverflow.com/questions/34943835/convert-four-bytes-to-integer-using-c) – user4581301 Jul 12 '18 at 18:31

2 Answers2

2

An alternate method to extract an integer from a reverse byteorder byte stream (which I think you're stating you have) is to interpret the buffer as an integer pointer, e.g.

int32_t value = *((int32_t*)buffer);

This treats the region of memory occupied by the buffer as the storage space for an integer, then the dereference reads it as such.

A safer variant takes advantage of this behavior being built into unions:

#include <stdio.h>
#include <stdint.h>

typedef union long_buffer_convert_u {
    //byte array accessor
    unsigned char buffer[4];
    //int32_t accessor
    int32_t value;
} long_buffer_convert_t;


int main()
{
    long_buffer_convert_t convert;
    convert.buffer[0] = 0x50;
    convert.buffer[1] = 0xF8;
    convert.buffer[2] = 0x6B;
    convert.buffer[3] = 0x00;
    int32_t value = convert.value;
    printf("value: %d %X", value, value);

    return 0;
}

Since I think its important to state, the union behavior is well defined in C, but undefined within the C++ standard. Using this for exclusively numeric types should work under any compiler you care to try, but be sure to test it first.

csunday95
  • 1,279
  • 10
  • 18
  • And then after reading `value` in this manner, you can swap its endian afterwards to match the host system, if needed. – Remy Lebeau Jul 12 '18 at 18:06
  • This answer invokes undefined behaviour by way of the strict aliasing rule. There are endian and integer size considerations here that are not being addressed. Use caution when doing this. – user4581301 Jul 12 '18 at 18:08
  • What kind of problem can this create? Using this fixed my problem and now the playtime is being read correctly. –  Jul 12 '18 at 18:09
  • I've switched the reinterpret cast type to long in order to avoid issues on 16-bit int systems; long is guaranteed to be 32-bits by the c standard – csunday95 Jul 12 '18 at 18:11
  • Problem 1: Strict Aliassing is broken. The commons solution to this is to use a `memcpy` rather than a cast. By the time the optimizer is done, there should be no performance difference. Problem 2: `long` may not be 32 bit. I recommend the appropriate [Fixed Width Integer](https://en.cppreference.com/w/cpp/types/integer) type to get around this problem. Problem 3: The native byte order may not match the order stored. This may or may not be a problem depending on just how portable you want the output file to be. – user4581301 Jul 12 '18 at 18:15
  • @csunday95 long is guaranteed by the C and C++ standards to be *no less than* 32 bits. The upper limit is not weakly bound, limited by being no longer than `long long`, which currently has no defined maximum. – user4581301 Jul 12 '18 at 18:17
  • 1
    The `union` approach is flat-out not legal in C++. It is undefined behaviour to read any `union` member other than the one most recently set. In practice this falls into the category of "Probably works, no guarantees." – user4581301 Jul 12 '18 at 18:20
  • virtually any compiler you use will have defined behavior for the union method, but yes, the c++ standard treats this method as "undefined" see [this question](https://stackoverflow.com/questions/11373203/accessing-inactive-union-member-and-undefined-behavior) – csunday95 Jul 12 '18 at 18:29
1

As user4581301 pointed out, this is happening due to Sign Extension (link provided by mentioned user).

To get around this you could use a bitmask to extract the portion you care about

playtime = (buffer[0] | ((buffer[1]<<8)& 0xff00) | ((buffer[2]<<16)& 0xff0000) | ((buffer[3]<<24)& 0xff000000))/60;
Jacob Boertjes
  • 963
  • 5
  • 20
  • 1
    That fixed the problem for the number I was using, but using an even bigger number made the problem return. Using csunday95's answer solved the problem completely. Thanks anyway ;) –  Jul 12 '18 at 18:06