
I have two systems. One runs 64-bit Ubuntu 14.04 on an Intel CPU; the other runs Ubuntu 14.04 for ARM on a CubieTruck.

The Intel system has a data file stored on an ext4-formatted HDD. The CubieTruck has the same file on an NTFS HDD, mounted with NTFS-3G.

I currently have a problem with pread() on those systems. I read a chunk of bytes from a file and print the first 64 bytes of that chunk. Later, these bytes are used to calculate a hash using Shabal.

While the data printed on the CubieTruck matches exactly what I see on a Windows system when I open the file in a hex editor, the output on the 64-bit Ubuntu system is different. It looks like it is filled with "FFFFFF", but it also differs in general. Stranger still, while the output on the CubieTruck always stays the same, on the 64-bit Ubuntu system it changes after a while (I haven't seen a pattern in when that happens; I just check from time to time).

But the most annoying thing is that the x64 system seems to calculate the hash correctly, while the ARM system gets it wrong.

I have no idea why pread() delivers different results for the same file on these systems, but I hope someone can shed some light on it.

Edit, the code:

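#include <fcntl.h>   /* open */
#include <stdio.h>   /* printf */
#include <stdlib.h>  /* malloc, free, exit */
#include <unistd.h>  /* pread, close */

int main(int argc, char **argv) {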
    unsigned int readsize = 16384 * 32 * 2;
    char *cache = (char*) malloc(readsize);

    int fh = open("/home/user/somefile", O_RDONLY);

    if (fh < 0) {
        printf("can't open file");
        exit(-1);
    }

    int bytes = 0, b;

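    /* pread() may return fewer bytes than requested: keep reading until
       the buffer is full, end of file (0) or an error (-1) */
    do {
        b = pread(fh, &cache[bytes], readsize - bytes, bytes);
        if (b > 0)
            bytes += b;
    } while(bytes < readsize && b > 0);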

    int i = 0;
    for (i=0; i < 64; i++) {
        printf("%02X", cache[i]);
    }

    close(fh);
    free(cache);
    return 0;
}

Both systems open the exact same file.

result on x64: FFFFFF94FFFFFFF16D25FFFFFFC0FFFFFFA3367D010BFFFFFFEF1E12FFFFFF841CFFFFFFBE4C26FFFFFF92FFFFFF80FFFFFF86FFFFFFA822FFFFFF8A26FFFFFF906CFFFFFFAD05FFFFFFE7FFFFFFB124FFFFFFA8FFFFFFF77B16FFFFFFEAFFFFFFACFFFFFF9DFFFFFF9EFFFFFF81FFFFFFC7FFFFFF92FFFFFFCDFFFFFFB0FFFFFFE86270FFFFFFF974FFFFFFA8420C45FFFFFFFC04FFFFFFF9103F2E3A47FFFFFF990F

result on ARM: 94F16D25C0A3367D010BEF1E12841CBE4C26928086A8228A26906CAD05E7B124A8F77B16EAAC9D9E81C792CDB0E86270F974A8420C45FC04F9103F2E3A47990F

As you can see, on x64 many bytes are prefixed with "FFFFFF", and apparently this is somehow needed later on. But I don't understand why the output differs between my systems.

nim
  • What *exactly* they are read in to and then how *exactly* are they used (the type of questions **code** would answer nicely; you've told us about it, now *show it*). – WhozCraig Sep 21 '14 at 15:48
  • updated my question with code, and exact results. – nim Sep 21 '14 at 16:08


One of the fun aspects of C is the amount of implementation-defined behaviour - in this case, whether char is signed or not.

The %X format specifier expects an unsigned int argument, so in the ARM case the conversion is straightforward: char is unsigned, so it just gets zero-extended. However, on x86, where char is signed, the conversion can go one of two ways:

  • sign-extend the char to a signed int, then cast it to unsigned int
  • first cast to unsigned char, then zero-extend to unsigned int

It appears the char->int part of the conversion takes precedence over the signed->unsigned part*, so you get the former (note how the bytes without the top bit set are unambiguous and print the same on both implementations). I imagine your calculation does a similar conversion somewhere that assumes char is signed, which is why it breaks on ARM.

In short, if you're dealing with char-sized values rather than characters, always specify signed char or unsigned char as appropriate, never bare char.

* I suppose I could dig out the standard to check if that's actually specified, but at this point it's merely a trivial detail

Notlikethat
  • +1, the `char` to `int` happens in the caller when the argument is prepared for the call. The `int` to `unsigned` then happens inside `printf`, which can't drop leading bits, and so you see all the FFFF in the output. – Jens Gustedt Sep 21 '14 at 18:50
  • Thanks for the great explanation. It was 100% correct and I was able to verify it by changing `char` to `signed char` on the ARM platform. Now both systems show the same output. Thanks again! – nim Sep 21 '14 at 21:39