
For some reason, when I open a file and read it byte by byte in Python and C and try to print the result, I get random characters/data mixed in.

For example, when I read the first 8 bytes of a PNG image with the following program:

/* Test file reading and see if there's random data */

#include <stdio.h>
#include <stdlib.h>

#define PNG_BYTES_TO_CHECK 8

int
main(void)
{
    char fname[] = "../images/2.png";

    /* Open in binary mode so no newline translation happens. */
    FILE *fp = fopen(fname, "rb");
    if (fp == NULL) abort();

    char *buffer = malloc(PNG_BYTES_TO_CHECK);
    if (buffer == NULL) abort();

    /* Read the first 8 bytes (the PNG signature). */
    if (fread(buffer, 1, PNG_BYTES_TO_CHECK, fp) != PNG_BYTES_TO_CHECK)
        abort();

    /* Print each byte in hex. */
    unsigned i;
    for (i = 0; i < PNG_BYTES_TO_CHECK; ++i) printf("%x ", buffer[i]);
    printf("\n");

    free(buffer);
    fclose(fp);

    return 0;
}

I get this garbage on stdout:

ffffff89 50 4e 47 d a 1a a

But when I open the file with a hex editor, the bytes are perfectly fine (it's a valid PNG signature):

[Image: hex editor view of the file's first bytes]

Any ideas as to what may cause this? I don't have an example for Python, but I recall that a few days ago I was getting repetitive mumbo jumbo while working with files at the byte level and printing the results as well.

bjd2385

1 Answer


The PNG spec states that a PNG file always starts with the bytes 137 80 78 71 13 10 26 10 (89 50 4e 47 0d 0a 1a 0a in hex). Your buffer is a char array, and on your platform plain char is evidently signed, so its maximum value is 127. The first byte, 137, does not fit and ends up as -119 instead (if this is confusing, check out the way negative numbers are represented in two's complement).

You are then printing it with %x. Because printf is variadic, the char is first promoted to an int. Again, because of the way negative numbers are represented, a 4-byte integer whose value is -119 has the following binary representation: 11111111111111111111111110001001. %x is the format specifier for an unsigned hexadecimal value. Because it treats the value you are giving it as unsigned, it won't interpret that bit pattern as a negative number; it prints all 32 bits. If you convert 11111111111111111111111110001001 to hex, you'll see that it is ffffff89.

The other seven bytes are all below 128, so they stay positive and print as expected. They appear as d and a rather than 0d and 0a only because %x does not pad with leading zeros.
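The promotion step described above can be reproduced in isolation. This is only a sketch: the helper names bits_seen_by_x, byte_value and demo are made up for illustration, and the ffffff89 result assumes a platform with 32-bit int.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper: the bits that %x ends up showing when it is handed
 * a negative char. The char is promoted to int first, keeping its value. */
unsigned int bits_seen_by_x(signed char c)
{
    int promoted = c;               /* -119 stays -119 after promotion */
    return (unsigned int)promoted;  /* well-defined: wraps modulo UINT_MAX + 1 */
}

/* Hypothetical helper: go through unsigned char so the value is in
 * 0..255 before any promotion happens. */
unsigned int byte_value(signed char c)
{
    return (unsigned char)c;
}

/* Prints both views of the first PNG signature byte. */
void demo(void)
{
    signed char first = -119;       /* what 137 becomes in a signed 8-bit char */
    assert(byte_value(first) == 0x89);
    printf("%x vs %x\n", bits_seen_by_x(first), byte_value(first));
}
```

With 32-bit int, demo() should print the sign-extended value next to the plain byte, which is the same difference seen between the program's output and the hex editor.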

tl;dr: there's nothing wrong with the file. You just forgot to make your bytes unsigned: declare the buffer as unsigned char, or cast each byte to unsigned char before printing it.
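A minimal sketch of the unsigned version, assuming you want to both print the bytes and check the signature. The helper names read_prefix, has_png_signature and format_hex are hypothetical, not part of any library; only the eight signature values come from the PNG spec.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define PNG_BYTES_TO_CHECK 8

/* The eight signature bytes: 137 80 78 71 13 10 26 10. */
static const unsigned char png_signature[PNG_BYTES_TO_CHECK] = {
    0x89, 0x50, 0x4e, 0x47, 0x0d, 0x0a, 0x1a, 0x0a
};

/* Hypothetical helper: reads the first len bytes of path into buf.
 * Returns 1 on success, 0 if the file cannot be opened or is too short. */
int read_prefix(const char *path, unsigned char *buf, size_t len)
{
    FILE *fp = fopen(path, "rb");
    size_t got;

    if (fp == NULL) return 0;
    got = fread(buf, 1, len, fp);
    fclose(fp);
    return got == len;
}

/* Hypothetical helper: returns 1 if buf starts with the PNG signature. */
int has_png_signature(const unsigned char *buf, size_t len)
{
    return len >= PNG_BYTES_TO_CHECK &&
           memcmp(buf, png_signature, PNG_BYTES_TO_CHECK) == 0;
}

/* Hypothetical helper: writes the bytes into out as two-digit hex values
 * separated by single spaces. out must hold at least 3 * len + 1 chars. */
void format_hex(const unsigned char *buf, size_t len, char *out, size_t out_size)
{
    size_t i, used = 0;

    assert(out_size >= len * 3 + 1);
    out[0] = '\0';
    for (i = 0; i < len; ++i) {
        if (i > 0) out[used++] = ' ';
        /* unsigned char promotes to a non-negative int; the cast makes the
         * argument type match %x exactly. */
        used += (size_t)snprintf(out + used, out_size - used, "%02x",
                                 (unsigned int)buf[i]);
    }
}
```

Reading the first 8 bytes of a valid PNG with read_prefix and passing them to format_hex should give 89 50 4e 47 0d 0a 1a 0a, matching what the hex editor shows.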

EKW
  • Signed integers cannot overflow. Overflow in arithmetic operations invokes undefined behaviour, and the conversion of unsigned to signed is implementation-defined, but it does not overflow, nor is it guaranteed to yield a specific result. And passing the wrong type to `printf` invokes UB, too. No use in researching why a specific behaviour is shown. – too honest for this site Dec 31 '16 at 09:40