I have a file in which I'm trying to look for this sequence of bytes: 0xFF, 0xD8, 0xFF, and 0xE0. For right now, let's assume I'm only looking for 0xFF. I made this program for testing:

#include<stdio.h>
#include<stdlib.h>
#include<string.h>

void analyzeFile(char* filename)
{
    FILE* filePtr = fopen(filename, "rb");

    int numImages = 0;

    while (!feof(filePtr))
    {
        char bytes;

        bytes = getc(filePtr);

        printf("%c", bytes);

        if ((bytes == 0xFF))
        {
            numImages++;
            printf("image found!\n");
        }
    }

    printf("%d\n", numImages);
}

This isn't working. When I call analyzeFile with parameter "test.txt", it prints the contents of the file out fine, but doesn't detect a single 0xFF byte:

contents of test.txt:

aÿØÿÿà1234

output:

aÿØÿÿà1234
0

For reference, 0xFF is equivalent to y-diaeresis, ÿ, according to ASCII.

human bean
  • `0xFF` is not defined by ASCII. Use `hexdump` or some other hex viewer to see the actual bytes in numerical form – Eugene Sh. Nov 23 '21 at 15:09
  • Also change your type to *unsigned char*, otherwise your comparison won't work (see this funny experiment: https://ideone.com/Pk0rGg). That is because during the comparison and integer promotion the signed `char` value will get "sign-extended" to `0xFFFFFFFF` and compared to `0x000000FF` – Eugene Sh. Nov 23 '21 at 15:11
  • try `char bytes` -> `int bytes` – tstanisl Nov 23 '21 at 15:12
  • `while (!feof(filePtr))` is a bug. `feof` returns whether an earlier read found EOF. Just call `getc` and check whether it returns `EOF`. – ikegami Nov 23 '21 at 15:12
  • Why not pass it a JPEG (JFIF) file instead of assuming that a text file is encoded like you think it is? – Ian Abbott Nov 23 '21 at 15:12
  • I changed it to unsigned char, but now it's only detecting one of the y-diaereses, and outputting `aÿØÿÿà1234 image found! 1` I need it to be able to find multiple characters. – human bean Nov 23 '21 at 15:15
  • @humanbean Most likely it is detecting `EOF`. Change to `int` as was suggested above. And get rid of `while (!feof(filePtr))` - see [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong) – Eugene Sh. Nov 23 '21 at 15:16
  • You're not looking for "y-diaereses", whatever that means. You're looking for FF bytes, and the one you do find is probably the `EOF` being misinterpreted as FF. Using an `unsigned char` is wrong. You need an `int`. Please refer to my earlier comment. – ikegami Nov 23 '21 at 15:16
  • I changed `unsigned char bytes` to `int bytes`; same output as when it was just `char bytes`. – human bean Nov 23 '21 at 15:18
  • @humanbean Meaning that your file does not contain 0xFF bytes. – Eugene Sh. Nov 23 '21 at 15:19
  • Your comments helped a lot, as well as the posted answer Thanks!! – human bean Nov 23 '21 at 15:29
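
The comments above describe three different outcomes depending on the type of `bytes`. Here is a minimal sketch of why, assuming 8-bit bytes and `EOF == -1` (true on common platforms, but the standard only requires `EOF` to be negative). The function names are made up for illustration and are not part of the original program:

```c
#include <stdio.h>

/* Hypothetical helper: mimics storing getc()'s result in a signed char.
   Converting 0xFF to signed char is implementation-defined (typically -1),
   and a signed 8-bit char can never hold 255, so this never matches. */
int matches_as_signed_char(int value_from_getc)
{
    signed char byte = (signed char)value_from_getc;
    return byte == 0xFF;   /* byte is promoted to int before comparing */
}

/* Hypothetical helper: mimics storing getc()'s result in an unsigned char.
   A real 0xFF byte matches, but so does EOF (-1 wraps to 255). */
int matches_as_unsigned_char(int value_from_getc)
{
    unsigned char byte = (unsigned char)value_from_getc;
    return byte == 0xFF;
}

/* Hypothetical helper: keeps the full int, so EOF and 0xFF stay distinct. */
int matches_as_int(int value_from_getc)
{
    return value_from_getc == 0xFF;
}
```

With these, `matches_as_unsigned_char(EOF)` reports a match even though no byte was read, which is consistent with the single spurious "image found!" reported after switching to `unsigned char`.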

1 Answer

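There are two problems with your code. The first is the loop condition: `feof` only reports whether an earlier read already hit end-of-file, so `while (!feof(filePtr))` runs one extra iteration and processes the `EOF` return value as if it were data. See: [Why is “while ( !feof (file) )” always wrong?](https://stackoverflow.com/questions/5431941/why-is-while-feof-file-always-wrong)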

The second problem is that `getc` (or `fgetc`) returns an `int`, not a `char`. As it stands, if `char` is signed on your platform, your `char` value of 0xFF is sign-extended (to 0xFFFFFFFF, most likely) when it is promoted to an `int` for the `if ((bytes == 0xFF))` comparison, so it never equals 0x000000FF. So, use `int` for your `bytes` variable and change the loop to compare the value that was read against `EOF`:

void analyzeFile(char* filename)
{
    FILE* filePtr = fopen(filename, "rb");
    if (!filePtr) { // Add some error handling...
        printf("Could not open file!");
        return;
    }
    int numImages = 0;
    int bytes;
    while ( ( bytes = getc(filePtr) ) != EOF) {
        printf("%02X %c\n", (unsigned)bytes, bytes);

        if (bytes == 0xFF) { // Removed redundant extra parentheses
            numImages++;
            printf("image found!\n");
        }
    }
    fclose(filePtr); // Don't forget to close the file!
    printf("%d\n", numImages);
}
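
The loop above counts single 0xFF bytes. For the full sequence mentioned in the question (0xFF, 0xD8, 0xFF, 0xE0), one possible approach is a four-byte sliding window. This is only a sketch: the helper name `count_jfif_markers` is hypothetical, and it assumes the stream was opened in binary mode:

```c
#include <stdint.h>
#include <stdio.h>

/* Hypothetical helper: counts occurrences of the byte sequence
   FF D8 FF E0 in an already-open binary stream. */
long count_jfif_markers(FILE *stream)
{
    uint32_t window = 0;  /* the last four bytes read, oldest in the top byte */
    long count = 0;
    int byte;             /* int, so EOF stays distinct from a 0xFF byte */

    while ((byte = getc(stream)) != EOF) {
        /* Shift the oldest byte out and the newest byte in. */
        window = (window << 8) | (uint32_t)byte;
        if (window == 0xFFD8FFE0u) {
            count++;
        }
    }
    return count;
}
```

It could be called as `count_jfif_markers(filePtr)` in place of the `while` loop. Because the window starts at zero and its top byte must be 0xFF to match, nothing can match before four bytes have been read.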
Adrian Mole
  • Unrelated to the question, but isn't the cast `(char)bytes` as a parameter to `printf` useless? It will get cast back to `int` by the default promotions. – Eugene Sh. Nov 23 '21 at 15:24
  • Thanks for the help! – human bean Nov 23 '21 at 15:29
  • The value of `(char)bytes` is implementation defined if `bytes > CHAR_MAX` (which can only happen if `char` is a signed type), so probably better not to cast it to `char`. Also, `%X` expects an `unsigned int`, so you *do* need a cast (to `unsigned int`) for that one. – Ian Abbott Nov 23 '21 at 17:11
  • @IanAbbott Fair comment - see edit. – Adrian Mole Nov 23 '21 at 17:15