0

I'm trying to scan for a virus signature in a binary file (without using strstr()), and printing if the signature was found or not. But the code doesn't work. I read the files outside of the function.

Unfortunately the code doesn't work, even when I'm trying to scan the virus signature itself.

int searchForSignature(FILE* file_to_search, FILE* virus) {
    int v_size = 0;
    int f_size = 0;


    fseek(virus, 0L, SEEK_END);
    v_size = ftell(virus);
    fseek(virus, 0, SEEK_SET);
    fseek(file_to_search, 0L, SEEK_END);
    f_size = ftell(file_to_search);
    fseek(file_to_search, 0, SEEK_SET);

    printf("VIRUS SIZE: %d FILE SIZE: %d\n", v_size, f_size);

    if (v_size > f_size)
    {
        printf("VIRUS NOT FOUND\n");
        return 1;
    }

    int counter = 0;  char ch = ' '; char ch2 = ' ';
    while ((ch = (char)fgetc(file_to_search)) != EOF)
    {
        printf("%d\n", counter);
        if (counter == v_size)
        {
            printf("VIRUS FOUND\n");
            return 0;
        }
        else
        {
            ch2 = (char)fgetc(virus);
            if (ch == ch2)
            {
                counter++;
            }
            else
            {
                counter = 0;
            }
        }
    }

    printf("VIRUS NOT FOUND\n");
    return 1;
}

how I read the files outside of the function:

FILE* virus_file = NULL;
virus_file = fopen("example file path", "rb");
if (virus_file == NULL)
{
    printf("Error opening file");
    return 1;
}

FILE* file_to_scan = NULL;
file_to_scan = fopen("example file path 2", "rb");
if (file_to_scan == NULL)
{
    printf("Error opening file");
    return 1;
}
Lee Taylor
  • 7,761
  • 16
  • 33
  • 49
MCR aaa
  • 13
  • 1
  • 1
    Can you be more precise as to what "doesn't work" about this code? – Scott Hunter May 13 '22 at 14:50
  • 2
    And how about a [mre]? – John Bollinger May 13 '22 at 14:51
  • Does this answer your question? [fgetc does not identify EOF](https://stackoverflow.com/questions/3977223/fgetc-does-not-identify-eof) – wovano May 13 '22 at 14:52
  • Note that `fgetc` returns an `int` because `EOF` is -1. When you cast the result to a `char`, then a _legit_ data byte of 0xFF will mimic [an early] EOF. Remove the `(char)` cast and use `int ch;` Casting the return value of `fgetc` to `char` actually makes the code run _slower_ on some/most CPUs – Craig Estey May 13 '22 at 16:44

1 Answers1

0

At least these problems:

Success too late

counter == v_size may be true right after counter++;. There may not exist another character in the file for the next loop's if (counter == v_size). Instead test counter == v_size right after incrementing.

No reset

If only a partial match is found, match testing should not resume on the next source file character, but back up to the beginning of the compare, then advance 1.

Consider:

search file: "aaac"
virus:       "aac"

Searching search file fails on "aa(a)..." (the 3rd 'a'), yet needs to resume on "a(a)" (the 2nd 'a').

Virus file needs to rewind() on failed match. I recommend to read the entire virus signature into allocated memory (e.g.: unsigned char *virus = malloc(v_size); rather than re-reading from a file.

Loss of information

int fgetc() returns 257 different values: 0-255 and EOF. Saving that result in a char loses information. Use int ch = fgetc(...), int ch2 = fgetc(...).

This can explain OP's "code doesnt working, even when Im trying to scan the virus signature itself." as one of the virus bytes is wrongly matches EOF when converted to char.

Casts are not needed to solve this.

Wrong type length

ftell(virus) returns a long. Saving the result in an int can lose information. Also code lacks error checks here.

Unneeded code

if (v_size > f_size) block is not needed.


Alternative

// Pseudo code
Read virus into an allocated buffer.
Read a chunk of test file into a test buffer at least 2x virus size.
Walk test buffer (1 char at a time) trying `memcmp()`. 
  Stop if match found,
  until remaining test buffer not full enough for a possible match.
Move remaining test buffer to start and re-fill as able.
If not able to re-fill at all, we are done - no match.

This is O(test_size * virus_size). A more sophisticated "walk" is O(test_size + virus_size).

wovano
  • 4,543
  • 5
  • 22
  • 49
chux - Reinstate Monica
  • 143,097
  • 13
  • 135
  • 256
  • I believe `v_size > f_size` is meant to be a shortcut - if the virus is larger than the file, the file can't contain it. It might not be strictly necessary, but avoids byte comparisons when it can only fail. – Dave Meehan May 13 '22 at 16:25
  • @Dave Meehan, Agreed as to why OP did it, yet is remains not useful. It costs a `fseek(file_to_search, 0L, SEEK_END);` which may be quite expensive. – chux - Reinstate Monica May 13 '22 at 16:29