I need to check whether the content of one binary file is contained in another binary file.

I've tried copying the contents of both files into char arrays with fread and checking them with strstr, but strstr always returns NULL, even when the content should be found in the other file.

Any ideas?

Thanks.

Eran
  • you can't use str*() functions on binary data - the binary data will naturally contain nulls, which will terminate the string operations. – Marc B May 15 '15 at 16:52
  • `strstr` works only if you provide null terminated strings. – R Sahu May 15 '15 at 16:52
  • You apparently fail to understand what `strstr()` does, it expects a `nul` terminated sequence of bytes, which yours can be or maybe not, so you can't use `strstr()` in this case. – Iharob Al Asimi May 15 '15 at 16:53
  • @user3121023 using `memcmp` will end up with `O(kn)` time complexity, where `k` and `n` are the file sizes. – Eugene Sh. May 15 '15 at 17:00

2 Answers

Since strstr won't work on arbitrary binary data (it only works on \0-terminated strings), I can see three approaches here:
1) Naive approach: iterate over one array of bytes and memcmp it against the other array at each starting position. Easy, but takes O(k*n) time (k, n are the sizes of the data).
2) Use the KMP algorithm. It takes some work to understand and code, but it gives the best time complexity, O(k+n).
3) If performance is not important and you don't want to mess with any non-trivial algorithms:
-- Convert your binary data to strings, representing each byte by its two-digit hex value.
-- Use strstr.
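
For the second approach, here is a minimal sketch of a KMP search over raw bytes. The function name `kmp_search` and its signature are made up for illustration; it assumes both inputs are already in memory and returns the offset of the first match, or -1 if there is none (or if the table allocation fails).

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical helper: offset of the first occurrence of needle (k bytes)
 * in hay (n bytes), or -1 if it is absent or malloc fails. */
long kmp_search(const unsigned char *hay, size_t n,
                const unsigned char *needle, size_t k)
{
    size_t *fail;
    size_t i, j;
    long result = -1;

    if (k == 0)
        return 0;               /* empty needle matches at offset 0 */
    if (k > n)
        return -1;

    fail = malloc(k * sizeof *fail);
    if (fail == NULL)
        return -1;

    /* fail[i] = length of the longest proper prefix of needle[0..i]
     * that is also a suffix of needle[0..i]. */
    fail[0] = 0;
    for (i = 1, j = 0; i < k; ++i) {
        while (j > 0 && needle[i] != needle[j])
            j = fail[j - 1];
        if (needle[i] == needle[j])
            ++j;
        fail[i] = j;
    }

    /* Scan the haystack once; on a mismatch fall back through the
     * table instead of re-reading haystack bytes. */
    for (i = 0, j = 0; i < n; ++i) {
        while (j > 0 && hay[i] != needle[j])
            j = fail[j - 1];
        if (hay[i] == needle[j])
            ++j;
        if (j == k) {
            result = (long)(i + 1 - k);
            break;
        }
    }

    free(fail);
    return result;
}
```

Because the lengths are passed explicitly, zero bytes in either buffer are treated like any other value. For example, searching for the bytes AA AA in 1A 00 AA AA 00 A1 returns 2.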

Update: After thinking a little more about the third approach, there is a case where it won't work correctly. Suppose you want to find the data AA AA inside 1A AA A1. It shouldn't be found, since it is not there. But if you represent the data as concatenated characters without delimiters, you will be searching for AAAA in 1AAAA1, which succeeds. So adding a delimiter between the bytes would be a good idea here.
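
A rough sketch of the third approach with a delimiter might look like this. The names `hex_encode` and `hex_search` are hypothetical, and a space after every byte is just one possible choice of delimiter. Each byte becomes exactly three characters, so a strstr hit can only start on a byte boundary, and dividing the character offset by 3 gives the byte offset.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: encodes each byte as two uppercase hex digits
 * followed by a space, e.g. {0x1A, 0xAA} -> "1A AA ".
 * Returns a malloc'd string (caller frees) or NULL on failure. */
char *hex_encode(const unsigned char *data, size_t len)
{
    static const char digits[] = "0123456789ABCDEF";
    char *out = malloc(len * 3 + 1);
    size_t i;

    if (out == NULL)
        return NULL;
    for (i = 0; i < len; ++i) {
        out[3 * i]     = digits[data[i] >> 4];
        out[3 * i + 1] = digits[data[i] & 0x0F];
        out[3 * i + 2] = ' ';   /* delimiter keeps matches byte-aligned */
    }
    out[len * 3] = '\0';
    return out;
}

/* Hypothetical helper: byte offset of needle in hay, or -1 if it is
 * not found or an allocation fails. */
long hex_search(const unsigned char *hay, size_t n,
                const unsigned char *needle, size_t k)
{
    char *h = hex_encode(hay, n);
    char *s = hex_encode(needle, k);
    long result = -1;

    if (h != NULL && s != NULL) {
        const char *hit = strstr(h, s);
        if (hit != NULL)
            result = (long)((hit - h) / 3);
    }
    free(h);
    free(s);
    return result;
}
```

With this encoding, AA AA becomes "AA AA " and 1A AA A1 becomes "1A AA A1 ", so the false match from the undelimited version no longer occurs. The cost is three times the memory of the original data plus the encoding pass.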

Eugene Sh.

Do it yourself (notify me if there's a bug):

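#include <string.h>     /* memcmp */
#include <sys/types.h>  /* ssize_t */

/* Returns the offset of the first occurrence of subdata in data.
 * If not found, returns -1. ssize_t is defined by POSIX. */
ssize_t bin_strstr(const void *data, size_t len,
                   const void *subdata, size_t sublen) {
    const unsigned char *p = data;  /* arithmetic on void* is not standard C */
    size_t i;

    if (sublen > len)
        return -1;  /* also keeps len - sublen from wrapping around */
    for (i = 0; i <= len - sublen; ++i)
        if (memcmp(p + i, subdata, sublen) == 0)
            return (ssize_t)i;
    return -1;
}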
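
To tie this back to files, here is a sketch of how two streams could be loaded and compared. The helpers `read_all` and `stream_contains` are hypothetical names, and the sketch assumes both files fit in memory and were opened in binary mode (`"rb"`). The important detail is that the lengths come from fread, never from strlen.

```c
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: reads from the current position of fp to EOF into
 * a malloc'd buffer and stores the byte count in *out_len.
 * Returns NULL on allocation or read failure. */
unsigned char *read_all(FILE *fp, size_t *out_len)
{
    size_t cap = 4096, len = 0, got;
    unsigned char *buf = malloc(cap);

    if (buf == NULL)
        return NULL;
    while ((got = fread(buf + len, 1, cap - len, fp)) > 0) {
        len += got;
        if (len == cap) {               /* buffer full: double it */
            unsigned char *tmp = realloc(buf, cap * 2);
            if (tmp == NULL) {
                free(buf);
                return NULL;
            }
            buf = tmp;
            cap *= 2;
        }
    }
    if (ferror(fp)) {
        free(buf);
        return NULL;
    }
    *out_len = len;
    return buf;
}

/* Hypothetical helper: 1 if the bytes of needle occur in haystack,
 * 0 if they do not, -1 on error. Uses the naive memcmp scan. */
int stream_contains(FILE *haystack, FILE *needle)
{
    size_t n = 0, k = 0, i;
    int found = 0;
    unsigned char *h = read_all(haystack, &n);
    unsigned char *s = read_all(needle, &k);

    if (h == NULL || s == NULL) {
        free(h);
        free(s);
        return -1;
    }
    for (i = 0; k <= n && i <= n - k; ++i) {
        if (memcmp(h + i, s, k) == 0) {
            found = 1;
            break;
        }
    }
    free(h);
    free(s);
    return found;
}
```

A caller would open both files with `fopen(path, "rb")`, check the results for NULL, pass them to `stream_contains`, and close them afterwards.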
cadaniluk