2

I have an array of bytes and i want to determine if the contents of this array of bytes exists within another larger array as a continuous sequence. What is the simplest way to go about doing this?

Bastien Léonard
  • 60,478
  • 20
  • 78
  • 95
RCIX
  • 38,647
  • 50
  • 150
  • 207

3 Answers3

3

The naive approach is:

public static bool IsSubsetOf(byte[] set, byte[] subset) {
    for(int i = 0; i < set.Length && i + subset.Length <= set.Length; ++i)
        if (set.Skip(i).Take(subset.Length).SequenceEqual(subset))
            return true;
    return false;
}

For more efficient approaches, you might consider more advanced string matching algorithms like KMP.

Mehrdad Afshari
  • 414,610
  • 91
  • 852
  • 789
  • 1
    @Michal: It's O(n*m) which might be considered inefficient, but if you're worrying about Skip and Take, don't. This is as efficient as a couple for loops. As I said, if performance is an issue, you should consider a more advanced algorithm. – Mehrdad Afshari Jun 19 '09 at 23:37
  • @Mehrdad - yeah, but m*n can get quite big for binary data... I know that Skip() and Take() is done by deferred execution - so it's just n component of O(m*n) coming from SequenceEquals... – Michał Chaniewski Jun 19 '09 at 23:50
  • Performance will be a problem but for now i just want to get my code working. this ought to do nicely... – RCIX Jun 19 '09 at 23:59
  • 1
    `Skip()` can't actually do an indexed jump. It must call `MoveNext()` on the iterator `i` times. See http://msmvps.com/blogs/jon_skeet/archive/2011/01/26/reimplementing-linq-to-objects-part-40-optimization.aspx for why. But with the "if you are worried about perf... do something else" caveat, this answer is sound. – Merlyn Morgan-Graham Sep 08 '11 at 08:41
3

Try to adapt some string search algorithm. One of the fastest is Boyer-Moore . It's quite easy as well. For binary data, Knuth-Morris-Pratt algorithm might work very efficiently as well.

Michał Chaniewski
  • 4,646
  • 1
  • 18
  • 15
0

This, which is a 1/1 port of this answer: Searching for a sequence of Bytes in a Binary File with Java

Is a very efficient way of doing so:

public static class KmpSearch {

    public static int IndexOf(byte[] data, byte[] pattern) {
        int[] failure = ComputeFailure(pattern);

        int j = 0;
        if (data.Length == 0) return -1;

        for (int i = 0; i < data.Length; i++) {
            while (j > 0 && pattern[j] != data[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == data[i]) { j++; }
            if (j == pattern.Length) {
                return i - pattern.Length + 1;
            }
        }
        return -1;
    }


    private static int[] ComputeFailure(byte[] pattern) {
        int[] failure = new int[pattern.Length];

        int j = 0;
        for (int i = 1; i < pattern.Length; i++) {
            while (j > 0 && pattern[j] != pattern[i]) {
                j = failure[j - 1];
            }
            if (pattern[j] == pattern[i]) {
                j++;
            }
            failure[i] = j;
        }

        return failure;
    }
}
Community
  • 1
  • 1
Sam Saffron
  • 128,308
  • 78
  • 326
  • 506