6

Have

List<byte> lbyte 

Have

byte[] searchBytes

How can I search lbyte for not just a single byte but for the index of the searchBytes?
E.G.

Int32 index = lbyte.FirstIndexOf(searchBytes);

Here is the brute force I came up with.
Not the performance I am looking for.

public static Int32 ListIndexOfArray(List<byte> lb, byte[] sbs)
{
    if (sbs == null) return -1;
    if (sbs.Length == 0) return -1;
    if (sbs.Length > 8) return -1;
    if (sbs.Length == 1) return lb.FirstOrDefault(x => x == sbs[0]);
    Int32 sbsLen = sbs.Length;
    Int32 sbsCurMatch = 0;
    for (int i = 0; i < lb.Count; i++)
    {
        if (lb[i] == sbs[sbsCurMatch])
        {
            sbsCurMatch++;
            if (sbsCurMatch == sbsLen)
            {
                //int index = lb.FindIndex(e => sbs.All(f => f.Equals(e)));  // fails to find a match
                IndexOfArray = i - sbsLen + 1;
                return;
            }
        }
        else 
        {
            sbsCurMatch = 0;
        }
    }
    return -1;
}
paparazzo
  • 44,497
  • 23
  • 105
  • 176
  • @YairNevet a List of byte – paparazzo Apr 20 '13 at 00:16
  • @YairNevet another list of bytes, this is essentially orderedlist.containssequence there must be a good solution here as its essentially solved with string.contains but this is a more generic case – undefined Apr 20 '13 at 00:17
  • Try looking [here](http://www.lyquidity.com/devblog/?p=79) for an extension method to do this. – Icemanind Apr 20 '13 at 00:22
  • Are you sure your brute force method is correct? Looks like it's not checking the last character of the search string. So when looking for "frob", if only checks "fro". I think you want to increment `sbsCurMatch` *after* the length check. As it stands, it will fail on a one-character string. – Jim Mischel Apr 20 '13 at 13:52

3 Answers3

4

Brute force is always an option. Although slow in comparison to some other methods, in practice it's usually not too bad. It's easy to implement and quite acceptable if lbyte isn't huge and doesn't have pathological data.

It's the same concept as brute force string searching.

Jim Mischel
  • 131,090
  • 20
  • 188
  • 351
3

You may find Boyer-Moore algorithm useful here. Convert your list to an array and search. The algorithm code is taken from this post.

static int SimpleBoyerMooreSearch(byte[] haystack, byte[] needle)
{
    int[] lookup = new int[256];
    for (int i = 0; i < lookup.Length; i++) { lookup[i] = needle.Length; }

    for (int i = 0; i < needle.Length; i++)
    {
        lookup[needle[i]] = needle.Length - i - 1;
    }

    int index = needle.Length - 1;
    var lastByte = needle.Last();
    while (index < haystack.Length)
    {
        var checkByte = haystack[index];
        if (haystack[index] == lastByte)
        {
            bool found = true;
            for (int j = needle.Length - 2; j >= 0; j--)
            {
                if (haystack[index - needle.Length + j + 1] != needle[j])
                {
                    found = false;
                    break;
                }
            }

            if (found)
                return index - needle.Length + 1;
            else
                index++;
        }
        else
        {
            index += lookup[checkByte];
        }
    }
    return -1;
}

You can then search like this. If lbyte will remain constant after a certain time, you can just convert it to an array once and pass that.

//index is returned, or -1 if 'searchBytes' is not found
int startIndex = SimpleBoyerMooreSearch(lbyte.ToArray(), searchBytes);

Update based on comment. Here's the IList implementation which means that arrays and lists (and anything else that implements IList can be passed)

 static int SimpleBoyerMooreSearch(IList<byte> haystack, IList<byte> needle)
 {
    int[] lookup = new int[256];
    for (int i = 0; i < lookup.Length; i++) { lookup[i] = needle.Count; }

    for (int i = 0; i < needle.Count; i++)
    {
        lookup[needle[i]] = needle.Count - i - 1;
    }

    int index = needle.Count - 1;
    var lastByte = needle[index];
    while (index < haystack.Count)
    {
        var checkByte = haystack[index];
        if (haystack[index] == lastByte)
        {
            bool found = true;
            for (int j = needle.Count - 2; j >= 0; j--)
            {
                if (haystack[index - needle.Count + j + 1] != needle[j])
                {
                    found = false;
                    break;
                }
            }

            if (found)
                return index - needle.Count + 1;
            else
                index++;
        }
        else
        {
            index += lookup[checkByte];
        }
    }
    return -1;
}

Since arrays and lists implement IList, there's no conversion necessary when calling it in your case.

int startIndex = SimpleBoyerMooreSearch(lbyte, searchBytes);
Community
  • 1
  • 1
keyboardP
  • 68,824
  • 13
  • 156
  • 205
  • 1
    You may be able to switch out your byte arrays with [`IList`s](http://msdn.microsoft.com/en-us/library/system.collections.ilist.aspx) to make your code more generic – Scott Chamberlain Apr 20 '13 at 00:42
  • Unfotunalely the List changes with each call. I don't yet follow this but I will give it a try. If you implement the IList version please post it here. – paparazzo Apr 20 '13 at 01:19
  • @Blam - I've added the IList version. It's not fully generic (only accepts bytes) but it should work in your case as it handles lists and arrays of bytes. – keyboardP Apr 20 '13 at 01:33
  • I am too tired to follow it right now but I will test it out. – paparazzo Apr 20 '13 at 01:37
  • @Blam - This post may help explain the algorithm itself. http://stackoverflow.com/questions/6207819/boyer-moore-algorithm-understanding-and-example – keyboardP Apr 20 '13 at 01:38
  • @keyboardP As I read that explanation it is the same brute force I came up with but my code is nothing like your code. – paparazzo Apr 20 '13 at 02:31
  • 2
    @Blam: B-M is not a brute force algorithm. The gist of B-M is that you can skip up to *m* characters (bytes) when a mismatch occurs, your code always skips only 1 which makes it slower by an order of magnitude. Since your case is basically a substring search (i.e. order is important, values are not unique, isn't sorted) then B-M is the fastest solution. – Boris B. Apr 21 '13 at 17:21
  • @BorisB. I just don't follow. If I am searching the string 'caacattdog' for 'cat' how can I skip going from either direction and not miss a match? – paparazzo Apr 21 '13 at 20:16
  • @Blam - B-M scans backwards so if you're searching for `cat`, `t` is the first to be checked. Since `t` does not match with `a` (3rd character) you can skip the first 3 characters (length of search pattern). Now, `t` matches `t` (6th char), `a` matches `a` and `c` matches `c`. Or Working from the end of the string, `t` doesn't match `g`, so skip to `length - 3`. `t` matches `t` but `a` doesn't match `t` (6th char). However, since `t` exists in search string, you move down 1 so that the search string `t` aligns with the 6th char `t`. There you have your match. These skips improve performance. – keyboardP Apr 21 '13 at 20:52
  • @Blam - I'm not an expert on the B-M algorithm but that's how I understand it to work. Happy to be corrected if I'm wrong about it. – keyboardP Apr 21 '13 at 20:52
  • @OK I finally get it +1 Thanks – paparazzo Apr 22 '13 at 01:26
1

Another way you could do with lambda expression

int index = lbyte.FindIndex(e => searchBytes.All(i => i.Equals(e));
cat916
  • 1,363
  • 10
  • 18