10

What is the simplest way to find a byte[] inside another byte[]? i have a feeling i could do it with linq but i dont know how.

Note: I did a search with the [c#] and didnt find anything, i am surprised.

MPelletier
  • 16,256
  • 15
  • 86
  • 137
  • I think we need more information. Are you trying to find a subsequence of bytes within a byte array? Could you give an example? – Andrew Feb 01 '11 at 04:56
  • 3
    See, for example, the [Knuth-Morris-Pratt algorithm](http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm). – jason Feb 01 '11 at 04:58

6 Answers6

28

Here's a faster version of Ergwun's excellent answer:

static int SearchBytes( byte[] haystack, byte[] needle ) {
    var len = needle.Length;
    var limit = haystack.Length - len;
    for( var i = 0;  i <= limit;  i++ ) {
        var k = 0;
        for( ;  k < len;  k++ ) {
            if( needle[k] != haystack[i+k] ) break;
        }
        if( k == len ) return i;
    }
    return -1;
}

In a brief test with an 11MB haystack and 9 byte needle, this was about three times faster.

The optimizations are:

  • No function call each time through the outer loop.
  • Needle length and search limit are cached.
  • Redundant length test at the beginning of match() is removed.

Of course for long byte arrays you'd want to use something like a Boyer-Moore search, but for many purposes a simple algorithm like this is good enough, and it has the virtue of being short and easy to understand and verify.

Community
  • 1
  • 1
Michael Geary
  • 28,450
  • 9
  • 65
  • 75
  • Why would the decision of whether to use BM or not would depend on the size of the input rather than the alphabet set ? – Askar Rayapov Nov 02 '18 at 08:36
  • 1
    Good point! I was just speaking in general terms: a slow but simple and easy to understand algorithm can be good enough if either your data is small (by whatever measure) or if it really isn't that important how long it takes to run. For example, I had a case where I had to do a similar byte array search as part of a build process. With a simple algorithm like the one above, the search took about a second. Obviously this would be a deal-breaker in many situations, but this was part of a release build that took several minutes and was run only occasionally. So simple and slow was plenty good! – Michael Geary Nov 02 '18 at 18:44
  • How is this faster? Is it not doing the same naive search? – Goku Feb 25 '20 at 15:58
  • @Goku Yes, it is the same algorithm, but a faster implementation because of the three optimizations I listed in the answer. – Michael Geary Feb 25 '20 at 18:12
9

Here's a simple (naive?) way to do it:

static int search(byte[] haystack, byte[] needle)
{
    for (int i = 0; i <= haystack.Length - needle.Length; i++)
    {
        if (match(haystack, needle, i))
        {
            return i;
        }
    }
    return -1;
}

static bool match(byte[] haystack, byte[] needle, int start)
{
    if (needle.Length + start > haystack.Length)
    {
        return false;
    }
    else
    {
        for (int i = 0; i < needle.Length; i++)
        {
            if (needle[i] != haystack[i + start])
            {
                return false;
            }
        }
        return true;
    }
}
Ergwun
  • 12,579
  • 7
  • 56
  • 83
  • Perfect, just as i needed. To bad i cant do this with linq or something built in. Did you just write this now? or copy/pasted it from somewhere? –  Feb 01 '11 at 05:22
  • Note that depending on the input, this is potentially very slow. – jason Feb 01 '11 at 05:24
  • @acidzombie - Just wrote it. @Jason - yeah can be slow, but simple. – Ergwun Feb 01 '11 at 05:27
  • @jason: Why? I dont see anything 'slow' about it? –  Feb 01 '11 at 05:52
  • @acidzombie24: It's really easy to come up with examples where it's ridiculously slow. You can make it repeatedly start a long search through the match portion of the algorithm, and then barely fail, and then have to start over all again. – jason Feb 01 '11 at 05:57
  • @acidzombie24: Slow is a relative term. There are faster ways to do this (see Jason's link). Depending on how fast this is for your expected usage, that may not matter though. – Ergwun Feb 01 '11 at 06:09
  • @jason,@Ergwun: Ah, i see. Ok +1 for Jasons answer. Really i was looking for a simple solution and a way not to write it myself. (hoping i could learn something new about the .net library). So +1 for both and i'll leave it as it is. –  Feb 01 '11 at 06:36
  • Also the solution is fast enough for my use. I am searching <50mb of data and only need to test against one (user file selected) buffer. (i really only need to test against one file for a found/fail) –  Feb 01 '11 at 06:38
0

Try this one with using lambda expressions:

private bool CheckPatternInArray(byte[] array, byte[] pattern)
{
    int fidx = 0;
    int result = Array.FindIndex(array, 0, array.Length, (byte b) =>
            {
                fidx = (b == pattern[fidx]) ? fidx + 1 : 0;
                return (fidx == pattern.Length);
            });
    return (result >= pattern.Length - 1);
}

If you are after the fastest one, check solutions here.

Alex Klaus
  • 8,168
  • 8
  • 71
  • 87
0

Appeciating this is an old question, but since it's still absent from LINQ despite being a common scenario, I've added a LINQ extension method below based on Michael's answer. Written in the spirit of a string/byte[] IndexOf.

It also explicitly handles an empty needle set, whereas the previous solution was returning a match (index 0), this is now returned as missing (index -1).

public static class LinqExtensions
{
    public static int IndexOf(this IEnumerable<byte> haystack, IEnumerable<byte> needle)
    {
        var needleArray = needle as byte[] ?? needle.ToArray();
        var haystackArray = haystack as byte[] ?? haystack.ToArray();

        var needleLength = needleArray.Length;
        var haystackLengthLimit = haystackArray.Length - needleLength;

        if (needleLength > 0)
        {
            for (var i = 0; i <= haystackLengthLimit; i++)
            {
                var j = 0;
                for (; j < needleLength; j++)
                {
                    if (needleArray[j] != haystackArray[i + j])
                        break;
                }

                if (j == needleLength)
                    return i;
            }
        }

        return -1;
    }
}

Plus some tests to show it in action..

    [Test]
    [TestCase(new byte[] { 1, 2, 3}, new byte[] {1, 3}, -1)]
    [TestCase(new byte[] { 1, 2, 3}, new byte[] {}, -1)]
    [TestCase(new byte[] { 1, 2, 3}, new byte[] {1}, 0)]
    [TestCase(new byte[] { 1, 2, 3}, new byte[] {2, 3}, 1)]
    [TestCase(new byte[] { 1, 2, 3, 20, 30, 40}, new byte[] {20, 30, 40}, 3)]
    [TestCase(new byte[] { 1, 2}, new byte[] {1, 2, 3}, -1)]
    [TestCase(new byte[] { }, new byte[] {1, 2, 3}, -1)]
    [TestCase(new byte[] { }, new byte[] {}, -1)]
    public void TestIndexOf(byte[] haystack, byte[] needle, int expectedIndex)
    {
        Assert.That(haystack.IndexOf(needle), Is.EqualTo(expectedIndex));
    }
stoj
  • 1,116
  • 1
  • 14
  • 25
0
byte[] any = { 0xff, 0x14, 0x1f, 0x13, 0x12, 0x2f, 0x3f, 0x4f, 0x5f, 0x6f, 0x11, 0x22, 0x23 };
byte[] pattern = { 0x4f, 0x5f, 0x6f };
string anyHexString = BitConverter.ToString(any).Replace("-", "");
string patternHexString = BitConverter.ToString(pattern).Replace("-", "");
int findIndex = anyHexString.IndexOf(patternHexString) / 2;
Console.WriteLine(findIndex);

If you don't care about performance, you can use this method, which is almost the most concise and clear

Convert the byte array to hexString and lookup the

拉拉姬
  • 140
  • 1
  • 8
-2

you probably could have figured this yourself but sometimes I like to do the simple thing.

bool found = false;
int i = 0;
for(; i < byteArray.Length || found; i++)
{
  if(byteArray[i] == lookingFor)
  {
    found = true;
  }
}
Aaron Anodide
  • 16,906
  • 15
  • 62
  • 121
  • 2
    I think you misunderstood the question. Think of the question as finding a word in a string, but the word is a `byte[]` and the string is another `byte[]`. – jason Feb 01 '11 at 05:03
  • yeah i read it as byte in a byte array. my bad. if you have ascii, you could use ASCIIEncoding.ASCII.GetString to make a string from your byte[] – Aaron Anodide Feb 01 '11 at 05:06