7

Given two arrays as parameters (x and y) and find the starting index where the first occurrence of y in x. I am wondering what the simplest or the fastest implementation would be.

Example:

when x = {1,2,4,2,3,4,5,6}
     y =       {2,3}
result
     starting index should be 3

Update: Since my code is wrong I removed it from the question.

nawfal
  • 70,104
  • 56
  • 326
  • 368
Jeff
  • 13,079
  • 23
  • 71
  • 102
  • is your code trying to find the first occurrence/starting index of the sub-array? If thats so wouldn't your second example in your Results box, 3 first occurs at 0? not 2? – Anthony Forloney Nov 22 '09 at 23:52

9 Answers9

8

Simplest to write?

    return (from i in Enumerable.Range(0, 1 + x.Length - y.Length)
            where x.Skip(i).Take(y.Length).SequenceEqual(y)
            select (int?)i).FirstOrDefault().GetValueOrDefault(-1);

Not quite as efficient, of course... a bit more like it:

private static bool IsSubArrayEqual(int[] x, int[] y, int start) {
    for (int i = 0; i < y.Length; i++) {
        if (x[start++] != y[i]) return false;
    }
    return true;
}
public static int StartingIndex(this int[] x, int[] y) {
    int max = 1 + x.Length - y.Length;
    for(int i = 0 ; i < max ; i++) {
        if(IsSubArrayEqual(x,y,i)) return i;
    }
    return -1;
}
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Marc, can you explain the `max` variable? why can't we use the length of the source array (`x`)? – Yair Nevet Sep 20 '15 at 08:18
  • @Yair if the source is 20 long, and were looking for a sub-array of length 5, then there is no point looking at arrays starting at index (0-based) 16, 17, 18 or 19: we know it can't possibly be a match, as there aren't enough elements left. – Marc Gravell Sep 20 '15 at 10:30
  • Because looking at the 15th index onward will be satisfied (x[15++]).. if I understand correctly – Yair Nevet Sep 20 '15 at 16:00
  • @Yair what does 15++ mean? Either way: no, it can't be a sub-array match if there aren't enough elements left – Marc Gravell Sep 20 '15 at 18:19
  • I love your Linq solution! – dmihailescu Jan 11 '16 at 23:28
7

Here is a simple (yet fairly efficient) implementation that finds all occurances of the array, not just the first one:

static class ArrayExtensions {

  public static IEnumerable<int> StartingIndex(this int[] x, int[] y) {
    IEnumerable<int> index = Enumerable.Range(0, x.Length - y.Length + 1);
    for (int i = 0; i < y.Length; i++) {
      index = index.Where(n => x[n + i] == y[i]).ToArray();
    }
    return index;
  }

}

Example:

int[] x = { 1, 2, 3, 4, 1, 2, 3, 4, 1, 2, 3, 4 };
int[] y = { 2, 3 };
foreach (int i in x.StartingIndex(y)) {
  Console.WriteLine(i);
}

Output:

1
5
9

The method first loops through the x array to find all occurances of the first item in the y array, and place the index of those in the index array. Then it goes on to reduce the matches by checking which of those also match the second item in the y array. When all items in the y array is checked, the index array contains only the full matches.

Edit:
An alternative implementation would be to remove the ToArray call from the statement in the loop, making it just:

index = index.Where(n => x[n + i] == y[i]);

This would totally change how the method works. Instead of looping through the items level by level, it would return an enumerator with nested expressions, deferring the search to the time when the enumerator was iterated. That means that you could get only the first match if you wanted:

int index = x.StartingIndex(y).First();

This would not find all matches and then return the first, it would just search until the first was found and then return it.

Guffa
  • 687,336
  • 108
  • 737
  • 1,005
  • @Guffa you seem to be pretty familiar with Enumerable, you've used similar approach in answering my other question http://stackoverflow.com/questions/1253454 – Jeff Nov 23 '09 at 00:25
  • @Jeffrey: I added an explanation of the algorithm above. – Guffa Nov 23 '09 at 00:34
  • @Mark: I added an alternative approach above, that would solve the issue with getting only the first match. – Guffa Nov 23 '09 at 03:05
  • this is a quite impressive algorithm, but the second variant without the ToArray throws me a index out of range exception, while the first works flawlessly. – Philipp Michalski Mar 15 '19 at 08:14
  • Yes, because a reference to `i` gets captured in the lambda for the `Where()` clause. Since linq queries are lazy-evaluated, by the time that lambda runs `i` will already equal `y.Length` creating the out of range exception. You can fix that by copying the value into a local variable in each run of the loop that stays constant, like so: ``` var i1 = i; index = index.Where(n => x[n + i1] == y[i1]); ``` – fernie Feb 15 '21 at 03:07
4

This is based off of Mark Gravell's answer but I made it generic and added some simple bounds checking to keep exceptions from being thrown

private static bool IsSubArrayEqual<T>(T[] source, T[] compare, int start) where T:IEquatable<T>
{
    if (compare.Length > source.Length - start)
    {
        //If the compare string is shorter than the test area it is not a match.
        return false;
    }

    for (int i = 0; i < compare.Length; i++)
    {
        if (source[start++].Equals(compare[i]) == false) return false;
    }
    return true;
}

Could be improved further by implementing Boyer-Moore but for short patterns it works fine.

Community
  • 1
  • 1
Scott Chamberlain
  • 124,994
  • 33
  • 282
  • 431
4

The simplest way is probably this:

public static class ArrayExtensions
{
    private static bool isMatch(int[] x, int[] y, int index)
    {
        for (int j = 0; j < y.Length; ++j)
            if (x[j + index] != y[j]) return false;
        return true;
    }

    public static int IndexOf(this int[] x, int[] y)
    {
        for (int i = 0; i < x.Length - y.Length + 1; ++i)
            if (isMatch(x, y, i)) return i;
        return -1;
    }
}

But it's definitely not the fastest way.

Mark Byers
  • 811,555
  • 193
  • 1,581
  • 1,452
2

"Simplest" and "fastest" are opposites in this case, and besides, in order to describe fast algorithms we need to know lots of things about how the source array and the search array are related to each other.

This is essentially the same problem as finding a substring inside a string. Suppose you are looking for "fox" in "the quick brown fox jumps over the lazy dog". The naive string matching algorithm is extremely good in this case. If you are searching for "bananananananananananananananana" inside a million-character string that is of the form "banananananabanananabananabananabanananananbananana..." then the naive substring matching algorithm is terrible -- far faster results can be obtained by using more complex and sophisticated string matching algorithms. Basically, the naive algorithm is O(nm) where n and m are the lengths of the source and search strings. There are O(n+m) algorithms but they are far more complex.

Can you tell us more about the data you're searching? How big is it, how redundant is it, how long are the search arrays, and what is the likelihood of a bad match?

Eric Lippert
  • 647,829
  • 179
  • 1,238
  • 2,067
  • 8
    You're the one who posted the vague question; I don't know what size your data set is, what your application is, or what your performance requirements are. It's unreasonable of you to expect that I would. Furthermore, a 600 character comment is hardly the place to summarize the vast literature on efficient string searching algorithms; pick up a good university undergraduate textbook on algorithmic design and you'll get plenty of examples of different algorithms for substring matching. – Eric Lippert Nov 23 '09 at 02:35
1

I find something along the following lines more intuitive, but that may be a matter of taste.

public static class ArrayExtensions
{
    public static int StartingIndex(this int[] x, int[] y)
    {
        var xIndex = 0;
        while(xIndex < x.length)
        {
            var found = xIndex;
            var yIndex = 0;
            while(yIndex < y.length && xIndex < x.length && x[xIndex] == y[yIndex])
            {
                xIndex++;
                yIndex++;
            }

            if(yIndex == y.length-1)
            {
                return found;
            }

            xIndex = found + 1;
        }

        return -1;
    }
}

This code also addresses an issue I believe your implementation may have in cases like x = {3, 3, 7}, y = {3, 7}. I think what would happen with your code is that it matches the first number, then resets itself on the second, but starts matching again on the third, rather than stepping back to the index just after where it started matching. May be missing something, but it's definitely something to consider and should be easily fixable in your code.

Nikolas Stephan
  • 1,280
  • 12
  • 10
0
    //this is the best in C#

    //bool contains(array,subarray)
    //  when find (subarray[0])
    //      while subarray[next] IS OK
    //          subarray.end then Return True
    public static bool ContainSubArray<T>(T[] findIn, out int found_index,
 params T[]toFind)
    {
        found_index = -1;
        if (toFind.Length < findIn.Length)
        {

            int index = 0;
            Func<int, bool> NextOk = (i) =>
                {
                    if(index < findIn.Length-1)
                        return findIn[++index].Equals(toFind[i]);
                    return false;
                };
            //----------
            int n=0;
            for (; index < findIn.Length; index++)
            {
                if (findIn[index].Equals(toFind[0]))
                {
                    found_index=index;n=1;
                    while (n < toFind.Length && NextOk(n))
                        n++;
                }
                if (n == toFind.Length)
                {
                    return true;
                }
            }

        }
        return false;
    }
0
using System;
using System.Linq;

public class Test
{
    public static void Main()
    {
        int[] x = {1,2,4,2,3,4,5,6};
        int[] y =       {2,3};
        int? index = null;

        for(int i=0; i<x.Length; ++i)
        {
            if (y.SequenceEqual(x.Skip(i).Take(y.Length)))
            {
                index = i;
                break;
            }
        }
        Console.WriteLine($"{index}");
    }
}

Output

3
abelenky
  • 63,815
  • 23
  • 109
  • 159
0
public class Test
{
    static void Main()
    {
        int[] x = { 1, 2, 4, 2, 3, 4, 5, 6 };
        int[] y = { 2, 3 };

        int index = new ReadOnlySpan<int>(x).IndexOf(y);

        if (index < 0)
            Console.WriteLine("Subarray not found");
        else
            Console.WriteLine("Subarray found at index " + index);
    }
}