1

Say I have a big char array with several thousand items:

char[] mobyDick = "..." such that mobyDick.Length = 2000.

I want to find out if a certain array of characters exists in that array in that order, and where* it is. (Update: I really just need to know if it's after a certain index in the main array.)

char[] test = {'a','b','c','d'}

I could do something like

char[] mobyDick = "..."
string mobyString = new string(mobyDick);
if (mobyString.Contains(new string(test)))
{ do stuff}

but that's not optimal for my situation, since I'm trying to write a parser that would have to work very quickly, and I don't want to have to be creating and searching strings every letter or so.

Is there some way (algorithmically or via some .Net method) to find out whether mobyDick as a char array contains abcd as a char array?

Arcandio
  • 310
  • 2
  • 13
  • 3
    Has `test` array always 4 items? Have you actually experienced any performance issues with *convert to string and try to find substring* solution? – MarcinJuraszek Feb 07 '14 at 20:18
  • test can have 2-4 items at the moment. I haven't tested against the full string yet but I expect to be passing around strings that are several thousand words long on average, so I wanted to try to tackle this early. – Arcandio Feb 07 '14 at 20:24
  • @Arcandio, why do you have character arrays ? does the order of characters for the comparison matters ? – Habib Feb 07 '14 at 20:28
  • 2
    assuming you already guessed the *naive* algorithm; http://www.cs.utexas.edu/users/moore/best-ideas/string-searching/ – L.B Feb 07 '14 at 20:34

7 Answers7

2

This looked like an interesting problem, so I had a go at creating an extension method...

 public static class ExtensionMethods
{
    public static int ContainsArray(this char[] arrayToSearchIn, char[] arrayToFind)
    {
        if (arrayToFind.Length == 0)
            return -1;

        int lengthOfArrayToFInd = arrayToFind.Length;
        int lengthOfArrayToSearchIn = arrayToSearchIn.Length;
        for (int i = 0; i < lengthOfArrayToSearchIn; i++)
        {
            if (lengthOfArrayToSearchIn - i < lengthOfArrayToFInd)
                return -1;

            if (arrayToSearchIn[i] != arrayToFind[0])
                continue;

            int arrayToFindCounter = 0;
            bool wasFound = true;
            for (int j = i; j < i + lengthOfArrayToFInd; j++)
            {
                if (arrayToFind[arrayToFindCounter] == arrayToSearchIn[j])
                    arrayToFindCounter++;
                else
                {
                    wasFound = false;
                    break;
                }
            }

            if (wasFound)
                return i;
        }

        return -1;
    }

}

This appears (to me) to work with any length sub array, including an empty search - returns the position of the first occurrence if found (zero based), otherwise returns -1.

Example Usage:

 static void Main(string[] args)
    {
        //                        0    1    2    3    4    5    6    7    8    9    0    1    2    3    4    5    6    7    8  
        char[] mobyDick = new[] {'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'a', 'b', 'c', 'd', 'a', 'z', 'y'};
        char[] test = {'a', 'b', 'c', 'd'};

        Console.WriteLine(mobyDick.ContainsArray(test));  // Position 12

        Console.ReadLine();
    }
Jay
  • 9,561
  • 7
  • 51
  • 72
1

Give this a try:

private bool Contains(char[] mobyDick, char[] test)
{
    for (int i = 0; i < mobyDick.Length - test.Length + 1; i++)
    {
        bool found = true;

        for (int j = 0; j < test.Length; j++)
        {
            if (mobyDick[i + j] != test[j])
            {
                found = false;
                break;
            }
        }

        if (found) return true;
    }

    return false;
}
Wagner DosAnjos
  • 6,304
  • 1
  • 15
  • 29
  • I know the OP has said that they are only using test arrays between 2 and 4 chars long, but this wouldn't work for a char array with one element in it. – Jay Feb 07 '14 at 20:53
  • Worked out nicely, actually. And I wouldn't have thought of the break to stop from checking the subloop. Many thanks! – Arcandio Feb 07 '14 at 20:56
  • @Arcandio My answer ended up being pretty similar to this, but I used a lamda to find all the possible starting points - ie, the indices where the first letter of the substring is. Then it does X number of searches, where X is how many indices are found. – Gray Feb 07 '14 at 20:58
  • @Arcandio I would look at Gray's answer here - it's a bit more robust. – Jay Feb 07 '14 at 21:01
  • I fixed a small bug on the outer loop. – Wagner DosAnjos Feb 07 '14 at 21:34
1

I would try this extension method:

public static bool ContainsChars(this char[] source, char[] target,out int index)
{
     int targetLength = target.Length - 1;
     int count = 0;
     char currentCharToSearch = target[0];
     for(int i=0; i<source.Length; i++)
     {
          if (source[i] == currentCharToSearch)
          {
              count++;
              if (count == targetLength) 
              {
                  index = i - count + 1;
                  return true;
              }
              else
              {
                  currentCharToSearch = target[count];
              }
           }
           else
           {
               count = 0;
               currentCharToSearch = target[0];
           }
      }
      index = -1;
      return false;
}

Usage:

var c1 = new char[] { 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'h', 't' };
var c2 = new char[] { 'c', 'h', 't' };

int index;
var result = c1.ContainsChars(c2,out index); // true index = 6

c2 = new char[] { 'c', 't', 'h' };
var result2 = c1.ContainsChars(c2,out index); // false index = -1
Selman Genç
  • 100,147
  • 13
  • 119
  • 184
1

Here's one that uses a lambda to find all of the valid "start points" for your search.

//return first index of substring or -1 for not found
int searchForChar(char [] substring, char [] fulltext)
{
    //all of the start points
    var indices = fulltext.Select ((b,i) => b == substring.FirstOrDefault() ? i : -1)
                          .Where(i => i != -1).ToArray();

    //search each start point
    foreach (var index in indices)
    {
        var found = true;
        int count = 0;
        for(int i = index; i < index + substring.Length; i++)
        {   
            found = true;
            if(substring[count++] != fulltext[i])
            {   
                found = false;
                break;
            }   
        }
        if (found) return index;
    }
    return -1;
}

Likely, a more performant way of doing this would be something like what you had in your original question.

int searchForChar(char [] substring, char [] fulltext)
{
    return fulltext.ToString().IndexOf(substring.ToString());

}
Gray
  • 7,050
  • 2
  • 29
  • 52
  • 1
    +1 Clever and much shorter than my answer - only thing I would point out is that it doesn't handle zero length substring. – Jay Feb 07 '14 at 21:00
  • Good point, Jay. I definitely didn't do much input checking. – Gray Feb 07 '14 at 21:01
  • 1
    empty substring case has been handled. – Gray Feb 07 '14 at 21:26
  • It's worth noting that `.Select` and `.Where` will end up being more expensive than converting the arrays to strings and using `String.Contains` as you indicated on the question. – Wagner DosAnjos Feb 07 '14 at 21:42
  • @wdosanjos You are probably correct. Not sure that means my answer is worth a downvote, but that is a good point. – Gray Feb 07 '14 at 21:49
0

How about a for loop to search for the first character of the test case in the big array first, and then compare the consecutive characters in your test array with the consecutive members of the big array?

ZZZ
  • 704
  • 9
  • 18
  • I was thinking about this, but it seems the long way around. I'll probably end up with this as my backup plan. – Arcandio Feb 07 '14 at 20:29
0

For the record, here is another solution using generic extension methods. It works for any array type that implements IComparable.

void Main()
{
    var c1 = new char[] { 'a', 'b', 'c', 'd', 'a', 'b', 'c', 'h', 't' };
    var c2 = new char[] { 'c', 'h', 't' };

    if (c1.Contains(c2))
    {
        // do something
    }

    int i = c1.IndexOf(c2);
}

public static class ArrayExtensions
{
    public static bool Contains<T>(this T[] array, T[] subarray) where T : IComparable
    {
        return array.IndexOf(subarray) >= 0;
    }

    public static int IndexOf<T>(this T[] array, T[] subarray) where T : IComparable
    {
        for (int i = 0; i < array.Length - subarray.Length + 1; i++)
        {
            bool found = true;

            for (int j = 0; j < subarray.Length; j++)
            {
                if (array[i + j].CompareTo(subarray[j]) != 0)
                {
                    found = false;
                    break;
                }
            }

            if (found) return i;
        }

        return -1;
    }
}
Wagner DosAnjos
  • 6,304
  • 1
  • 15
  • 29
-2

Use this:

var search = mobyDick.Intersect(test);
if (search.ToArray().Length > 0)
{
//do something
}

LINQ - Set Operators