0

I understand my question is similar to a couple posts out there but I think it has aspects that make it different. I am looking to find a sub-array or pattern within a much larger array. I will be working with arrays with thousands even millions of rows and I need to find a pattern within that array. The values I will be searching for are similar to the values in the array. For instance my array of say 10,000 rows will be full of 1's 0's L's and H's mainly and I will be searching for a certain pattern in there for example looking for 1 0 1 1 H.

From what I can see most of the solutions posted on other posts are dealing with much smaller scale arrays and where the sub array is more distinct from the source array. Also, when I find the array within the source array I need to return the location of that sub-array. (I am looking to do this code in C#)

Matt Burland
  • 44,552
  • 18
  • 99
  • 171
Bryan R.
  • 21
  • 4
  • 2
    Do the sub-array lengths vary? – Inisheer Mar 13 '13 at 13:05
  • 1
    How your problem is different from [This Post](http://stackoverflow.com/questions/1780423/find-the-first-occurrence-starting-index-of-the-sub-array-in-c-sharp) – PM. Mar 13 '13 at 13:11
  • 1
    You can use modification of Knuth-Morris-Pratt algorithm for substring search: http://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm – Sasha Mar 13 '13 at 13:11
  • So far my algorithm is to go through the source array and compare the element I am looking at with the first element in my search array if they are equal I go to the next one in the source array and shorten the length of the pattern I am looking for. This works fine I just assume there is a more efficient faster way of doing this. – Bryan R. Mar 13 '13 at 13:16
  • In response to the lengths yes they do vary the pattern being searched depends on the user and what they are looking for. – Bryan R. Mar 13 '13 at 13:17

1 Answers1

0

This is fundamentally the same as a substring search. They are both about finding subsequences inside a random-access larger sequence. And from your description, it sounds like your array is an array of chars, which is exactly what a string is.

The algorithm you describe in your note is pretty good and easy to code correctly. If it's not fast enough, look at KMP.

bmm6o
  • 6,187
  • 3
  • 28
  • 55