I need to find a sequence in other large sequence, for example, {1,3,2,3}
is present in {1,3,2,3,4,3}
and {5,1,3,2,3}
. Is there any way to do it quickly with IEnumerable
or with something else?

- 18,274
- 9
- 70
- 161

- 191
- 5
- 12
-
1Is the sequence in an int array, or just a comma delimited string? – James Johnson Sep 06 '11 at 18:07
-
In fact it is a list
which can be converted to array. – user906763 Sep 06 '11 at 18:09 -
Do you want to just check whether a subsequence is in a sequence and return true/false? – BoltClock Sep 06 '11 at 18:09
-
Yes just to check whether its present or not, and returning true/false. – user906763 Sep 06 '11 at 18:10
-
8I note that you are using the word "subsequence" in a sense more strict than the common definition. A subsequence is usually defined as a sequence formed by removing elements from another sequence without reordering the original sequence. You are actually looking for a *substring* -- though of course it is very uncommon these days to use "string" to mean anything other than a sequence of *characters*. – Eric Lippert Sep 06 '11 at 18:50
8 Answers
Similar to @dlev's but this also handles {1,1,1,2}.ContainsSubsequence({1,1,2})
public static bool ContainsSubsequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
var pattern = target.ToArray();
var source = new LinkedList<T>();
foreach (var element in parent)
{
source.AddLast(element);
if(source.Count == pattern.Length)
{
if(source.SequenceEqual(pattern))
return true;
source.RemoveFirst();
}
}
return false;
}

- 59,778
- 26
- 187
- 249

- 2,286
- 1
- 17
- 19
-
An important subtlety in this implementation is that it only enumerates over each passed-in collection once, making the method internally consistent if it is invoked with collections whose enumeration order is non-deterministic. Also note that a Queue
could be used in place of the linked list. – Jan Hettich Jun 15 '15 at 21:15
This method will find a subsequence within a parent sequence, of any type that can be compared via Equals()
:
public static bool ContainsSubequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
bool foundOneMatch = false;
using (IEnumerator<T> parentEnum = parent.GetEnumerator())
{
using (IEnumerator<T> targetEnum = target.GetEnumerator())
{
// Get the first target instance; empty sequences are trivially contained
if (!targetEnum.MoveNext())
return true;
while (parentEnum.MoveNext())
{
if (targetEnum.Current.Equals(parentEnum.Current))
{
// Match, so move the target enum forward
foundOneMatch = true;
if (!targetEnum.MoveNext())
{
// We went through the entire target, so we have a match
return true;
}
}
else if (foundOneMatch)
{
return false;
}
}
return false;
}
}
}
You can use it like this:
bool match = new[] {1, 2, 3}.ContainsSubsequence(new[] {1, 2}); // match == true
match = new[] {1, 2, 3}.ContainsSubsequence(new[] {1, 3}); // match == false
Note that it assumes the target sequence has no null
elements.
Update: Thanks for the upvotes, everyone, but there is actually a bug in the above code! If a partial match is found, but then doesn't turn into a full match, the process is ended, rather than reset (which is obviously incorrected when applied to something like {1, 2, 1, 2, 3}.ContainsSubsequence({1, 2, 3})
).
The above code works really well for the more common definition of subsequence (i.e. contiguousness is not required) but in order to handle resetting (which most IEnumerators
do not support) the target sequence needs to be enumerated up front. That leads to the following code:
public static bool ContainsSubequence<T>(this IEnumerable<T> parent, IEnumerable<T> target)
{
bool foundOneMatch = false;
var enumeratedTarget = target.ToList();
int enumPos = 0;
using (IEnumerator<T> parentEnum = parent.GetEnumerator())
{
while (parentEnum.MoveNext())
{
if (enumeratedTarget[enumPos].Equals(parentEnum.Current))
{
// Match, so move the target enum forward
foundOneMatch = true;
if (enumPos == enumeratedTarget.Count - 1)
{
// We went through the entire target, so we have a match
return true;
}
enumPos++;
}
else if (foundOneMatch)
{
foundOneMatch = false;
enumPos = 0;
if (enumeratedTarget[enumPos].Equals(parentEnum.Current))
{
foundOneMatch = true;
enumPos++;
}
}
}
return false;
}
}
This code doesn't have any bugs, but won't work well for large (or infinite) sequences.

- 48,024
- 5
- 125
- 132
-
1Did you make a mistake in your example? How is 1, 3 a sequence in 1, 2, 3? Both numbers exist, but not in sequence. It seems like your method determines whether numbers exist, without regard for order or sequence. – James Johnson Sep 06 '11 at 18:27
-
@James As I mention at the end, it allows for non-contiguous subsequences (so for example, `{1, 2, 3}.ContainsSubequence({2, 1, 3}) == false`.) As I also mention at the end, that condition is easily imposed, should that be required. Usually when talking about subsequences, contiguousness is *not* required (see, for example, the longest common subsequence problem.) – dlev Sep 06 '11 at 18:28
-
-
1@dlev The following test seem to fail `{1,1,1,2}.ContainsSubsequence({1,1,2}) == true`. It is hard to make this test without making two arrays of the two enumerables. – erikH Sep 07 '11 at 07:06
-
@erikH, you are right. The algo needs to re-wind parentEnum to the the next element after the last partial match. – chris stevens Jun 13 '16 at 11:36
-
1I believe an optimal answer would be based on [Knuth-Morris-Pratt](https://en.wikipedia.org/wiki/Knuth%E2%80%93Morris%E2%80%93Pratt_algorithm) or [Boyer-Moore](https://en.wikipedia.org/wiki/Boyer%E2%80%93Moore_string-search_algorithm), and example of which is in [my answer](https://stackoverflow.com/a/54309797/659190). – Jodrell Jan 22 '19 at 14:00
This is quite a well studied problem and according to my research there are two algorithms that are optimal for this job, depending on your data.
Namely, they are the Knuth-Morris-Pratt algorithm and the Boyer-Moore algorithm.
Here, I submit my implementation of the KMP algorithm, originally reviewed here.
Its written to handle a source or parent sequence of indeterminate length up to Int64.MaxValue
.
As we can see, the internal implementation returns a sequence of the indices at which the sub-string or target pattern occurs. You can present these results through your choice of facade.
You could use this simply like this,
var contains = new[] { 1, 3, 2, 3, 4, 3 }.Contains(new[] { 1, 3, 2, 3 });
Here is a working fiddle that shows the code in action.
Below, follows the fully commented code for my answer.
namespace Code
{
using System;
using System.Collections.Generic;
using System.Linq;
/// <summary>
/// A generic implementation of the Knuth-Morris-Pratt algorithm that searches,
/// in a memory efficient way, over a given <see cref="IEnumerable{T}"/>.
/// </summary>
public static class KMP
{
/// <summary>
/// Determines whether a sequence contains the search string.
/// </summary>
/// <typeparam name="T">
/// The type of elements of <paramref name="source"/>
/// </typeparam>
/// <param name="source">
/// A sequence of elements
/// </param>
/// <param name="pattern">The search string.</param>
/// <param name="equalityComparer">
/// Determines whether the sequence contains a specified element.
/// If <c>null</c>
/// <see cref="EqualityComparer{T}.Default"/> will be used.
/// </param>
/// <returns>
/// <c>true</c> if the source contains the specified pattern;
/// otherwise, <c>false</c>.
/// </returns>
/// <exception cref="ArgumentNullException">pattern</exception>
public static bool Contains<T>(
this IEnumerable<T> source,
IEnumerable<T> pattern,
IEqualityComparer<T> equalityComparer = null)
{
if (pattern == null)
{
throw new ArgumentNullException(nameof(pattern));
}
equalityComparer = equalityComparer ?? EqualityComparer<T>.Default;
return SearchImplementation(source, pattern, equalityComparer).Any();
}
public static IEnumerable<long> IndicesOf<T>(
this IEnumerable<T> source,
IEnumerable<T> pattern,
IEqualityComparer<T> equalityComparer = null)
{
if (pattern == null)
{
throw new ArgumentNullException(nameof(pattern));
}
equalityComparer = equalityComparer ?? EqualityComparer<T>.Default;
return SearchImplementation(source, pattern, equalityComparer);
}
/// <summary>
/// Identifies indices of a pattern string in a given sequence.
/// </summary>
/// <typeparam name="T">
/// The type of elements of <paramref name="source"/>
/// </typeparam>
/// <param name="source">
/// The sequence to search.
/// </param>
/// <param name="patternString">
/// The string to find in the sequence.
/// </param>
/// <param name="equalityComparer">
/// Determines whether the sequence contains a specified element.
/// </param>
/// <returns>
/// A sequence of indices where the pattern can be found
/// in the source.
/// </returns>
/// <exception cref="ArgumentOutOfRangeException">
/// patternSequence - The pattern must contain 1 or more elements.
/// </exception>
private static IEnumerable<long> SearchImplementation<T>(
IEnumerable<T> source,
IEnumerable<T> patternString,
IEqualityComparer<T> equalityComparer)
{
// Pre-process the pattern
(var slide, var pattern) = GetSlide(patternString, equalityComparer);
var patternLength = pattern.Count;
if (patternLength == 0)
{
throw new ArgumentOutOfRangeException(
nameof(patternString),
"The pattern must contain 1 or more elements.");
}
var buffer = new Dictionary<long, T>(patternLength);
var more = true;
long sourceIndex = 0; // index for source
int patternIndex = 0; // index for pattern
using(var sourceEnumerator = source.GetEnumerator())
while (more)
{
more = FillBuffer(
buffer,
sourceEnumerator,
sourceIndex,
patternLength,
out T t);
if (equalityComparer.Equals(pattern[patternIndex], t))
{
patternIndex++;
sourceIndex++;
more = FillBuffer(
buffer,
sourceEnumerator,
sourceIndex,
patternLength,
out t);
}
if (patternIndex == patternLength)
{
yield return sourceIndex - patternIndex;
patternIndex = slide[patternIndex - 1];
}
else if (more && !equalityComparer.Equals(pattern[patternIndex], t))
{
if (patternIndex != 0)
{
patternIndex = slide[patternIndex - 1];
}
else
{
sourceIndex = sourceIndex + 1;
}
}
}
}
/// <summary>
/// Services the buffer and retrieves the value.
/// </summary>
/// <remarks>
/// The buffer is used so that it is not necessary to hold the
/// entire source in memory.
/// </remarks>
/// <typeparam name="T">
/// The type of elements of <paramref name="source"/>.
/// </typeparam>
/// <param name="buffer">The buffer.</param>
/// <param name="source">The source enumerator.</param>
/// <param name="sourceIndex">The element index to retrieve.</param>
/// <param name="patternLength">Length of the search string.</param>
/// <param name="value">The element value retrieved from the source.</param>
/// <returns>
/// <c>true</c> if there is potentially more data to process;
/// otherwise <c>false</c>.
/// </returns>
private static bool FillBuffer<T>(
IDictionary<long, T> buffer,
IEnumerator<T> source,
long sourceIndex,
int patternLength,
out T value)
{
bool more = true;
if (!buffer.TryGetValue(sourceIndex, out value))
{
more = source.MoveNext();
if (more)
{
value = source.Current;
buffer.Remove(sourceIndex - patternLength);
buffer.Add(sourceIndex, value);
}
}
return more;
}
/// <summary>
/// Gets the offset array which acts as a slide rule for the KMP algorithm.
/// </summary>
/// <typeparam name="T">
/// The type of elements of <paramref name="source"/>.
/// </typeparam>
/// <param name="pattern">The search string.</param>
/// <param name="equalityComparer">
/// Determines whether the sequence contains a specified element.
/// If <c>null</c>
/// <see cref="EqualityComparer{T}.Default"/> will be used.
/// </param>
/// <returns>A tuple of the offsets and the enumerated pattern.</returns>
private static (IReadOnlyList<int> Slide, IReadOnlyList<T> Pattern) GetSlide<T>(
IEnumerable<T> pattern,
IEqualityComparer<T> equalityComparer)
{
var patternList = pattern.ToList();
var slide = new int[patternList.Count];
int length = 0;
int patternIndex = 1;
while (patternIndex < patternList.Count)
{
if (equalityComparer.Equals(
patternList[patternIndex],
patternList[length]))
{
length++;
slide[patternIndex] = length;
patternIndex++;
}
else
{
if (length != 0)
{
length = slide[length - 1];
}
else
{
slide[patternIndex] = length;
patternIndex++;
}
}
}
return (slide, patternList);
}
}
}

- 34,946
- 5
- 87
- 124
This function checks whether List parent
contains List target
using some LINQ:
public static bool ContainsSequence<T>(this List<T> parent, List<T> target)
{
for (int fromElement = parent.IndexOf(target.First());
(fromElement != -1) && (fromElement <= parent.Count - target.Count);
fromElement = parent.FindIndex(fromElement + 1, p => p.Equals(target.First())))
{
var comparedSequence = parent.Skip(fromElement).Take(target.Count);
if (comparedSequence.SequenceEqual(target)) return true;
}
return false;
}

- 4,537
- 6
- 30
- 44
If you are handling simple serializable types, you can do that quite easily if you convert the arrays to string:
public static bool ContainsList<T>(this List<T> containingList, List<T> containedList)
{
string strContaining = "," + string.Join(",", containingList) + ",";
string strContained = "," + string.Join(",", containedList) + ",";
return strContaining.Contains(strContained);
}
Note that's an extension method, so you will be able to call it like:
if (bigList.ContainsList(smallList))
{
...
}

- 7,602
- 2
- 34
- 42
You can try something like this to get you started. Once you've converted this list to a string, you can find the sequence using the substring:
if (String.Join(",", numericList.ConvertAll<string>(x => x.ToString()).ToArray())
{
//get sequence
}

- 45,496
- 8
- 73
- 110
This works for me
var a1 = new List<int> { 1, 2, 3, 4, 5 };
var a2 = new List<int> { 2, 3, 4 };
int index = -1;
bool res = a2.All(
x => index != -1 ? (++index == a1.IndexOf(x)) : ((index = a1.IndexOf(x)) != -1)
);

- 962
- 6
- 14
-
1Your solution seems the most elegant but try it with `a1 = new List
{ 1, 2, 9, 1, 2, 3, 4, 5 }` and `a2 = new List – user2341923 Nov 03 '15 at 08:11{ 2, 3, 4 }` instead; also try if `a1` contains itself with this new value - it does not work for me.
Here you have an example with daily work, so far, without problems. I just iterate over a fairly long sequence, looking for matches for a given sub sequence.
static bool SubSequenceExist(byte[] longSeq, byte[] subseq)
{
int seq = longSeq.Length;
int seqToFindLen = subseq.Length;
int seqMatchCount = 0;
for (int i = 0; i < seq; i++)
{
if (longSeq[i] == subseq[seqMatchCount])
seqMatchCount++;
else
seqMatchCount = 0;
if (seqMatchCount == seqToFindLen) return true;
}
return false;
}