Is there a way of getting a matched substring from a string using a culture-sensitive equality comparison? For example, under the en-US culture, æ
and ae
are considered equal. "Encyclopædia".IndexOf("aed")
evaluates to 8
, indicating a match; however, is there a way of extracting the matched substring, æd
, that does not involve iterating over the source string? Note that the lengths of the sought and the matched substrings can differ by several characters.
Asked
Active
Viewed 408 times
7

Douglas
- 53,759
- 13
- 140
- 188
-
What about using a regular expression like `(ae|æ)d`? – juharr Feb 18 '16 at 15:52
-
@juharr: Regex would be overkill (and introduce its own set of nuances). I need this to implement very general-purpose functionality, such as a culture-sensitive `String.Replace` extension method. – Douglas Feb 18 '16 at 16:02
-
Related questions: [How can I perform a culture-sensitive “starts-with” operation from the middle of a string?](http://stackoverflow.com/q/15980310/1149773) (by Jon Skeet), [Length of substring matched by culture-sensitive `String.IndexOf` method](http://stackoverflow.com/q/20480016/1149773). – Douglas Feb 20 '16 at 09:11
-
For anyone interested, I posted a lengthy blog article about this issue, which discusses the various options for resolving it: [Finding substrings using culture-sensitive comparisons](http://dogmamix.com/cms/blog/FindingSubstrings) – Douglas Apr 04 '16 at 08:14
2 Answers
2
I ended up solving this by first calling IndexOf
to get the starting position of the match, then iteratively attempting to identify its length. I optimized for the hot path of the match having the same length as the specified substring; in that case, only a single comparison is performed.
public static class StringExtensions
{
public static void Find(this string source, string substring, StringComparison comparisonType, out int matchIndex, out int matchLength)
{
Find(source, substring, 0, source.Length, comparisonType, out matchIndex, out matchLength);
}
public static void Find(this string source, string substring, int searchIndex, StringComparison comparisonType, out int matchIndex, out int matchLength)
{
Find(source, substring, searchIndex, source.Length - searchIndex, comparisonType, out matchIndex, out matchLength);
}
public static void Find(this string source, string substring, int searchIndex, int searchLength, StringComparison comparisonType, out int matchIndex, out int matchLength)
{
matchIndex = source.IndexOf(substring, searchIndex, searchLength, comparisonType);
if (matchIndex == -1)
{
matchLength = -1;
return;
}
matchLength = FindMatchLength(source, substring, searchIndex, searchLength, comparisonType, matchIndex);
// Defensive programming, but should never happen
if (matchLength == -1)
matchIndex = -1;
}
private static int FindMatchLength(string source, string substring, int searchIndex, int searchLength, StringComparison comparisonType, int matchIndex)
{
int matchLengthMaximum = searchLength - (matchIndex - searchIndex);
int matchLengthInitial = Math.Min(substring.Length, matchLengthMaximum);
// Hot path: match length is same as substring length.
if (Compare(source, matchIndex, matchLengthInitial, substring, 0, substring.Length, comparisonType) == 0)
return matchLengthInitial;
int matchLengthDecrementing = matchLengthInitial - 1;
int matchLengthIncrementing = matchLengthInitial + 1;
while (matchLengthDecrementing >= 0 || matchLengthIncrementing <= matchLengthMaximum)
{
if (matchLengthDecrementing >= 0)
{
if (Compare(source, matchIndex, matchLengthDecrementing, substring, 0, substring.Length, comparisonType) == 0)
return matchLengthDecrementing;
matchLengthDecrementing--;
}
if (matchLengthIncrementing <= matchLengthMaximum)
{
if (Compare(source, matchIndex, matchLengthIncrementing, substring, 0, substring.Length, comparisonType) == 0)
return matchLengthIncrementing;
matchLengthIncrementing++;
}
}
// Should never happen
return -1;
}
private static int Compare(string strA, int indexA, int lengthA, string strB, int indexB, int lengthB, StringComparison comparisonType)
{
switch (comparisonType)
{
case StringComparison.CurrentCulture:
return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.None);
case StringComparison.CurrentCultureIgnoreCase:
return CultureInfo.CurrentCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.IgnoreCase);
case StringComparison.InvariantCulture:
return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.None);
case StringComparison.InvariantCultureIgnoreCase:
return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.IgnoreCase);
case StringComparison.Ordinal:
return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.Ordinal);
case StringComparison.OrdinalIgnoreCase:
return CultureInfo.InvariantCulture.CompareInfo.Compare(strA, indexA, lengthA, strB, indexB, lengthB, CompareOptions.OrdinalIgnoreCase);
default:
throw new ArgumentException("The string comparison type passed in is currently not supported.", nameof(comparisonType));
}
}
}
Sample use:
int index, length;
source.Find(remove, StringComparison.CurrentCulture, out index, out length);
string clean = index < 0 ? source : source.Remove(index, length);

Douglas
- 53,759
- 13
- 140
- 188
1
Since .NET 5.0 System.Globalization.CompareInfo has method that returns the matched length:
int IndexOf(ReadOnlySpan<char> source, ReadOnlySpan<char> value, CompareOptions options, out int matchLength);
See CompareInfo.IndexOf Method

Hursev
- 41
- 3