C# string comparison ignoring spaces, carriage return or line breaks

Question

How can I compare two strings in C#, ignoring case, spaces, and any line breaks? I also need two null strings to be treated as equal.

Thanks!

see also http://stackoverflow.com/questions/6859255/how-do-i-make-my-string-compare-not-sensitive-to-ignore-miner-differences-in-wh/6859344#6859344 — Ian Ringrose, Jul 28 '11 at 13:40
That other SO question notes the CompareOptions.IgnoreSymbols option on String.Compare, which answers this requirement — MrTelly, Aug 13 '14 at 03:39

João Angelo · Answer 1 · 2011-01-17T23:34:16.727

98

You should normalize each string by removing the characters that you don't want to compare and then you can perform a String.Equals with a StringComparison that ignores case.

Something like this:

string s1 = "HeLLo    wOrld!";
string s2 = "Hello\n    WORLd!";

string normalized1 = Regex.Replace(s1, @"\s", "");
string normalized2 = Regex.Replace(s2, @"\s", "");

bool stringEquals = String.Equals(
    normalized1, 
    normalized2, 
    StringComparison.OrdinalIgnoreCase);

Console.WriteLine(stringEquals);

Here Regex.Replace is used first to remove all whitespace characters. The special case of both strings being null is not treated here but you can easily handle that case before performing the string normalization.
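
As a minimal sketch of the null handling mentioned above (the class and method names are hypothetical, not part of the original answer), the check can run before the normalization step so that two nulls compare equal and a single null never matches:

```csharp
using System;
using System.Text.RegularExpressions;

static class NullSafeCompare
{
    // Hypothetical helper: null equals only null; otherwise compare with whitespace removed.
    public static bool EqualsIgnoringCaseAndWhitespace(string a, string b)
    {
        if (a == null || b == null)
            return a == null && b == null;

        string normalizedA = Regex.Replace(a, @"\s", "");
        string normalizedB = Regex.Replace(b, @"\s", "");
        return string.Equals(normalizedA, normalizedB, StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(EqualsIgnoringCaseAndWhitespace("HeLLo    wOrld!", "Hello\n    WORLd!")); // True
        Console.WriteLine(EqualsIgnoringCaseAndWhitespace(null, null));                            // True
        Console.WriteLine(EqualsIgnoringCaseAndWhitespace(null, "hello"));                         // False
    }
}
```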

edited Jan 17 '11 at 23:34

answered Jan 17 '11 at 23:27

João Angelo


Would there be any performance impact from using a `Regex.Replace` on the two strings here? – JDandChips Mar 20 '14 at 09:22
That's something not easily answerable. Of course there are better solutions in terms of performance, for example a solution that does not require creating two new strings but unless you proved that the regular expression is a bottleneck in your specific scenario then I would not bother with it. – João Angelo Mar 20 '14 at 14:46
why semi-normalization, still requiring the stringcomparison? – Jul 05 '19 at 20:41
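
On the performance question in the comments above: one low-effort option, sketched here under the assumption that the comparison runs many times, is to build the Regex once and reuse it. The names below are illustrative, and whether this actually helps should be measured in the specific scenario.

```csharp
using System;
using System.Text.RegularExpressions;

static class CachedRegexCompare
{
    // One shared instance; RegexOptions.Compiled trades construction cost for faster matching.
    private static readonly Regex Whitespace = new Regex(@"\s+", RegexOptions.Compiled);

    // Hypothetical helper: same semantics as removing \s and comparing with OrdinalIgnoreCase.
    public static bool EqualsIgnoringCaseAndWhitespace(string a, string b)
    {
        if (a == null || b == null)
            return a == null && b == null;

        return string.Equals(
            Whitespace.Replace(a, ""),
            Whitespace.Replace(b, ""),
            StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(EqualsIgnoringCaseAndWhitespace("a bc d", "ABC D")); // True
        Console.WriteLine(EqualsIgnoringCaseAndWhitespace("a bc d", "ABC E")); // False
    }
}
```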

score 41 · Answer 2 · edited Nov 27 '22 at 11:06


This may also work.

String.Compare(s1, s2, CultureInfo.CurrentCulture, CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0

Edit:

IgnoreSymbols: Indicates that the string comparison must ignore symbols, such as white-space characters, punctuation, currency symbols, the percent sign, mathematical symbols, the ampersand, and so on.
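
A small sketch of how this could be wrapped and tried out. The helper name is hypothetical, and CultureInfo.InvariantCulture is used here instead of CurrentCulture only to keep the demo independent of the machine's settings. Because IgnoreSymbols also skips punctuation, the second call is worth running to see whether that is acceptable for your data; String.Compare treats two null references as equal, so the null case needs no extra code.

```csharp
using System;
using System.Globalization;

static class IgnoreSymbolsDemo
{
    // Hypothetical helper around String.Compare with IgnoreCase | IgnoreSymbols.
    public static bool LooselyEqual(string a, string b)
    {
        return string.Compare(a, b, CultureInfo.InvariantCulture,
            CompareOptions.IgnoreCase | CompareOptions.IgnoreSymbols) == 0;
    }

    static void Main()
    {
        // Differ only in case and a space.
        Console.WriteLine(LooselyEqual("Hello World", "helloworld"));

        // Differ in punctuation as well; IgnoreSymbols is documented to ignore it too.
        Console.WriteLine(LooselyEqual("Hello, World!", "hello world"));

        // Two null references compare equal.
        Console.WriteLine(LooselyEqual(null, null));
    }
}
```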

edited Nov 27 '22 at 11:06

Alex from Jitbit


answered May 21 '19 at 02:15

Louis


This is the best answer so far. However it merges non-space, that may result in unexpected equivalence like "1 2 3" would be equal to "123". – dmitry Nov 06 '19 at 12:04
I took the liberty to fix the link (it was pointing to older .NET that does not have this option) – Alex from Jitbit Nov 27 '22 at 11:07

helloworld922 · Accepted Answer · 2012-11-08T07:25:12.150

9

Remove all the characters you don't want and then use the ToLower() method to ignore case.

edit: While the above works, it's better to use StringComparison.OrdinalIgnoreCase. Just pass it as the second argument to the Equals method.
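
A minimal sketch of that idea without a regular expression, assuming "the characters you don't want" means anything char.IsWhiteSpace reports as whitespace (the helper names are made up for illustration):

```csharp
using System;
using System.Linq;

static class StripAndCompare
{
    // Builds a copy of the string without any whitespace characters.
    static string RemoveWhitespace(string s)
    {
        return new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    public static bool EqualsIgnoringCaseAndWhitespace(string a, string b)
    {
        if (a == null || b == null)
            return a == null && b == null;

        // StringComparison is the second argument of the instance Equals method.
        return RemoveWhitespace(a).Equals(RemoveWhitespace(b), StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(EqualsIgnoringCaseAndWhitespace("void foo", "voidFoo"));   // True
        Console.WriteLine(EqualsIgnoringCaseAndWhitespace("void\r\nfoo", "void bar")); // False
    }
}
```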

edited Nov 08 '12 at 07:25

answered Jan 17 '11 at 23:28

helloworld922


-1: Should use StringComparison.OrdinalIgnoreCase, not ToLower(). – zimdanen Nov 07 '12 at 20:14
Wouldn't using OrdinalIgnoreCase disregard current culture, so running into (For example) the turkish uppercase i problem? – Michael Parker Mar 07 '14 at 17:29
The method with `StringComparison.OrdinalIgnoreCase` wouldn't solve the problem with ignoring new lines. – cederlof Jan 11 '17 at 08:49

score 6 · Answer 4 · edited May 23 '17 at 12:02

If you need performance, the Regex solutions on this page may run too slowly for you, for example when you have a large list of strings to sort. (A Regex solution is more readable, however.)

I have a class that looks at each individual char in both strings and compares them while ignoring case and whitespace. It doesn't allocate any new strings. It uses the char.IsWhiteSpace(ch) to determine whitespace, and char.ToLowerInvariant(ch) for case-insensitivity (if required). In my testing, my solution runs about 5x - 8x faster than a Regex-based solution. My class also implements IEqualityComparer's GetHashCode(obj) method using this code in another SO answer. This GetHashCode(obj) also ignores whitespace and optionally ignores case.

Here's my class:

private class StringCompIgnoreWhiteSpace : IEqualityComparer<string>
{
    public bool Equals(string strx, string stry)
    {
        if (strx == null) //stry may contain only whitespace
            return string.IsNullOrWhiteSpace(stry);

        if (stry == null) //strx may contain only whitespace
            return string.IsNullOrWhiteSpace(strx);

        int ix = 0, iy = 0;
        while (true)
        {
            //ignore whitespace in strx (bounds are checked first, so the index never runs past the end)
            while (ix < strx.Length && char.IsWhiteSpace(strx[ix]))
                ix++;

            //ignore whitespace in stry
            while (iy < stry.Length && char.IsWhiteSpace(stry[iy]))
                iy++;

            //if either string is exhausted, they are equal only if both are
            if (ix == strx.Length || iy == stry.Length)
                return ix == strx.Length && iy == stry.Length;

            //The current chars are not whitespace, so check that they're equal (case-insensitive)
            //Compare strx[ix] and stry[iy] directly to make the comparison case-sensitive.
            if (char.ToLowerInvariant(strx[ix]) != char.ToLowerInvariant(stry[iy]))
                return false;

            ix++;
            iy++;
        }
    }

    public int GetHashCode(string obj)
    {
        if (obj == null)
            return 0;

        int hash = 17;
        unchecked // Overflow is fine, just wrap
        {
            for (int i = 0; i < obj.Length; i++)
            {
                char ch = obj[i];
                if(!char.IsWhiteSpace(ch))
                    //use this line for case-insensitivity
                    hash = hash * 23 + char.ToLowerInvariant(ch).GetHashCode();

                    //use this line for case-sensitivity
                    //hash = hash * 23 + ch.GetHashCode();
            }
        }
        return hash;
    }
}

private static void TestComp()
{
    var comp = new StringCompIgnoreWhiteSpace();

    Console.WriteLine(comp.Equals("abcd", "abcd")); //true
    Console.WriteLine(comp.Equals("abCd", "Abcd")); //true
    Console.WriteLine(comp.Equals("ab Cd", "Ab\n\r\tcd   ")); //true
    Console.WriteLine(comp.Equals(" ab Cd", "  A b" + Environment.NewLine + "cd ")); //true
    Console.WriteLine(comp.Equals(null, "  \t\n\r ")); //true
    Console.WriteLine(comp.Equals("  \t\n\r ", null)); //true
    Console.WriteLine(comp.Equals("abcd", "abcd   h")); //false

    Console.WriteLine(comp.GetHashCode(" a b c d")); //-699568861


    //This is -699568861 if you #define StringCompIgnoreWhiteSpace_CASE_INSENSITIVE
    //  Otherwise it's -1555613149
    Console.WriteLine(comp.GetHashCode("A B c      \t       d"));
}

Here's my testing code (with a Regex example):

private static void SpeedTest()
{
    const int loop = 100000;
    string first = "a bc d";
    string second = "ABC D";

    var compChar = new StringCompIgnoreWhiteSpace();
    Stopwatch sw1 = Stopwatch.StartNew();
    for (int i = 0; i < loop; i++)
    {
        bool equals = compChar.Equals(first, second);
    }
    sw1.Stop();
    Console.WriteLine(string.Format("char time =  {0}", sw1.Elapsed)); //char time =  00:00:00.0361159

    var compRegex = new StringCompIgnoreWhiteSpaceRegex();
    Stopwatch sw2 = Stopwatch.StartNew();
    for (int i = 0; i < loop; i++)
    {
        bool equals = compRegex.Equals(first, second);
    }
    sw2.Stop();
    Console.WriteLine(string.Format("regex time = {0}", sw2.Elapsed)); //regex time = 00:00:00.2773072
}

private class StringCompIgnoreWhiteSpaceRegex : IEqualityComparer<string>
{
    public bool Equals(string strx, string stry)
    {
        if (strx == null)
            return string.IsNullOrWhiteSpace(stry);
        else if (stry == null)
            return string.IsNullOrWhiteSpace(strx);

        string a = System.Text.RegularExpressions.Regex.Replace(strx, @"\s", "");
        string b = System.Text.RegularExpressions.Regex.Replace(stry, @"\s", "");
        return String.Compare(a, b, true) == 0;
    }

    public int GetHashCode(string obj)
    {
        if (obj == null)
            return 0;

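        string a = System.Text.RegularExpressions.Regex.Replace(obj, @"\s", "");
        //hash case-insensitively so that strings Equals treats as equal get the same hash code
        return StringComparer.CurrentCultureIgnoreCase.GetHashCode(a);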
    }
}

I wrote unit tests to this function and found corner cases where it fails on out of bounds. `ix < strx.Length` (and the same of `iy < stry.Length`) should be `ix < strx.Length - 1`. — Itai Bar-Haim, Feb 13 '19 at 11:37
Hint: this method is on the right track, but it's way too complicated. Get rid of all the conditions on the main for loop, set the two character variables to a space at the beginning of each loop (instead of the current string index character), and if both characters are whitespace at the end of the main for loop, the strings match. The other loops and conditions are completely unnecessary. — Bryce Wagner, Feb 20 '19 at 17:40
@BryceWagner I give up. I can't figure out the solution you're hinting at. Why not provide a separate answer to this question? — user2023861, Aug 05 '19 at 20:38

score 6 · Answer 5 · answered Jan 17 '11 at 23:35

First remove all whitespace from both strings with a regular expression, then use the String.Compare method with the ignoreCase parameter set to true.

string a = System.Text.RegularExpressions.Regex.Replace("void foo", @"\s", "");
string b = System.Text.RegularExpressions.Regex.Replace("voidFoo", @"\s", "");
bool isTheSame = String.Compare(a, b, true) == 0;

score 4 · Answer 6 · answered Jan 17 '11 at 23:26

I would probably start by removing the characters you don't want to compare from the string before comparing. If performance is a concern, you might look at storing a version of each string with the characters already removed.

Alternatively, you could write a compare routine that would skip over the characters you want to ignore. But that just seems like more work to me.
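
To illustrate the "store a version with the characters already removed" idea, here is a rough sketch. The ToKey helper and the sample inputs are assumptions made for the example: each string is normalized once when it is added, and a case-insensitive comparer on the set takes care of the rest.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

static class NormalizedKeys
{
    // Hypothetical helper: strips whitespace so the result can be stored and reused.
    public static string ToKey(string s)
    {
        return new string(s.Where(c => !char.IsWhiteSpace(c)).ToArray());
    }

    static void Main()
    {
        // Whitespace is removed once per string; the comparer ignores case on lookup.
        var seen = new HashSet<string>(StringComparer.OrdinalIgnoreCase);
        string[] inputs = { "Hello World", "hello\r\nworld", "HELLO\tWORLD ", "goodbye" };

        foreach (string input in inputs)
            seen.Add(ToKey(input));

        Console.WriteLine(seen.Count); // 2
    }
}
```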

answered Jan 17 '11 at 23:26

Jonathan Wood


+1 http://msdn.microsoft.com/en-us/library/aa904305(VS.71).aspx, then http://msdn.microsoft.com/en-us/library/system.string.join.aspx and the compare. – kenny Jan 17 '11 at 23:37

score 2 · Answer 7 · answered Mar 13 '13 at 09:56

You can also use the following custom function

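// Requires: using System.Collections.Generic; using System.Linq; using System.Text;
// Extension methods must live in a static class (the class name here is illustrative).
public static class StringExtensions
{
    private static readonly char[] Ignored = { ' ', '\t', '\n', '\r' };

    public static string ExceptChars(this string str, IEnumerable<char> toExclude)
    {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < str.Length; i++)
        {
            char c = str[i];
            if (!toExclude.Contains(c))
                sb.Append(c);
        }
        return sb.ToString();
    }

    public static bool SpaceCaseInsenstiveComparision(this string stringa, string stringb)
    {
        // Two nulls are equal; a null never equals a non-null string.
        if (stringa == null || stringb == null)
            return stringa == null && stringb == null;

        return stringa.ToLower().ExceptChars(Ignored).Equals(stringb.ToLower().ExceptChars(Ignored));
    }
}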

And then use it following way

"Te  st".SpaceCaseInsenstiveComparision("Te st");

I'd avoid `.ToLower()` calls (because they create another string) and use `StringComparison.OrdinalIgnoreCase` (which is also faster). — Ivaylo Slavov, Jun 05 '13 at 08:20

score 2 · Answer 8 · answered Jul 16 '15 at 07:01

Another option is the LINQ SequenceEqual method, which according to my tests is more than twice as fast as the Regex approach used in other answers and is very easy to read and maintain.

public static bool Equals_Linq(string s1, string s2)
{
    return Enumerable.SequenceEqual(
        s1.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant),
        s2.Where(c => !char.IsWhiteSpace(c)).Select(char.ToUpperInvariant));
}

public static bool Equals_Regex(string s1, string s2)
{
    return string.Equals(
        Regex.Replace(s1, @"\s", ""),
        Regex.Replace(s2, @"\s", ""),
        StringComparison.OrdinalIgnoreCase);
}

Here is the simple performance test code I used:

var s1 = "HeLLo    wOrld!";
var s2 = "Hello\n    WORLd!";
var watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
    Equals_Linq(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~1.7 seconds
watch = Stopwatch.StartNew();
for (var i = 0; i < 1000000; i++)
{
    Equals_Regex(s1, s2);
}
Console.WriteLine(watch.Elapsed); // ~4.6 seconds

score 1 · Answer 9 · 2019-07-07T10:31:40.757

An approach not optimized for performance, but for completeness.

normalizes null
normalizes unicode, combining characters, diacritics
normalizes new lines
normalizes white space
normalizes casing

code snippet:

public static class StringHelper
{
    public static bool AreEquivalent(string source, string target)
    {
        if (source == null) return target == null;
        if (target == null) return false;
        var normForm1 = Normalize(source);
        var normForm2 = Normalize(target);
        return string.Equals(normForm1, normForm2);
    }

    private static string Normalize(string value)
    {
        Debug.Assert(value != null);
        // normalize unicode, combining characters, diacritics
        value = value.Normalize(NormalizationForm.FormC);
        // normalize line endings to \n (the white space step below then removes them)
        value = value.Replace("\r\n", "\n").Replace("\r", "\n");
        // normalize white space
        value = Regex.Replace(value, @"\s", string.Empty);
        // normalize casing
        return value.ToLowerInvariant();
    }
}
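
For readers unfamiliar with the Unicode step, here is a small standalone sketch (the sample strings are my own). NormalizationForm.FormC composes a base letter and a combining mark into a single code point where one exists; it does not strip diacritics. On a typical runtime the first comparison below is expected to be false and the second true.

```csharp
using System;
using System.Text;

static class NormalizationDemo
{
    static void Main()
    {
        string composed = "caf\u00E9";     // 'é' as a single code point
        string decomposed = "cafe\u0301";  // 'e' followed by a combining acute accent

        // Ordinal comparison sees two different character sequences.
        Console.WriteLine(string.Equals(composed, decomposed, StringComparison.Ordinal));

        // After normalizing both to FormC the sequences are identical.
        Console.WriteLine(string.Equals(
            composed.Normalize(NormalizationForm.FormC),
            decomposed.Normalize(NormalizationForm.FormC),
            StringComparison.Ordinal));
    }
}
```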

This will result in false positives for "1 2 3" and "123". I do not think the question meant that... — dmitry, Nov 06 '19 at 11:56
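
If that equivalence is a problem, one possible variant (a sketch that changes the semantics, not what the answer above does) is to collapse each run of whitespace to a single space instead of removing it, so that word boundaries still count. The names are hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

static class CollapseWhitespace
{
    // Hypothetical variant: trims the ends and turns each whitespace run into one space.
    public static string Collapse(string s)
    {
        return Regex.Replace(s.Trim(), @"\s+", " ");
    }

    public static bool EqualsIgnoringCaseAndSpacing(string a, string b)
    {
        if (a == null || b == null)
            return a == null && b == null;

        return string.Equals(Collapse(a), Collapse(b), StringComparison.OrdinalIgnoreCase);
    }

    static void Main()
    {
        Console.WriteLine(EqualsIgnoringCaseAndSpacing("1 2 3", "123"));        // False
        Console.WriteLine(EqualsIgnoringCaseAndSpacing("1  2\r\n3 ", "1 2 3")); // True
    }
}
```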

score -1 · Answer 10 · answered May 03 '19 at 16:36

I would trim the strings using Trim() to remove all the whitespace.
Then use StringComparison.OrdinalIgnoreCase to ignore case, e.g. stringA.Equals(stringB, StringComparison.OrdinalIgnoreCase)

answered May 03 '19 at 16:36

Jayowl


Trim() only removes at the start and end – Jul 05 '19 at 20:39
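
A quick check of that point, using a sample string of my own: the whitespace between the words survives Trim(), so the trimmed string still does not match.

```csharp
using System;

static class TrimDemo
{
    static void Main()
    {
        string trimmed = "  Hello \n World  ".Trim();

        // Only the leading and trailing whitespace is gone.
        Console.WriteLine(trimmed == "Hello \n World"); // True

        // The inner space and line break remain, so this comparison fails.
        Console.WriteLine(trimmed.Equals("Hello World", StringComparison.OrdinalIgnoreCase)); // False
    }
}
```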
