How can compare two unsorted string in c#?

Question

I have 2 string for example "abcdefg" "bcagfed" I want to write a method that determine these two strings are equal

*sorted or unsorted is not matter. In my example my above strings are equal.

One answer is: before checking equality, sort them and after that easily can check equality but this is not the fastest way

Would you consider `"aab"` and `"abb"` equal (since they both consist of only `a`'s and `b`'s)? What about "a" and "aaaa"? — Mathias R. Jessen, Feb 17 '22 at 15:17
The fastest way is to create a char[] of length 26 or whatever and then do char lookups where you loop each string only once. BUT this question is pretty useless because if you every write code like this in production everyone will slap you. Just sort the string :-) — Charles, Feb 17 '22 at 15:17
Are there duplicate characters in the string? Like, for example, can a letter be there more than once? — Etienne de Martel, Feb 17 '22 at 15:27
So you wish to see if the strings contain the same characters and number of them, if not in the same order? Are duplicate letters allowed i.e. are "abab" and "baba" considered the same? — ChrisBD, Feb 17 '22 at 15:36
*"One answer is: before checking equality, sort them and after that easily can check equality but this is not the fastest way"* Indeed, but except in the rarest of circumstances, the speed difference simply won't matter. CPU time is cheap, developer time is expensive, so go for the simplest and most maintainable solution, not for the fastest one. — Heinzi, Feb 17 '22 at 15:47
Possible duplicate of https://stackoverflow.com/questions/1673347/linq-determine-if-two-sequences-contains-exactly-the-same-elements — Matthew Watson, Feb 17 '22 at 16:27
@Heinzi i must remember that one ' cpu time is cheap...', an update on the whole 'premature optimization' maxim — pm100, Feb 17 '22 at 19:24

score 6 · Answer 1 · answered Feb 17 '22 at 15:43

6

I prefer using SequenceEqual, it checks whether the two source sequences are of equal length and their corresponding elements are equal.

var str1 = "abcdefg"; 
var str2 = "bcagfed";

var areEqual = Enumerable.SequenceEqual(str1.OrderBy(c => c), str2.OrderBy(c => c));

answered Feb 17 '22 at 15:43

NazaRN

399
1
5

Also here SequenceEqual operates over IEnumerable, so we gain a bit of performance having deferred work until later, possibly optimizing along the way by C# runtime. – NazaRN Feb 17 '22 at 15:51

Idle_Mind · Accepted Answer · 2022-02-17T15:43:43.317

Here's one way to do it without sorting first, and without using anything fancy like LINQ:

string str1 = "abcdefg";
string str2 = "bcagfed";

bool stringsAreEqual = (str1.Length == str2.Length);
if (stringsAreEqual)
{
    // see if each char from str1 is in str2, removing from str2 as we go
    List<char> str2chars = new List<char>(str2.ToCharArray());
    foreach (char c in str1.ToCharArray())
    {
        int index = str2chars.IndexOf(c);
        if (index != -1)
        {
            str2chars.RemoveAt(index);
        }
        else
        {
            stringsAreEqual = false;
            break;
        }
    }
    // any chars left over in str2?
    stringsAreEqual = stringsAreEqual && (str2chars.Count == 0);
}            

Console.WriteLine(str1);
Console.WriteLine(str2);
Console.WriteLine(stringsAreEqual);

This is how I was going to suggest it, given that sorting first is a less desirable option for the asker, this seems to be the simplest. If you have any characters left over in the second string then they aren't equal, or if you find any in the first string that aren't in the second, then they obviously aren't equal. I think that this might even be faster than sorting first, since many sorting methods iterate multiple times over the set of items to fulfil the sort, then would need to iterate both again to compare them. Whereas this will only need to iterate one time. — JG222, Feb 17 '22 at 15:43

Caius Jard · Answer 3 · 2022-02-18T05:07:51.900

One answer is: before checking equality, sort them and after that easily can check equality but this is not the fastest way

That might not actually be true. The following is the current fastest approach here, about 20% faster than the other answer I posted and about twice as fast as IdleMind's approach (next quickest):

static bool SameChars(string str1, string str2){
    if(str1.Length != str2.Length) return false;

    var c1 = str1.ToCharArray();
    var c2 = str2.ToCharArray();
    
    Array.Sort(c1);
    Array.Sort(c2);

    for(int i = c1.Length - 1; i >= 0; i--)
      if(c1[i] != c2[i]) return false;
      
    return true;
}

..and it's simpler to understand than my other answer

score 1 · Answer 4 · edited Feb 17 '22 at 21:56

I did some timing comparisons, just for fun

using

a) NazaRN

  var areEqual = Enumerable.SequenceEqual(str1.OrderBy(c => c), str2.OrderBy(c => c));

b) IdleMind

bool stringsAreEqual = (str1.Length == str2.Length);
if (stringsAreEqual) {
    // see if each char from str1 is in str2, removing from str2 as we go
    List<char> str2chars = new List<char>(str2.ToCharArray());
    foreach (char c in str1.ToCharArray()) {
        int index = str2chars.IndexOf(c);
        if (index != -1) {
            str2chars.RemoveAt(index);
        }
        else {
            stringsAreEqual = false;
            break;
        }
    }
    // any chars left over in str2?
    stringsAreEqual = stringsAreEqual && (str2chars.Count == 0);
}

c) CaiusJard

static bool SameChars(string str1, string str2){

    if(str1.Length != str2.Length) return false;

    var counts = new int[26];
    var numPositive = 0;

    foreach(var c in str1){
      var r = ++counts[c - 'a'];
      if(r == 1) numPositive++;
    }

    foreach(var c in str2){
      var r = --counts[c - 'a'];
      if(r < 0) return false;
      if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

d) MatthewWatson

var lookup1 = s1.ToLookup(ch => ch); // O(N)
var lookup2 = s2.ToLookup(ch => ch); // O(N)

return lookup1.All(keyValue => // O(N)
    lookup2.Contains(keyValue.Key) && keyValue.Count() == lookup2[keyValue.Key].Count());

Doing 1 million iterations (relative timings: smaller numbers means faster, primitive method here: https://dotnetfiddle.net/WjcPO3 )

a = 75
b = 16
c = 10
d = 100

We have a clear winner, but I would still go with a) for reasons @Heinzi stated in the comments:

"One answer is: before checking equality, sort them and after that easily can check equality but this is not the fastest way" - Indeed, but except in the rarest of circumstances, the speed difference simply won't matter. CPU time is cheap, developer time is expensive, so go for the simplest and most maintainable solution, not for the fastest one.

I swapped out c) for the other one I posted, as the one originally benched was wrong — Caius Jard, Feb 17 '22 at 21:22

score 0 · Answer 5 · answered Feb 17 '22 at 16:45

As others have pointed out, the easiest way is to sort then compare.

For academic interest, here's a way of doing it without sorting:

public static bool StringsContainSameChars(string s1, string s2)
{
    var lookup1 = s1.ToLookup(ch => ch); // O(N)
    var lookup2 = s2.ToLookup(ch => ch); // O(N)

    return lookup1.All(keyValue => // O(N)
        lookup2.Contains(keyValue.Key) && keyValue.Count() == lookup2[keyValue.Key].Count());
}

In theory this is an O(N) operation, but it would have a huge overhead and is likely to be slower than the obvious sorting approach. If N was very large, then this would in theory be faster.

Note: I'm saying this is O(N) because:

Enumerable.ToLookup() is an O(N) operation
Lookup<TKey,TElement>.Contains() is an O(1) operation.
The Lookup<TKey,TElement> indexer ([]) is an O(1) operation.
keyValue.Count() will be an O(1) operation even though it's using Linq.Count(), because the underlying type implements ICollection<T> which allows Linq to perform the shortcut of using the .Count property.

Therefore the overall complexity is O(N).

But like I say, this is really of academic interest only.

Caius Jard · Answer 6 · 2022-02-17T21:44:33.833

Here's an option that uses the curio of being able to treat chars as ints. It relies on the characters in the strings being a-z (but see the footnote if you want a wider range of chars). It's a simplistic form of dictionary where the character itself is used as an indexer, but it doesn't need any advanced technology at all - it purely relies on arrays and ints (and if you class foreach/enumerations of chars in a string as advanced they could be swapped for a normal for):

static bool SameChars(string str1, string str2){

    if(str1.Length != str2.Length) return false;

    var counts = new int[26];
    var numPositive = 0;

    foreach(var c in str1){
      var r = ++counts[c - 'a'];
      if(r == 1) numPositive++;
    }

    foreach(var c in str2){
      var r = --counts[c - 'a'];
      if(r < 0) return false;
      if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

If the string lengths differ it's an easy win. Otherwise we'll use an array of 26 ints and index it by the character we're looking at, to keep a count of the number of times we see each character. Suppose we're iterating and we encounter our first letter a - the code ++counts[c - 'a'] will do 'a' - 'a' (which is 0) and increment array index [0] from 0 to 1.

The resulting counter, 1 is captured into r. We examine r and see if it's 1; the only time it will reasonably be 1 is the first time we see a given letter, so we increment numPositive which is a counter of the number of array indexes that are holding a positive number

We then loop over the second string, using the chars in it to decrement the counts. Again we capture the number now stored in that array position, and here we can make a small optimization; if any array index goes negative, then str2 must contain more of that character than str1 does, so early return false

If the resulting counter for a given letter has hit 0, decrement 1 from numPositive, the variable that is tracking the number of array positions that are holding positive numbers.

At the end of the operation we want to know if any of the counts elements are still holding a positive number; rather than checking each one, we can just look to see if numPositive is 0

_{Footnote: If you wanted to support a wider character range you'd need a bigger array (e.g. if you want to support A-Z and a-z you'll need an array that is 'z'-'A' long, and you'll need to do - 'A' in your math}

Edit:

Swapping the foreach out for for makes things about 10% quicker:

static bool SameCharsF(string str1, string str2){
    if(str1.Length != str2.Length) return false;

    var counts = new int[26];
    var numPositive = 0;

    for(int i = 0; i < str1.Length; i++){
      var r = ++counts[str1[i] - 'a'];
      if(r == 1) numPositive++;
    }

    for(int i = 0; i < str2.Length; i++){
      var r = --counts[str1[i] - 'a'];
      if(r < 0) return false;
      if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

Here are a couple of variations that use Dictionary; they can cope with any number of any kind of char without special treatment but they're 2-3x slower. Surprisingly, the four loop version that preloads the dictionary so ContainsKey can be omitted isn't significantly slower than the ContainsKey version:

static bool SameCharsDP(string str1, string str2){
    if(str1.Length != str2.Length) return false;

    var d = new Dictionary<char, int>();
    var numPositive = 0;
    
    foreach(var c in str1)
      d[c] = 0;
    
    foreach(var c in str2)
      d[c] = 0;
    
    foreach(var c in str1){
      var r = ++d[c];
      if(r == 1) numPositive++;
    }
    
    foreach(var c in str2){
      var r = --d[c];
      if(r < 0) return false;
      if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

static bool SameCharsDCK(string str1, string str2){
    if(str1.Length != str2.Length) return false;

    var d = new Dictionary<char, int>();
    var numPositive = 0;
    
    foreach(var c in str1){
        if(!d.ContainsKey(c))
            d[c]=0;
        var r = ++d[c];
        if(r == 1) numPositive++;
    }
    
    foreach(var c in str2){
        if(!d.ContainsKey(c))
            d[c]=0;
        var r = --d[c];
        if(r < 0) return false;
        if(r == 0) numPositive--;
    }

    return numPositive == 0;
}

How can compare two unsorted string in c#?

6 Answers6

Edit: