Find number of characters mutual between two strings in C#

Question

I am looking for a method that will take two strings and return the number of characters that are common to both e.g.:

"G010" & "G1820A" should return 3 as the G, 0 and 1 chars exist in both.

If a char exists twice in both they should be counted separately as follows:

"G12AA" & "GAA2" should return 4 as the G, A, A and 2 characters exist in both.

Any help with this? Google searches haven't been too helpful thus far.

I have a method that compares two strings for similarities but with the chars in the same indexes, this however is completely different. — Tommy, Feb 13 '14 at 09:00
"G12AA" & "GA2" should return 4? and "GA2" & "G12AA" should return 4? — Francesco Milani, Feb 13 '14 at 09:22

Jodrell · Answer 1 · 2014-02-14T10:02:01.563

Okay, how about this, it has the advantage of maximising lazy evaluation and minimising string manipulation.

public int CommonChars(string left, string right)
{
    return left.GroupBy(c => c)
        .Join(
            right.GroupBy(c => c),
            g => g.Key,
            g => g.Key,
            (lg, rg) => lg.Zip(rg, (l, r) => l).Count())
        .Sum(); 
}

essentialy, it groups each side by char, then finds chars which have a group on both sides. The matched groups are counted in tandem, until either runs out. These counts are summed to produce the result.

It would be trivial to perform this generically for any two sequences. See below,

public static int CommomCount<T>(
        this IEnumerable<T> source,
        IEnumerable<T> sequence,
        IEqualityComparer<T> comparer = null)
{
    if (sequence == null)
    {
        return 0;
    }

    if (comparer == null)
    {
        comparer = EqualityComparer<T>.Default;
    }

    return source.GroupBy(t => t, comparer)
        .Join(
            sequence.GroupBy(t => t, comparer),
            g => g.Key,
            g => g.Key,
            (lg, rg) => lg.Zip(rg, (l, r) => l).Count(),
            comparer)
        .Sum();
}

Which you would use like this.

"G12AA".CommonCount("GAA2")

The optional comparer parameter may prove useful if you require case insensitivity or other special treatment.

In the interest of resuability, I'd be tempted to remove the Sum() and return an IEnumerable<T>, and then add sum to the call, like this,

public static IEnumerable<T> Commom<T>(
        this IEnumerable<T> source,
        IEnumerable<T> sequence,
        IEqualityComparer<T> comparer = null)
{
    if (sequence == null)
    {
        return Enumerable.Empty<T>();
    }

    if (comparer == null)
    {
        comparer = EqualityComparer<T>.Default;
    }

    return source.GroupBy(t => t, comparer)
        .Join(
            sequence.GroupBy(t => t, comparer),
            g => g.Key,
            g => g.Key,
            (lg, rg) => lg.Zip(rg, (l, r) => l),
            comparer)
        .SelectMany(g => g);
}

so you could easily do

Console.WriteLine(new string("G12AA".Common("GAA2").ToArray()));

or just the orgininal

"G12AA".Common("GAA2").Count();

Matt W · Answer 2 · 2014-02-13T09:58:24.623

4

Try this

    public int CommonCharacters(string s1, string s2)
    {
        bool[] matchedFlag = new bool[s2.Length];

        for (int i1 = 0; i1 < s1.Length; i1++)
        {
            for (int i2 = 0; i2 < s2.Length; i2++)
            {
                if (!matchedFlag[i2] && s1.ToCharArray()[i1] == s2.ToCharArray()[i2])
                {
                    matchedFlag[i2] = true;
                    break;
                }
            }
        }

        return matchedFlag.Count(u => u);
    }

edited Feb 13 '14 at 09:58

answered Feb 13 '14 at 09:14

Matt W

323
3
14

1

reverse the order of the statements in your condition to improve efficiency: no point doing all that extra work changing strings to arrays when the flag means it won't be used anyway! – Nathan Feb 13 '14 at 09:37
1

I meant change the order of the conditions: if (s1[i1] == s2[i2] && !matchedFlag[i2]): more efficient to check the boolean first: change to if (!matchedFlag[i2] && s1[i1] == s2[i2]) - see "short-circuiting" :) (questionable how much of a difference this will make now that you're correctly using the string indexer on the first condition - would have to be a very large data set - but I always make a point to put the simplest condition at the beginning of an if statement which uses &&: can make a substantial difference, especially inside multiple loops) – Nathan Feb 13 '14 at 09:56
Aha my bad, feel like I've not fully gotten out of bed this morning. – Matt W Feb 13 '14 at 09:58
hehe, no worries - now lose the tochararray calls - they're superflous too ;) – Nathan Feb 13 '14 at 10:00
If you're talking about "G12A" and "GAA2", it returns 3. G, A and 2. – Matt W Feb 13 '14 at 10:13
ok, sorry - must have mixed up the inputs in my test code - guess I'm asleep too :) – Nathan Feb 13 '14 at 10:14
Even better - not mixed up input, mixed up outputs - copy paste fail lol string a = "G12A"; string b = "GAA2"; var result = a.CommonCount(b); Console.WriteLine("mine: " + result); var result2 = StringExtensions.CommonCharacters(a, b); Console.WriteLine("other: " + result); – Nathan Feb 13 '14 at 10:16

score 1 · Answer 3 · edited May 23 '17 at 12:08

You could use Linq to solve this problem by using something like this:

static void Main(string[] args)
{
    IEnumerable<char> a = "G010".ToCharArray();
    IEnumerable<char> b = "G1820A".ToCharArray();

    int commonChars = FindCommonElements(a, b).Count();
    Console.WriteLine(commonChars);

    Console.ReadLine();
}

private static T[] FindCommonElements<T>(IEnumerable<T> source, IEnumerable<T> target)
{
    ILookup<T, T> lookup2 = target.ToLookup(i => i);

    return (
      from group1 in source.GroupBy(i => i)
      let group2 = lookup2[group1.Key]
      from i in (group1.Count() < group2.Count() ? group1 : group2)
      select i
    ).ToArray();
}

commonChars will have a value of 3. The FindCommonElements method was inspired by this question: How do I do an integer list intersection while keeping duplicates?

You can use `Zip`, as in my answer to avoid fully counting both groups. http://stackoverflow.com/a/21751976/659190 — Jodrell, Feb 14 '14 at 10:10

Amit · Answer 4 · 2014-02-13T10:16:38.523

1

        string s1 = "G12A";
        string s2 = "GAA2";
        List<char> lst1 = s1.ToList();
        List<char> lst2 = s2.ToList();
        int count = 0;
        foreach (char c in lst2)
        {
            if (lst1.Contains(c))
            {
                lst1.Remove(c);
                count++;
            }
        }
        Console.WriteLine(count);

edited Feb 13 '14 at 10:16

answered Feb 13 '14 at 09:38

Amit

11
2

What do you get with `string s1 = "G12A";`? – L.B Feb 13 '14 at 09:41
How 4 is wrong when s2 = "GAA2" all 4 characters present in s1? – Amit Feb 13 '14 at 09:48
@Amit but they both have a single 'A' in common not two. See Tommy's first example – L.B Feb 13 '14 at 09:50
Thanks @L.B for telling scenario of duplicate characters – Amit Feb 13 '14 at 10:16

score 1 · Answer 5 · answered Feb 13 '14 at 10:18

Doing it with Linq:

    int MyCount(string s1, string s2)
    {
        return s1.Count(c =>
                            {
                                var i = s2.IndexOf(c);
                                if (i >= 0)
                                {
                                    s2 = s2.Remove(i, 1);
                                    return true;
                                }
                                return false;
                            });
    }

score 0 · Answer 6 · answered Feb 13 '14 at 09:53

This one would run faster with larger inputs as it doesn't do nesting loops but rather depends on hashed search using the Dictionary. On the other hand it uses more memory.

 public int CommonCharacterCount(string s1, string s2)
            { 
                var r=0;
                Dictionary<char,int> s2Dict = new Dictionary<char,int>();
                foreach (var ch in s2)
                {
                    if (s2Dict.ContainsKey(ch))
                        s2Dict[ch] = s2Dict[ch]+1;
                    else s2Dict.Add(ch,1);
                }

                foreach (var c in s1)
                {
                    if (s2Dict.ContainsKey(c) && s2Dict[c]>0)
                    {
                        r++;
                        s2Dict[c] = s2Dict[c] - 1;
                    }
                }
                return r;
            }

score 0 · Answer 7 · answered Nov 22 '17 at 13:54

string myname = "1234";
        string yourname = "12";
        char[] sam = new char[] { };
        sam = myname.ToCharArray();
        char[] sam1 = new char[] { };
        sam1 = yourname.ToCharArray();
        int id = 0;
        int id1 = 0;
        List<string> found = new List<string>();
        List<string> found1 = new List<string>();
        foreach (char item in sam)
        {
            if (found.Contains(item.ToString()))
            {
                found.Add(item.ToString() + id);
                id++;
            }
            else
                found.Add(item.ToString());
        }
        foreach (var item in sam1)
        {
            if (found1.Contains(item.ToString()))
            {
                found1.Add(item.ToString() + id);
                id1++;
            }
            else
                found1.Add(item.ToString());
        }
        var final = found.Except(found1);
        var final2 = found1.Except(found);
        var checkingCount = final.Count() + final2.Count();
        Console.Write(checkingCount);
        Console.ReadLine();

check this out, btw not efficient. But got it right.

score 0 · Answer 8 · answered Nov 03 '21 at 22:43

        string s1 = "aabcc";
        string s2 = "adcaa";
        int x = 0;

        var s1list = s1.ToList();
        var s2list = s2.ToList();
        for (int i=0; i<s1list.Count; i++)
        {
            var check = s1list[i];
            if (s2list.Contains(s1list[i]))
            {
                x++;
                var indexval = s2list.FindIndex(a => a == s1list[i]);
                s2list.RemoveAt(indexval);
            }
        }
        Console.WriteLine(x);

rt2800 · Answer 9 · 2014-02-13T09:38:21.627

-3

Please check following code--> src is first string while chk is second string

var count = 0;var i=0; src.ToList().ForEach((x)=> {
while(chk.Substring(i).IndexOf(x) >= 0) {
count++; i++; if( i > chk.Length) break; }
});

edited Feb 13 '14 at 09:38

answered Feb 13 '14 at 09:06

rt2800

3,045
2
19
26

1

What about `AA` and `A`? – L.B Feb 13 '14 at 09:09
3

Still don't know how to format an answer after 194 answers? – L.B Feb 13 '14 at 09:30
integer i is set to 0(zero) – rt2800 Feb 13 '14 at 09:37

Find number of characters mutual between two strings in C#

9 Answers9

Linked