5

A friend asked me how to improve some code with LINQ. How would you do a character by character comparison between two strings to count the number of matches at an index? Here's the original code, could it be improved with LINQ?

private int Fitness(string individual, string target)
   {
       int sum = 0;
       for (int i = 0; i < individual.Length; i++)
           if (individual[i] == target[i]) sum++;
       return sum;
   }
Jon Galloway
  • 52,327
  • 25
  • 125
  • 193

5 Answers5

7
return Enumerable.Range(0, individual.Length)
                 .Count(i => individual[i] == target[i]);

A more fool-proof way would be (the above snippet will fail if target is shorter than individual):

return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                 .Count(i => individual[i] == target[i]);

I believe the code is correct as is. Enumerable.Range method takes two arguments. The first of which is the start index (should be 0), the second is the count of items. The complete code snippet to test and make sure:

class Program {
  static void Main(string[] args) {
      Console.WriteLine(Fitness("hello", "world"));
  }
  static int Fitness(string individual, string target) {
      return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                       .Count(i => individual[i] == target[i]);
  }
}
Mehrdad Afshari
  • 414,610
  • 91
  • 852
  • 789
2

You could write something similar with LINQ, but since "Zip" isn't built in until .NET 4.0 it will be more code than we'd like, and/or not as efficient. I'd be tempted to leave it "as is", but I'd probably check target.Length to avoid an out-of-range exception.

Perhaps I'd make an extension method, though:

public static int CompareFitness(this string individual, string target)
{
   int sum = 0, len = individual.Length < target.Length
        ? individual.Length : target.Length;
   for (int i = 0; i < len; i++)
       if (individual[i] == target[i]) sum++;
   return sum;
}

Then you can use:

string s = "abcd";
int i = s.CompareFitness("adc"); // 2
Marc Gravell
  • 1,026,079
  • 266
  • 2,566
  • 2,900
  • Agreed on length check. How it's related to Zip though? – Mehrdad Afshari Aug 16 '09 at 21:18
  • 1
    Zip takes two sequences and brings them together term-by-term...? – Marc Gravell Aug 16 '09 at 21:19
  • Agree, Marc, but I'm accepting Mehrdad's answer because I wanted to see how this would be done in LINQ. Voted for your answer, too. Thanks! – Jon Galloway Aug 16 '09 at 21:22
  • I understand but how can `Zip` reduce code size in this case (relative to my answer)? At least for collections that support direct access with an index, it won't make much difference in code size. – Mehrdad Afshari Aug 16 '09 at 21:24
  • I don't have the 4.0 version of Zip to hand, but wouldn't it be something like `individual.Zip(target).Count(pair=>pair.X==pair.Y);` - but importantly we haven't had to mess with the Min (i.e. this replaces your **second** version) – Marc Gravell Aug 16 '09 at 21:29
0

The currently selected answer doesn't handle different length strings well. It only compares a maximum substring of the input strings if the lengths differ.

For example:

Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> 3

This can be easily fixed by subtracting the differing lengths from your score. The improvement:

return Enumerable.Range(0, Math.Min(individual.Length, target.Length))
                 .Count(i => individual[i] == target[i])
                 - Math.Abs(individual.Length - target.Length);

Now:

Fitness("ABC", "ABC") -> 3
Fitness("ABC", "ABC this is an 'unfit' string") -> -23

Technically, there are 23 characters difference between those two inputs.

jamesrom
  • 866
  • 3
  • 10
  • 19
0

I am a little puzzled. The PO asked to improve the solution with Linq. Benchmarking the solutions I received the following result in the table (code see below). Unless I am missing something (in which case I'd appreciate comments) I cannot see that Linq improves the simple solution with the for-loop.

Result: the for loop is a lot faster and needs less memory.

Consequence: I would keep the simple for-loop. The argument of Conciseness that favors Linq does not apply here because the code is hidden away in a method. And the performance is not better but worse.

I like Linq. Tried to work without it, recently, and failed. But it is not always better.

Method Mean Error StdDev Gen 0 Allocated
RunLinqZip 1,179.16 ns 2.362 ns 1.844 ns 0.1831 1,152 B
RunLinqRange 556.54 ns 1.359 ns 1.135 ns 0.1726 1,088 B
RunForLoop 63.84 ns 0.101 ns 0.094 ns - -
using BenchmarkDotNet.Attributes;
using BenchmarkDotNet.Running;


namespace Benchmarks
{
    public class Program
    {
        static void Main(string[] args)
        {
            var summary = BenchmarkRunner.Run<MemoryBenchmarkerDemo>();
        }
    }


    [MemoryDiagnoser]
    public class MemoryBenchmarkerDemo
    {
        public string[] Id = new string[] { "ATTR", "TAL", "SPELL", "LITURGY", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };
        public string[] Att = new string[] { "ATTR_2", "TAL_99", "SPELL_232", "LITURGY_123", "REGENERATE", "INI", "CT_9", "CT_9/ITEMTEMPLATE_231" };


        public int FitnessFor(string individual, string target)
        {
            int sum = 0;
            for (int i = 0; i < Math.Min(individual.Length, target.Length); i++)
                if (individual[i] == target[i]) sum++;
            return sum;
        }

        public int FitnessRange(string individual, string target)
        {
            return Enumerable.Range(0, individual.Length)
                     .Count(i => individual[i] == target[i]);
        }

        public int FitnessZip(string individual, string target)
        {
            return individual.Zip(target).Count(pair => pair.Item1 == pair.Item2);
        }



        [Benchmark]
        public void RunLinqZip()
        {
            for(int i = 0; i< Id.Length; i++)
            {
                FitnessZip(Id[i], Att[i]);
            }
        }

        [Benchmark]
        public void RunLinqRange()
        {
            for (int i = 0; i < Id.Length; i++)
            {
                FitnessRange(Id[i], Att[i]);
            }
        }


        [Benchmark]
        public void RunForLoop()
        {
            for (int i = 0; i < Id.Length; i++)
            {
                FitnessFor(Id[i], Att[i]);
            }
        }

    }
}
Jan
  • 4,974
  • 3
  • 26
  • 43
  • 1
    Well, nobody mentioned performance as point of improvement. What does "improved with LINQ" mean anyway? Nowadays the question would probably get closed as too vague. The check on lengths in the accepted answer is one improvement in robustness, which is probably why it was accepted. – Gert Arnold Jan 07 '22 at 18:56
-1

How about a join, or did I misunderstand?

    static void Main(string[] args)
    {
        var s1 = "abcde";
        var s2 = "hycdh";
        var count = s1.Join(s2, c => c, c => c, (a,b) => a).Count();
        Console.WriteLine(count);
        Console.ReadKey();
    }
spender
  • 117,338
  • 33
  • 229
  • 351