2

For example if I have...

string a = "personil";
string b = "personal";

I would like to get...

string c = "person[i]l";

However it is not necessarily a single character. I could be like this too...

string a = "disfuncshunal";
string b = "dysfunctional";

For this case I would want to get...

string c = "d[isfuncshu]nal";

Another example would be... (Notice that the length of both words are different.)

string a = "parralele";
string b = "parallel";

string c = "par[ralele]";

Another example would be...

string a = "ato";
string b = "auto";

string c = "a[]to";

How would I go about doing this?

Edit: The length of the two strings can be different.

Edit: Added additional examples. Credit goes to user Nenad for asking.

Howard
  • 3,648
  • 13
  • 58
  • 86
  • 1
    It is really a complex question, start looking at string comparison methods http://stackoverflow.com/questions/1034622/how-can-i-measure-the-similarity-between-2-strings – RMalke Mar 08 '13 at 19:10
  • 2
    http://en.wikipedia.org/wiki/Longest_common_substring_problem – Joey Mar 08 '13 at 19:10
  • This would need more requirements. I don't understand why the second one would be formed the way it is. – Yuriy Faktorovich Mar 08 '13 at 19:11
  • Are you going to check the words of the same length? Wouldn't `d[i]sfunc[shu]nal` be more suitable for your second example? – J0HN Mar 08 '13 at 19:13
  • @YuriyFaktorovich he appears to want the string extremities that matches the original string out of the bracers. @ J0NH seems more reasonable to me too – RMalke Mar 08 '13 at 19:13
  • 5
    Yuriy: Quite easy, actually. First different character to last different character. – Joey Mar 08 '13 at 19:13
  • @YuriyFaktorovich: a is an incorrect spelling of a word. b is the correct spelling of the word. I'm trying to indicate from which area the user typed the word wrong. However the length of the word that the user typed can be different from the correct spelling of the word. – Howard Mar 08 '13 at 19:15
  • @rotaercz: parralele and parallel? What would you mark as difference? – Nenad Mar 08 '13 at 19:17
  • 2
    @Nenad: That would come out like this: par[ralele] – Howard Mar 08 '13 at 19:20
  • it _could_ be a complicated question if you have "dysunal" and "dysfunctional" and expect "dys[f]un[ction]al" – Meirion Hughes Mar 08 '13 at 19:24
  • 1
    I use to compare DNA sequences... We use dynamic programing to compare string of arbitrary length. – qPCR4vir Mar 08 '13 at 19:25
  • A bit late for this comment, but what have you tried? – antonijn Mar 08 '13 at 19:26
  • What should happen if the `b` string is missing a character? for instance comparing `disfuncshunal` and `dysfuncional` (notice the missing `t` in the `b` string there) – Lasse V. Karlsen Mar 08 '13 at 19:26
  • And what if you have ato and auto? correct one is longer? – Nenad Mar 08 '13 at 19:42
  • 1
    Take a look at Levenshtein distance or edit distance: http://en.wikipedia.org/wiki/Levenshtein_distance – qPCR4vir Mar 08 '13 at 19:44
  • @Antonijn: Yes, I've tried but the length of the two strings being of a variable length makes it hard. I'm hoping to gain some insight from all the smart people on stackoverflow. – Howard Mar 08 '13 at 19:44
  • @LasseV.Karlsen: b string will always be correct. – Howard Mar 08 '13 at 19:46
  • 1
    @Nenad: Added the example you asked about in the question. – Howard Mar 08 '13 at 19:48

5 Answers5

4

I must be very bored today, but I actually made UnitTest that pass all 4 cases (if you did not add some more in the meantime).

Edit: Added 2 edge cases and fix for them.

Edit2: letters that repeat multiple times (and error on those letters)

[Test]
[TestCase("parralele", "parallel", "par[ralele]")]
[TestCase("personil", "personal", "person[i]l")]
[TestCase("disfuncshunal", "dysfunctional", "d[isfuncshu]nal")]
[TestCase("ato", "auto", "a[]to")]
[TestCase("inactioned", "inaction", "inaction[ed]")]
[TestCase("refraction", "fraction", "[re]fraction")]
[TestCase("adiction", "ad[]diction", "ad[]iction")]
public void CompareStringsTest(string attempted, string correct, string expectedResult)
{
    int first = -1, last = -1;

    string result = null;
    int shorterLength = (attempted.Length < correct.Length ? attempted.Length : correct.Length);

    // First - [
    for (int i = 0; i < shorterLength; i++)
    {
        if (correct[i] != attempted[i])
        {
            first = i;
            break;
        }
    }

    // Last - ]
    var a = correct.Reverse().ToArray();
    var b = attempted.Reverse().ToArray();
    for (int i = 0; i < shorterLength; i++)
    {
        if (a[i] != b[i])
        {
            last = i;
            break;
        }
    }

    if (first == -1 && last == -1)
        result = attempted;
    else
    {
        var sb = new StringBuilder();
        if (first == -1)
            first = shorterLength;
        if (last == -1)
            last = shorterLength;
        // If same letter repeats multiple times (ex: addition)
        // and error is on that letter, we have to trim trail.
        if (first + last > shorterLength)
            last = shorterLength - first;

        if (first > 0)
            sb.Append(attempted.Substring(0, first));

        sb.Append("[");

        if (last > -1 && last + first < attempted.Length)
            sb.Append(attempted.Substring(first, attempted.Length - last - first));

        sb.Append("]");

        if (last > 0)
            sb.Append(attempted.Substring(attempted.Length - last, last));

        result = sb.ToString();
    }
    Assert.AreEqual(expectedResult, result);
}
Nenad
  • 24,809
  • 11
  • 75
  • 93
  • Could you check this one out? It crashes on this? string a = "inactioned"; string b = "inaction"; string c = "inaction[]"; – Howard Apr 19 '13 at 18:07
  • c should say this in the above... string c = "inaction[ed]"; – Howard Apr 19 '13 at 18:24
  • You're right. 2 edge cases. I added 2 'if' conditions. It's fine now. – Nenad Apr 20 '13 at 02:57
  • I found another edge case. "adiction", "addiction" shows "ad[]diction" when it should show "ad[]iction" – Howard Apr 22 '13 at 19:02
  • @rotaercz: I fixed it, but is that code hard to follow? If you use it in application, you should have some understanding of it. – Nenad Apr 23 '13 at 17:12
  • 1
    After looking at it, it's actually quite clean and easy to understand. I'll stop being a wuss. Thank you so much! :) – Howard Apr 24 '13 at 15:05
1

Have you tried my DiffLib?

With that library, and the following code (running in LINQPad):

void Main()
{
    string a = "disfuncshunal";
    string b = "dysfunctional";

    var diff = new Diff<char>(a, b);

    var result = new StringBuilder();
    int index1 = 0;
    int index2 = 0;
    foreach (var part in diff)
    {
        if (part.Equal)
            result.Append(a.Substring(index1, part.Length1));
        else
            result.Append("[" + a.Substring(index1, part.Length1) + "]");
        index1 += part.Length1;
        index2 += part.Length2;
    }
    result.ToString().Dump();
}

You get this output:

d[i]sfunc[shu]nal

To be honest I don't understand what this gives you, as you seem to completely ignore the changed parts in the b string, only dumping the relevant portions of the a string.

Lasse V. Karlsen
  • 380,855
  • 102
  • 628
  • 825
  • Note that changing the code so that you get a 1 combined sequence from the first fault to the last fault is not that hard. This code (+library) would also handle such things as added or missing characters easily. – Lasse V. Karlsen Mar 08 '13 at 19:31
  • The OP knows that the spelling of the second word is correct, so the only question is where does the first word mismatch. – Mike Perrenoud Mar 08 '13 at 19:36
0

Here is a complete and working console application that will work for both examples you gave:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;
using System.Threading.Tasks;

namespace ConsoleApplication2
{
    class Program
    {
        static void Main(string[] args)
        {
            string a = "disfuncshunal";
            string b = "dysfunctional";

            StringBuilder sb = new StringBuilder();
            for (int i = 0; i < a.Length; i++)
            {
                if (a[i] != b[i])
                {
                    sb.Append("[");
                    sb.Append(a[i]);
                    sb.Append("]");

                    continue;
                }

                sb.Append(a[i]);
            }

            var str = sb.ToString();
            var startIndex = str.IndexOf("[");
            var endIndex = str.LastIndexOf("]");

            var start = str.Substring(0, startIndex + 1);
            var mid = str.Substring(startIndex + 1, endIndex - 1);
            var end = str.Substring(endIndex);

            Console.WriteLine(start + mid.Replace("[", "").Replace("]", "") + end);
        }
    }
}

it will not work if you want to display more than one entire section of the mismatched word.

Mike Perrenoud
  • 66,820
  • 29
  • 157
  • 232
0

You did not specify what to do if the strings were of different lengths, but here is a solution to the problem when the strings are of equal length:

private string Compare(string string1, string string2) {
            //This only works if the two strings are the same length..
            string output = "";
            bool mismatch = false;
            for (int i = 0; i < string1.Length; i++) {
                char c1 = string1[i];
                char c2 = string2[i];
                if (c1 == c2) {
                    if (mismatch) {
                        output += "]" + c1;
                        mismatch = false;
                    } else {
                        output += c1;
                    }
                } else {
                    if (mismatch) {
                        output += c1;
                    } else {
                        output += "[" + c1;
                        mismatch = true;
                    }
                }
            }
            return output;
        }
0

Not really good approach but as an exercise in using LINQ: task seem to be find matching prefix and suffix for 2 strings, return "prefix + [+ middle of first string + suffix.

So you can match prefix (Zip + TakeWhile(a==b)), than repeat the same for suffix by reversing both strings and reversing result.

var first = "disfuncshunal";
var second = "dysfunctional";

// Prefix
var zipped = first.ToCharArray().Zip(second.ToCharArray(), (f,s)=> new {f,s});
var prefix = string.Join("", 
    zipped.TakeWhile(c => c.f==c.s).Select(c => c.f));

// Suffix
var zippedReverse = first.ToCharArray().Reverse()
   .Zip(second.ToCharArray().Reverse(), (f,s)=> new {f,s});
var suffix = string.Join("", 
    zippedReverse.TakeWhile(c => c.f==c.s).Reverse().Select(c => c.f));

// Cut and combine.
var middle = first.Substring(prefix.Length,
      first.Length - prefix.Length - suffix.Length);
var result = prefix + "[" + middle + "]" + suffix;

Much easier and faster approach is to use 2 for loops (from start to end, and from end to start).

Alexei Levenkov
  • 98,904
  • 14
  • 127
  • 179