2

I'm trying to calculate the difference between two strings.

For example:

string val1 = "Have a good day";
string val2 = "Have a very good day, Joe";

The result would be a list of strings with two items: "very " and ", Joe".

So far my research into this task hasn't turned up much.

Edit: The result would probably need to be two separate lists of strings, one that holds additions and one that holds removals.

mrb398
  • 3
    What do you mean "your research"? Did you write some code? Then share it. Did you not write anything? In that case, I don't think it's fair of you to just expect us to write your code for you. Make an effort! – mason Dec 02 '14 at 16:53
  • I've written code and done research. My code isn't even close to working as intended – mrb398 Dec 02 '14 at 16:54
  • See https://github.com/mmanela/diffplex – haim770 Dec 02 '14 at 16:54
  • 1
    Like I said *share your code* so we can point you in the right direction. – mason Dec 02 '14 at 16:55
  • 3
    This is a non-trivial task. See [here](http://stackoverflow.com/questions/24887238/how-to-compare-two-rich-text-box-contents-and-highlight-the-characters-that-are/24970638#24970638) for the use of a DIFF library! – TaW Dec 02 '14 at 17:08
  • Here's a [.NET Fiddle](https://dotnetfiddle.net/YPWaWz) that demonstrates how to do it using [DiffLib](https://difflib.codeplex.com/) (note: I'm the author of DiffLib) – Lasse V. Karlsen Dec 02 '14 at 17:39

5 Answers

2

This is the simplest version I can think of:

using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

class Program
{
    static void Main(string[] args)
    {
        string val1 = "Have a good day";
        string val2 = "Have a very good day, Joe";

        // Extract the words of each string, ignoring punctuation and whitespace.
        MatchCollection words1 = Regex.Matches(val1, @"\b(\w+)\b");
        MatchCollection words2 = Regex.Matches(val2, @"\b(\w+)\b");

        var hs1 = new HashSet<string>(words1.Cast<Match>().Select(m => m.Value));
        var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value));

        // Optionally you can use a custom comparer for the words.
        // var hs2 = new HashSet<string>(words2.Cast<Match>().Select(m => m.Value), new MyComparer());

        // After this operation hs2 contains only 'very' and 'Joe'.
        hs2.ExceptWith(hs1);

        Console.WriteLine(string.Join(", ", hs2));
    }
}

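Custom comparer:

public class MyComparer : IEqualityComparer<string>
{
    public bool Equals(string one, string two)
    {
        // The static overload also handles null arguments.
        return string.Equals(one, two, StringComparison.OrdinalIgnoreCase);
    }

    public int GetHashCode(string item)
    {
        // The hash must be case-insensitive too, otherwise "Joe" and "joe"
        // compare equal but land in different buckets and are never matched.
        return StringComparer.OrdinalIgnoreCase.GetHashCode(item);
    }
}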
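The question's edit asks for two separate lists, additions and removals. As a rough sketch of how the set-based idea above could be applied in both directions (the `WordDiff` class and its `Compare` method are hypothetical names made up for this illustration, and the built-in `StringComparer.OrdinalIgnoreCase` is assumed to be an acceptable comparer):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text.RegularExpressions;

static class WordDiff
{
    // Hypothetical helper: splits text into words, dropping punctuation and whitespace.
    static List<string> Words(string text) =>
        Regex.Matches(text, @"\b\w+\b").Cast<Match>().Select(m => m.Value).ToList();

    // Added   = words found in 'to' but not in 'from'.
    // Removed = words found in 'from' but not in 'to'.
    public static (List<string> Added, List<string> Removed) Compare(string from, string to)
    {
        var comparer = StringComparer.OrdinalIgnoreCase;
        var a = Words(from);
        var b = Words(to);
        return (b.Except(a, comparer).ToList(), a.Except(b, comparer).ToList());
    }
}

class Demo
{
    static void Main()
    {
        var (added, removed) = WordDiff.Compare("Have a good day", "Have a very good day, Joe");
        Console.WriteLine(string.Join(", ", added)); // very, Joe
        Console.WriteLine(removed.Count);            // 0
    }
}
```

Because this works on sets of words, it ignores position, punctuation and repeated words, so it cannot return "very " and ", Joe" exactly as written in the question. A diff library is probably the better fit if that matters.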
t3chb0t
1

I followed these steps:

(i) Extract all words from the two strings, ignoring special characters.

(ii) Find the difference between the two word lists.

CODE:

    static void Main()
    {
        string s1 = "Have a good day";
        string s2 = "Have a very good day, Joe";
        IEnumerable<string> diff;

        // Words of the first string, with any apostrophe suffix removed.
        MatchCollection matches = Regex.Matches(s1, @"\b[\w']*\b");
        IEnumerable<string> first = from m in matches.Cast<Match>()
                                    where !string.IsNullOrEmpty(m.Value)
                                    select TrimSuffix(m.Value);

        // Words of the second string, processed the same way.
        MatchCollection matches1 = Regex.Matches(s2, @"\b[\w']*\b");
        IEnumerable<string> second = from m in matches1.Cast<Match>()
                                     where !string.IsNullOrEmpty(m.Value)
                                     select TrimSuffix(m.Value);

        // Subtract the shorter word list from the longer one.
        if (second.Count() > first.Count())
        {
            diff = second.Except(first).ToList();
        }
        else
        {
            diff = first.Except(second).ToList();
        }

        Console.WriteLine(string.Join(", ", diff));
    }

    // Cuts a word at its first apostrophe, e.g. "Joe's" becomes "Joe".
    static string TrimSuffix(string word)
    {
        int apostropheLocation = word.IndexOf('\'');
        if (apostropheLocation != -1)
        {
            word = word.Substring(0, apostropheLocation);
        }
        return word;
    }

OUTPUT: very, Joe

Sajeetharan
1

This code:

enum Where { None, First, Second, Both } // somewhere in your source file

//...
var val1 = "Have a good calm day calm calm calm";
var val2 = "Have a very good day, Joe Joe Joe Joe";

var words1 = from m in Regex.Matches(val1, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
                where m.Success
                select m.Value.ToLower();
var words2 = from m in Regex.Matches(val2, "(\\w+)|(\\S+\\s+\\S+)").Cast<Match>()
                where m.Success
                select m.Value.ToLower();

var dic = new Dictionary<string, Where>();
foreach (var s in words1)
{
    dic[s] = Where.First;
}
foreach (var s in words2)
{
    Where b;
    if (!dic.TryGetValue(s, out b)) b = Where.None;

    switch (b)
    {
        case Where.None:
            dic[s] = Where.Second;
            break;
        case Where.First:
            dic[s] = Where.Both;
            break;
    }
}

foreach (var kv in dic.Where(x => x.Value != Where.Both))
{
    Console.WriteLine(kv.Key);
}

This prints 'calm', 'very', ', joe' and 'joe' (everything is lowercased by ToLower()), which are the items that appear in only one of the two strings: 'calm' comes from the first string, and 'very', ', joe' and 'joe' come from the second. Repeated words are reported only once.

To get two separate lists that show which word came from which text:

var list1 = dic.Where(x => x.Value == Where.First).ToList();
var list2 = dic.Where(x => x.Value == Where.Second).ToList();

foreach (var kv in list1)
{
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}

foreach (var kv in list2)
{
    Console.WriteLine("{0}: {1}", kv.Key, kv.Value);
}
Kaveh Shahbazian
0

Put the characters into two sets, then compute the relative complement of those sets.

The relative complement will be available in any good set library.

You might want to take care to preserve the order of the characters.
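A minimal sketch of that suggestion, assuming "relative complement" here means the characters that occur in the second string but nowhere in the first (`CharDiff` and `RelativeComplement` are hypothetical names used only for this example). A loop is used instead of a plain set operation so that the order of first appearance is kept:

```csharp
using System;
using System.Collections.Generic;

static class CharDiff
{
    // Characters that occur in 'b' but nowhere in 'a',
    // listed once each, in order of first appearance in 'b'.
    public static List<char> RelativeComplement(string a, string b)
    {
        var seenInA = new HashSet<char>(a);
        var emitted = new HashSet<char>();
        var result = new List<char>();
        foreach (char c in b)
        {
            // HashSet.Add returns false for duplicates, so each character is added once.
            if (!seenInA.Contains(c) && emitted.Add(c))
            {
                result.Add(c);
            }
        }
        return result;
    }
}

class CharDemo
{
    static void Main()
    {
        var diff = CharDiff.RelativeComplement("Have a good day", "Have a very good day, Joe");
        Console.WriteLine(string.Join(" ", diff)); // r , J
    }
}
```

Note that for the strings in the question this yields only 'r', ',' and 'J', because every other character of "very" and "Joe" already occurs in the first string. A character-level set difference therefore does not recover whole inserted words.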

P45 Imminent
-1

You have to remove the ',' in order to get the expected result:

    string s1 = "Have a good day";
    string s2 = "Have a very good day, Joe";
    // Get the index of the character to be removed (assumes s2 contains a comma).
    int index = s2.IndexOf(',');
    IEnumerable<string> diff;
    IEnumerable<string> first = s1.Split(' ').Distinct();
    // Remove the comma before splitting into words.
    IEnumerable<string> second = s2.Remove(index, 1).Split(' ').Distinct();
    if (second.Count() > first.Count())
    {
        diff = second.Except(first).ToList();
    }
    else
    {
        diff = first.Except(second).ToList();
    }
SuncoastOwner