1

So I have a text box, and in its TextChanged event I have the old text and the new text and want to get the difference between them. Specifically, I want to be able to recreate the new text from the old text using one remove operation and one insert operation. That is possible because a single change in the text box can only take one of a few forms:

  1. Text was only removed (one character or more using selection) - ABCD -> AD
  2. Text was only added (one character or more using paste) - ABCD -> ABXXCD
  3. Text was removed and added (by selecting text and entering text in the same action) - ABCD -> AXD

So I want to have these functions:

Sequence GetRemovedCharacters(string oldText, string newText)
{

}
Sequence GetAddedCharacters(string oldText, string newText)
{

}

My Sequence class:

public class Sequence
{

    private int start;
    private int end;

    public Sequence(int start, int end)
    {
        StartIndex = start; EndIndex = end;
    }

    public int StartIndex { get { return start; } set { start = value; Length = end - start + 1; } }
    public int EndIndex { get { return end; } set { end = value; Length = end - start + 1; } }
    public int Length { get; private set; }

    public override string ToString()
    {
        return "(" + StartIndex + ", " + EndIndex + ")";
    }

    public static bool operator ==(Sequence a, Sequence b)
    {
        if(IsNull(a) && IsNull(b))
            return true;
        else if(IsNull(a) || IsNull(b))
            return false;
        else
            return a.StartIndex == b.StartIndex && a.EndIndex == b.EndIndex;
    }
    public override bool Equals(object obj)
    {
        return base.Equals(obj);
    }
    public static bool operator !=(Sequence a, Sequence b)
    {
        if(IsNull(a) && IsNull(b))
            return false;
        else if(IsNull(a) || IsNull(b))
            return true;
        else
            return a.StartIndex != b.StartIndex || a.EndIndex != b.EndIndex;
    }
    public override int GetHashCode()
    {
        return base.GetHashCode();
    }

    static bool IsNull(Sequence sequence)
    {
        try
        {
            return sequence.Equals(null);
        }
        catch(NullReferenceException)
        {
            return true;
        }
    }

}
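
For comparison, equality on a small class like this is more commonly written by overriding Equals and GetHashCode and routing both operators through them. The following is only a sketch, assuming two ranges are equal exactly when both indexes match; the name `IndexRange` is hypothetical and is not the Sequence class above.

```csharp
using System;

// Hypothetical stand-in for a Sequence-like class; inclusive start and end.
public sealed class IndexRange : IEquatable<IndexRange>
{
    public int StartIndex { get; }
    public int EndIndex { get; }
    public int Length => EndIndex - StartIndex + 1;

    public IndexRange(int start, int end) { StartIndex = start; EndIndex = end; }

    // ReferenceEquals avoids calling operator== from inside itself.
    public bool Equals(IndexRange other) =>
        !ReferenceEquals(other, null)
        && StartIndex == other.StartIndex && EndIndex == other.EndIndex;

    public override bool Equals(object obj) => Equals(obj as IndexRange);

    // Must agree with Equals: equal ranges produce equal hash codes.
    public override int GetHashCode() => unchecked((StartIndex * 397) ^ EndIndex);

    public static bool operator ==(IndexRange a, IndexRange b) =>
        ReferenceEquals(a, null) ? ReferenceEquals(b, null) : a.Equals(b);

    // Defined in terms of == so the two can never disagree.
    public static bool operator !=(IndexRange a, IndexRange b) => !(a == b);
}

static class EqualityDemo
{
    static void Main()
    {
        Console.WriteLine(new IndexRange(1, 2) == new IndexRange(1, 2)); // True
        Console.WriteLine(new IndexRange(1, 2) != new IndexRange(1, 3)); // True
        Console.WriteLine((IndexRange)null == null);                      // True
    }
}
```

Writing `!=` as the negation of `==` means the two operators cannot drift apart when one of them is edited later.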

Extra Explanation: I want to know which characters were removed and which characters were added to the text in order to get the new text so I can recreate this. Let's say I have ABCD -> AXD. 'B' and 'C' would be the characters that were removed and 'X' would be the character that was added. So the output from the GetRemovedCharacters function would be (1, 2) and the output from the GetAddedCharacters function would be (1, 1). The output from the GetRemovedCharacters function refers to indexes in the old text and the output from the GetAddedCharacters function refers to indexes in the old text after removing the removed characters.
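
To make the index convention concrete, here is a minimal sketch that replays the ABCD -> AXD example with one Remove and one Insert. The `Apply` helper and its parameter names are hypothetical, introduced only for illustration; the ranges are inclusive, as in the Sequence class, and an end index smaller than the start index is taken to mean "nothing to do".

```csharp
using System;

static class ReplayExample
{
    // Hypothetical helper: rebuilds the new text from the old text using
    // one removal (inclusive range in the old text) followed by one insertion
    // (inclusive range in the text as it is after the removal).
    public static string Apply(string oldText, string newText,
                               int removeStart, int removeEnd,
                               int addStart, int addEnd)
    {
        string result = oldText;
        if (removeEnd >= removeStart)
            result = result.Remove(removeStart, removeEnd - removeStart + 1);
        if (addEnd >= addStart)
            result = result.Insert(addStart, newText.Substring(addStart, addEnd - addStart + 1));
        return result;
    }

    static void Main()
    {
        // ABCD -> AXD: remove (1, 2), which drops "BC", then add (1, 1), which inserts "X".
        Console.WriteLine(Apply("ABCD", "AXD", 1, 2, 1, 1)); // prints "AXD"
    }
}
```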

EDIT: I've thought of a few directions:

  1. This code I wrote* returns the sequence that was affected: if characters were removed, it returns the sequence of the removed characters in the old text; if characters were added, it returns the sequence of the added characters in the new text. It does not return the right value (and I'm not sure myself what that value should be) when text is both removed and added.
  2. Maybe the SelectionStart property in the text box could help - the position of the caret after the text was changed.

*

private static Sequence GetChangeSequence(string oldText, string newText)
{
    if(newText.Length > oldText.Length)
    {
        for(int i = 0; i < newText.Length; i++)
            if(i == oldText.Length || newText[i] != oldText[i])
                return new Sequence(i, i + (newText.Length - oldText.Length) - 1);
        return null;
    }
    else if(newText.Length < oldText.Length)
    {
        for(int i = 0; i < oldText.Length; i++)
            if(i == newText.Length || oldText[i] != newText[i])
                return new Sequence(i, i + (oldText.Length - newText.Length) - 1);
        return null;
    }
    else
        return null;
}

Thanks.

  • What is the purpose of the `IsNull` function? Use `ReferenceEquals(a, null)` [or `a is null` in C# 7] if you're trying to avoid the stack overflow that `== null` causes within `operator==`. Don't rely on throwing an NRE like that. Also, it's usually best to define `!=` in terms of `==` (and `==` in terms of `Equals`, though if you *actually* override Equals, make sure your GetHashCode reflects that) – pinkfloydx33 Jul 03 '18 at 00:40
  • I'm afraid you are seriously underestimating the problem. Look up 'diff' for more. Maybe you could use the KeyPress etc. events and monitor the clipboard to cut down on the whole issue. TextChanged alone will leave you with the whole undiluted diff challenge. See [here](https://stackoverflow.com/questions/24887238/how-to-compare-two-rich-text-box-contents-and-highlight-the-characters-that-are/24970638?s=1|26.9374#24970638) for an example of the problems. – TaW Jul 03 '18 at 08:30
  • Is your task to learn something? Reimplement something? Or work out the difference(s)? I ask because I have a nuget package just for this task, [difflib](https://www.nuget.org/packages/difflib/2017.7.26.1241), source code on [github](https://github.com/lassevk/DiffLib). To get the differences you could just do `Diff.CalculateSections(s1.ToCharArray(), s2.ToCharArray())` and inspect the results, and work out exactly what happened. Now, answering your question in terms of those 3 *specific* types of changes can be done, but helping you implement full diff is too broad for Stack Overflow. – Lasse V. Karlsen Jul 03 '18 at 19:25
  • Are those 3 combinations of changes the *only* things that can happen? You don't have `ABCDE --> AXXBDE` where `XX` was added and `D` was removed, but not at the same place in the strings? – Lasse V. Karlsen Jul 03 '18 at 19:35
  • Simply trim from both strings everything that is common at both ends; what you are left with is the bits that changed. – Lasse V. Karlsen Jul 03 '18 at 19:44
  • thanks, but I managed to get to a solution and posted it here –  Jul 04 '18 at 10:40
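
The trimming idea from the comments can be sketched as follows. This is a minimal illustration, assuming the change is a single contiguous edit (one of the three cases in the question); `TrimDiff.Compute` is a hypothetical name, and it returns lengths rather than the question's inclusive Sequence ranges.

```csharp
using System;

static class TrimDiff
{
    // Returns (start, removedLength, addedLength) such that removing
    // removedLength characters at start from oldText and then inserting
    // newText.Substring(start, addedLength) at start yields newText.
    public static (int Start, int RemovedLength, int AddedLength) Compute(string oldText, string newText)
    {
        int minLength = Math.Min(oldText.Length, newText.Length);

        // Length of the common prefix.
        int prefix = 0;
        while (prefix < minLength && oldText[prefix] == newText[prefix])
            prefix++;

        // Length of the common suffix, never overlapping the prefix.
        int suffix = 0;
        while (suffix < minLength - prefix
               && oldText[oldText.Length - 1 - suffix] == newText[newText.Length - 1 - suffix])
            suffix++;

        return (prefix, oldText.Length - prefix - suffix, newText.Length - prefix - suffix);
    }

    static void Main()
    {
        foreach (var (oldText, newText) in new[] { ("ABCD", "AD"), ("ABCD", "ABXXCD"), ("ABCD", "AXD") })
        {
            var (start, removed, added) = Compute(oldText, newText);
            string rebuilt = oldText.Remove(start, removed).Insert(start, newText.Substring(start, added));
            Console.WriteLine($"{oldText} -> {newText}: start={start}, removed={removed}, added={added}, ok={rebuilt == newText}");
        }
    }
}
```

When characters repeat (for example AAB -> AB), this still produces an edit that rebuilds the new text, but it cannot tell which of the identical characters the user actually touched; that would need extra information such as the caret position.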

3 Answers

0

A simple string comparison won't do the job, since you are asking for an algorithm that supports added and removed characters at the same time, which is not easy to achieve in a few lines of code. I'd suggest using a library instead of writing your own comparison algorithm.

Have a look at this project for example.

Blightbuster
  • that project is too much text file based, and I don't get where the actual algorithm I need to use is in. glad if you'd point me to where that is or maybe it's just not suitable for me –  Jul 02 '18 at 23:26
0

I quickly threw this together to give you an idea of what I did to solve your question. It doesn't use your classes but it does find an index so it's customizable for you. There are also obvious limitations to this as it is just bare bones.

This method picks out the changes made to the original string: it returns the characters of the original string that have no match in the changed string.

// Find the changes made to a string
string StringDiff (string originalString, string changedString)
{
    string diffString = "";

    // Iterate over the original string
    for (int i = 0; i < originalString.Length; i++)
    {
        // Get the character to search with
        char diffChar = originalString[i];

        // If found char in the changed string
        if (FindInString(diffChar, changedString, out int index))
        {
            // Remove from the changed string at the index as we don't want to match to this char again
            changedString = changedString.Remove(index, 1);
        }
        // If not found then this is a difference
        else
        {
            // Add to diff string
            diffString += diffChar;
        }
    }

    return diffString;
}

This method will return true at the first matching occurrence (an obvious limitation but this is more to give you an idea)

// Find char at first occurence in string
bool FindInString (char c, string search, out int index)
{
    index = -1;

    // Iterate over search string
    for (int i = 0; i < search.Length; i++)
    {
        // If found then return true with index
        if (c == search[i])
        {
            index = i;
            return true;
        }
    }

    return false;
}

This is a simple helper method to show you an example

void SplitStrings(string oldStr, string newStr)
{
    Console.WriteLine($"Old : {oldStr}, New: {newStr}");
    Console.WriteLine("Removed - " + StringDiff(oldStr, newStr));
    Console.WriteLine("Added - " + StringDiff(newStr, oldStr));
}
C. Carter
  • Thanks but what I meant by remove and add isn't the difference of characters between the two strings. –  Jul 03 '18 at 09:36
0

I've done it.

static void Main(string[] args)
{

    while(true)
    {
        Console.WriteLine("Enter the Old Text");
        string oldText = Console.ReadLine();
        Console.WriteLine("Enter the New Text");
        string newText = Console.ReadLine();
        Console.WriteLine("Enter the Caret Position");
        int caretPos = int.Parse(Console.ReadLine());
        Sequence removed = GetRemovedCharacters(oldText, newText, caretPos);
        if(removed != null)
            oldText = oldText.Remove(removed.StartIndex, removed.Length);
        Sequence added = GetAddedCharacters(oldText, newText, caretPos);
        if(added != null)
            oldText = oldText.Insert(added.StartIndex, newText.Substring(added.StartIndex, added.Length));
        Console.WriteLine("Worked: " + (oldText == newText).ToString());
        Console.ReadKey();
        Console.Clear();
    }

}

static Sequence GetRemovedCharacters(string oldText, string newText, int caretPosition)
{
    int startIndex = GetStartIndex(oldText, newText);
    if(startIndex != -1)
    {
        Sequence sequence = new Sequence(startIndex, caretPosition + (oldText.Length - newText.Length) - 1);
        if(SequenceValid(sequence))
            return sequence;
    }
    return null;
}
static Sequence GetAddedCharacters(string oldText, string newText, int caretPosition)
{
    int startIndex = GetStartIndex(oldText, newText);
    if(startIndex != -1)
    {
        Sequence sequence = new Sequence(startIndex, caretPosition - 1);
        if(SequenceValid(sequence))
            return sequence;
    }
    return null;
}
static int GetStartIndex(string oldText, string newText)
{
    for(int i = 0; i < Math.Max(oldText.Length, newText.Length); i++)
        if(i >= oldText.Length || i >= newText.Length || oldText[i] != newText[i])
            return i;
    return -1;
}
static bool SequenceValid(Sequence sequence)
{
    return sequence.StartIndex >= 0 && sequence.EndIndex >= 0 && sequence.EndIndex >= sequence.StartIndex;
}
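
One case worth checking with the code above is repeated characters: for AAB -> AB with the caret at 0, the prefix scan in GetStartIndex returns 1, which is past the caret, so both sequences come out invalid and nothing is applied. A possible variant, sketched here under the assumption that the caret sits immediately after any inserted text (so everything from the caret onward in the new text is an unchanged tail of the old text), stops the prefix scan at the caret. `CaretDiff.Compute` is a hypothetical name and returns lengths instead of Sequence objects.

```csharp
using System;

static class CaretDiff
{
    // Assumes a single edit that left the caret immediately after any inserted
    // text, so newText from caretPosition onward is an unchanged tail of oldText.
    // Returns (start, removedLength, addedLength).
    public static (int Start, int RemovedLength, int AddedLength) Compute(
        string oldText, string newText, int caretPosition)
    {
        // First index in oldText that belongs to the unchanged tail.
        int tailStart = caretPosition + (oldText.Length - newText.Length);

        // Common prefix, never scanning past the caret or into the tail.
        int limit = Math.Min(caretPosition, tailStart);
        int start = 0;
        while (start < limit && oldText[start] == newText[start])
            start++;

        return (start, tailStart - start, caretPosition - start);
    }

    static void Main()
    {
        Console.WriteLine(Compute("ABCD", "AXD", 2)); // (1, 2, 1)
        Console.WriteLine(Compute("AAB", "AB", 0));   // (0, 1, 0)
        Console.WriteLine(Compute("AB", "AAB", 1));   // (0, 0, 1)
    }
}
```

Applying the result works the same way as in Main above: remove `RemovedLength` characters at `Start`, then insert `newText.Substring(Start, AddedLength)` at `Start`.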