18

Hi, I have two strings:

    String hear = "Hi My name is Deepak"
            + "\n"
            + "How are you ?"
            + "\n"
            + "\n"
            + "How is everyone";
    String dear = "Hi My name is Deepak"
            + "\n"
            + "How are you ?"
            + "\n"
            + "Hey there \n"
            + "How is everyone";

I want to get what is not present in the hear string, that is, "Hey there \n". I found a method, but it fails for this case:

static String strDiffChop(String s1, String s2) {
    if (s1.length() > s2.length()) {
        return s1.substring(s2.length() - 1);
    } else if (s2.length() > s1.length()) {
        return s2.substring(s1.length() - 1);
    } else {
        return "";
    }
}

Can anyone help?

N Deepak Prasath

9 Answers

28

google-diff-match-patch

The Diff Match and Patch libraries offer robust algorithms to perform the operations required for synchronizing plain text.

Diff:

Compare two blocks of plain text and efficiently return a list of differences.

Match:

Given a search string, find its best fuzzy match in a block of plain text. Weighted for both accuracy and location.

Patch:

Apply a list of patches onto plain text. Use best-effort to apply patch even when the underlying text doesn't match.

Currently available in Java, JavaScript, Dart, C++, C#, Objective C, Lua and Python. Regardless of language, each library features the same API and the same functionality. All versions also have comprehensive test harnesses.

There is a Line or word diffs wiki page which describes how to do line-by-line diffs.
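To illustrate what a line-by-line diff produces for the strings in the question, here is a minimal sketch that uses only the JDK. It is not the diff-match-patch implementation or its API: the class `LineDiffSketch` and the method `diffLines` are hypothetical names, and the algorithm is a plain longest-common-subsequence table over lines.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, NOT part of diff-match-patch: a plain LCS-based line diff.
final class LineDiffSketch {

    /** Returns diff-style lines: "  " = unchanged, "- " = only in oldText, "+ " = only in newText. */
    static List<String> diffLines(String oldText, String newText) {
        String[] a = oldText.split("\n", -1);
        String[] b = newText.split("\n", -1);

        // lcs[i][j] = length of the longest common subsequence of a[i..] and b[j..]
        int[][] lcs = new int[a.length + 1][b.length + 1];
        for (int i = a.length - 1; i >= 0; i--) {
            for (int j = b.length - 1; j >= 0; j--) {
                lcs[i][j] = a[i].equals(b[j])
                        ? lcs[i + 1][j + 1] + 1
                        : Math.max(lcs[i + 1][j], lcs[i][j + 1]);
            }
        }

        // Walk the table from the start and emit one output line per input line.
        List<String> out = new ArrayList<>();
        int i = 0, j = 0;
        while (i < a.length && j < b.length) {
            if (a[i].equals(b[j])) {
                out.add("  " + a[i]);
                i++;
                j++;
            } else if (lcs[i + 1][j] >= lcs[i][j + 1]) {
                out.add("- " + a[i++]);
            } else {
                out.add("+ " + b[j++]);
            }
        }
        while (i < a.length) out.add("- " + a[i++]);
        while (j < b.length) out.add("+ " + b[j++]);
        return out;
    }

    public static void main(String[] args) {
        String hear = "Hi My name is Deepak\nHow are you ?\n\nHow is everyone";
        String dear = "Hi My name is Deepak\nHow are you ?\nHey there \nHow is everyone";
        // The empty third line of hear is reported with "- ",
        // and "Hey there " is reported with "+ ".
        diffLines(hear, dear).forEach(System.out::println);
    }
}
```

For real use the library is the better choice; this sketch needs memory proportional to the product of the two line counts.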

Mike Samuel
  • 2
    What an awesome lib. Thanks. – sebnukem Jul 01 '15 at 22:58
  • See https://stackoverflow.com/q/60386661/8887313 if you want to do it line-by-line (i.e., `diff` style). Because the Java version currently does not support it. – ATOMP Feb 25 '20 at 02:26
9

You can use StringUtils from Apache Commons Lang. Here is the relevant code from the StringUtils API:

public static String difference(String str1, String str2) {
    if (str1 == null) {
        return str2;
    }
    if (str2 == null) {
        return str1;
    }
    int at = indexOfDifference(str1, str2);
    if (at == -1) {
        return EMPTY;
    }
    return str2.substring(at);
}
public static int indexOfDifference(String str1, String str2) {
    if (str1 == str2) {
        return -1;
    }
    if (str1 == null || str2 == null) {
        return 0;
    }
    int i;
    for (i = 0; i < str1.length() && i < str2.length(); ++i) {
        if (str1.charAt(i) != str2.charAt(i)) {
            break;
        }
    }
    if (i < str2.length() || i < str1.length()) {
        return i;
    }
    return -1;
}
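Note what this returns for the strings in the question: everything in the second string from the first differing character onward, not only the inserted line. The stand-alone sketch below mirrors the logic quoted above without depending on Commons Lang. The class name `PrefixDifferenceDemo` is hypothetical, and null handling is left out on the assumption that both arguments are non-null.

```java
// Stand-alone sketch mirroring the quoted logic; assumes non-null arguments.
final class PrefixDifferenceDemo {

    /** Index of the first position where the strings differ, or -1 if they are equal. */
    static int indexOfDifference(String s1, String s2) {
        int i = 0;
        while (i < s1.length() && i < s2.length() && s1.charAt(i) == s2.charAt(i)) {
            i++;
        }
        return (i < s1.length() || i < s2.length()) ? i : -1;
    }

    /** The remainder of s2 starting at the first difference from s1. */
    static String difference(String s1, String s2) {
        int at = indexOfDifference(s1, s2);
        return at == -1 ? "" : s2.substring(at);
    }

    public static void main(String[] args) {
        String hear = "Hi My name is Deepak\nHow are you ?\n\nHow is everyone";
        String dear = "Hi My name is Deepak\nHow are you ?\nHey there \nHow is everyone";
        // Prints "Hey there " followed by a line break and "How is everyone":
        // the common suffix is not stripped.
        System.out.println(difference(hear, dear));
    }
}
```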
divinedragon
Fly
5

I used StringTokenizer to find the solution. Below is the code snippet:

import java.util.ArrayList;
import java.util.List;
import java.util.StringTokenizer;

// Returns the space-separated tokens of sourceStr that do not occur in anotherStr.
public static List<String> findNotMatching(String sourceStr, String anotherStr) {
    List<String> missingWords = new ArrayList<>();
    StringTokenizer at = new StringTokenizer(sourceStr, " ");
    while (at.hasMoreTokens()) {
        String token = at.nextToken();
        boolean found = false;
        // Re-tokenize anotherStr for every source token and scan it for a match.
        StringTokenizer bt = new StringTokenizer(anotherStr, " ");
        while (bt.hasMoreTokens()) {
            if (token.equals(bt.nextToken())) {
                found = true;
                break;
            }
        }
        if (!found) {
            missingWords.add(token);
        }
    }
    return missingWords;
}
VJ THAKUR
2

Convert the strings to lists, then use the method described in "How to remove common values from two array list" to get the result.
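A minimal sketch of that idea, assuming the strings are split into lines and that `List.removeAll` is the kind of method the linked question has in mind (the linked answer itself is not reproduced here). The class `ListDifferenceSketch` and the method `linesOnlyIn` are hypothetical names.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

// Hypothetical helper: treats each string as a list of lines.
final class ListDifferenceSketch {

    /** Lines of 'other' that do not occur anywhere in 'base'. */
    static List<String> linesOnlyIn(String other, String base) {
        // Copy into an ArrayList because Arrays.asList returns a fixed-size list.
        List<String> result = new ArrayList<>(Arrays.asList(other.split("\n")));
        result.removeAll(Arrays.asList(base.split("\n")));
        return result;
    }

    public static void main(String[] args) {
        String hear = "Hi My name is Deepak\nHow are you ?\n\nHow is everyone";
        String dear = "Hi My name is Deepak\nHow are you ?\nHey there \nHow is everyone";
        System.out.println(linesOnlyIn(dear, hear)); // prints [Hey there ]
    }
}
```

This ignores line order and repetition: a line counts as common if it appears anywhere in the other string.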

Community
Aditya Rai
2

If you prefer not to use an external library, you can use the following Java snippet to efficiently compute the difference:

import java.util.HashMap;
import java.util.Map;

/**
 * Returns an array of size 2. The entries contain a minimal set of characters
 * that have to be removed from the corresponding input strings in order to
 * make the strings equal.
 */
public String[] difference(String a, String b) {
    return diffHelper(a, b, new HashMap<>());
}

private String[] diffHelper(String a, String b, Map<Long, String[]> lookup) {
    // a and b are always suffixes of the original inputs, so their lengths
    // identify the subproblem.
    long key = ((long) a.length()) << 32 | b.length();
    // Explicit get/put instead of computeIfAbsent: the recursion below adds
    // entries to the map, which HashMap.computeIfAbsent does not allow inside
    // its mapping function.
    String[] cached = lookup.get(key);
    if (cached != null) {
        return cached;
    }
    String[] result;
    if (a.isEmpty() || b.isEmpty()) {
        result = new String[]{a, b};
    } else if (a.charAt(0) == b.charAt(0)) {
        result = diffHelper(a.substring(1), b.substring(1), lookup);
    } else {
        String[] aa = diffHelper(a.substring(1), b, lookup);
        String[] bb = diffHelper(a, b.substring(1), lookup);
        if (aa[0].length() + aa[1].length() < bb[0].length() + bb[1].length()) {
            result = new String[]{a.charAt(0) + aa[0], aa[1]};
        } else {
            result = new String[]{bb[0], b.charAt(0) + bb[1]};
        }
    }
    lookup.put(key, result);
    return result;
}

This approach uses dynamic programming. It tries all combinations by brute force but remembers the results for suffix pairs it has already computed, so it evaluates only O(n^2) subproblems.

Examples:

String hear = "Hi My name is Deepak"
        + "\n"
        + "How are you ?"
        + "\n"
        + "\n"
        + "How is everyone";
String dear = "Hi My name is Deepak"
        + "\n"
        + "How are you ?"
        + "\n"
        + "Hey there \n"
        + "How is everyone";
difference(hear, dear); // returns {"","Hey there "}

difference("Honda", "Hyundai"); // returns {"o","yui"}

difference("Toyota", "Coyote"); // returns {"Ta","Ce"}
jjoller
  • 1
    So the keys are the lengths of the suffixes, which are unique and faster to compute than the hashes of the suffixes themselves ... nice. This is exactly the algorithm I was looking for (for a different language). – Jim Balter Jun 03 '19 at 04:22
0

You should use StringUtils from Apache Commons:

String diff = StringUtils.difference( "Word", "World" );
System.out.println( "Difference: " + diff );


Difference: ld

Source: https://www.oreilly.com/library/view/jakarta-commons-cookbook/059600706X/ch02s15.html

gurbieta
  • Yes, I have heard about that a lot. – N Deepak Prasath Aug 21 '13 at 07:01
  • I downvote this answer since there is no indication on which StringUtils method does the job, no link to the documentation or code example. To anyone coming here later, just a quick example from the doc: `StringUtils.difference("ab", "abxyz") = "xyz"` – Carrm Dec 31 '18 at 16:19
0

I was looking for a solution but couldn't find one that did what I needed, so I created a utility class that compares two versions of a text, new and old, and returns the result text with the changes wrapped in [added] and [deleted] tags. These tags can easily be replaced with a highlighter of your choice, for example an HTML tag. string-version-comparison

Any comments will be appreciated.

*It might not work well with long texts, because the probability of matching a deleted phrase against the same phrase elsewhere in the text is higher.

0

My solution is for simple strings. You can extend it by tokenizing a paragraph into lines.

It uses the minimum edit distance (naive recursive approach). You can switch to dynamic programming if you like.

// A naive recursive Java program to find the minimum number of
// operations to convert str1 to str2
class JoveoTest {
    static int min(int x, int y, int z)
    {
        if (x <= y && x <= z)
            return x;
        if (y <= x && y <= z)
            return y;
        else
            return z;
    }

    static int editDist(String str1, String str2, int m,
                        int n,StringBuilder str)
    {
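        if (m == 0) {
            // str1 is exhausted: insert the remaining n characters of str2.
            // Write into str itself; reassigning the parameter would not
            // be visible to the caller.
            str.setLength(0);
            for (int i = n - 1; i >= 0; i--) {
                str.append(str2.charAt(i)).append('+');
            }
            return n;
        }
        if (n == 0) {
            // str2 is exhausted: remove the remaining m characters of str1.
            str.setLength(0);
            for (int i = m - 1; i >= 0; i--) {
                str.append(str1.charAt(i)).append('-');
            }
            return m;
        }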
        if (str1.charAt(m - 1) == str2.charAt(n - 1))
            return editDist(str1, str2, m - 1, n - 1,str);
        
        StringBuilder myStr1=new StringBuilder();
        StringBuilder myStr2=new StringBuilder();
        StringBuilder myStr3=new StringBuilder();
        int insert= editDist(str1, str2, m, n - 1,myStr1);

        int remove=editDist(str1, str2, m - 1, n,myStr2);

        int replace=editDist(str1, str2, m - 1, n-1,myStr3);

        if(insert<remove&&insert<replace){
            myStr1.insert(0,str2.charAt(n-1)+"+");
            str.setLength(0);
            str.append(myStr1);
        }
        else if(remove<insert&&remove<replace){
            myStr2.insert(0,str1.charAt(m-1)+"-");
            str.setLength(0);
            str.append(myStr2);
        }
        else{
            myStr3.insert(0,str2.charAt(n-1)+"+"+str1.charAt(m-1)+"-");
            str.setLength(0);
            str.append(myStr3);
        }

        return 1+min(insert,remove,replace);

    }

    // Driver Code
    public static void main(String args[])
    {
        String str1 = "sunday";
        String str2 = "saturday";
        StringBuilder ans=new StringBuilder();
        System.out.println(editDist(
                str1, str2, str1.length(), str2.length(),ans ));
        System.out.println(ans.reverse().toString());
    }
}

3

+a+t-n+r
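For longer inputs the naive recursion recomputes the same subproblems many times. A minimal sketch of the dynamic-programming variant mentioned above, computing only the distance and not the edit script; the class `EditDistanceDp` and the method `editDistance` are hypothetical names.

```java
// Bottom-up edit distance (insert, remove, replace each cost 1).
final class EditDistanceDp {

    static int editDistance(String s1, String s2) {
        int m = s1.length();
        int n = s2.length();
        // dist[i][j] = edits needed to turn the first i chars of s1
        // into the first j chars of s2
        int[][] dist = new int[m + 1][n + 1];
        for (int i = 0; i <= m; i++) dist[i][0] = i; // remove all i chars
        for (int j = 0; j <= n; j++) dist[0][j] = j; // insert all j chars

        for (int i = 1; i <= m; i++) {
            for (int j = 1; j <= n; j++) {
                if (s1.charAt(i - 1) == s2.charAt(j - 1)) {
                    dist[i][j] = dist[i - 1][j - 1];
                } else {
                    int replace = dist[i - 1][j - 1];
                    int remove = dist[i - 1][j];
                    int insert = dist[i][j - 1];
                    dist[i][j] = 1 + Math.min(replace, Math.min(remove, insert));
                }
            }
        }
        return dist[m][n];
    }

    public static void main(String[] args) {
        System.out.println(editDistance("sunday", "saturday")); // prints 3
    }
}
```

The table has (m + 1) x (n + 1) entries, so time and memory grow with the product of the two lengths.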

Freez
-1

What about this snippet?

public static void strDiff(String hear, String dear){
    String[] hr = dear.split("\n");
    for (String h : hr) {
        if (!hear.contains(h)) {
            System.err.println(h);
        }
    }
}
N Deepak Prasath
  • @MikeSamuel has posted the correct solution. That does a true diff. `String.contains()` would fail if the text content being matched has its lines rearranged. – Ravi K Thapliyal Aug 21 '13 at 07:16