5

I would like to learn a way to obtain the portions where two strings differ.

Suppose I have these two strings:

String s1 = "x4.printString(\"Bianca.()\").y1();";
String s2 = "sb.printString(\"Bianca.()\").length();";

I would like this output: ["x4", "y1", "sb", "length"] coming from a method receiving s1and s2as arguments.

I have looked for something like this in other posts, but I have only found links to StringUtils.difference(String first, String second).

But this method returns the second string from the index where it begins to differ from the first one.
I really don't know where to start and any advice would be very appreciated.

UPDATE Following @aUserHimself advises, I managed to obtain all common subsequences among the two strings, but these subsequences come out like a unique String.

This is my code now:

private static int[][] lcs(String s, String t) {
    int m, n;
    m = s.length();
    n = t.length();
    int[][] table = new int[m+1][n+1];
    for (int i=0; i < m+1; i++)
        for (int j=0; j<n+1; j++)
            table[i][j] = 0;
    for (int i = 1; i < m+1; i++)
        for (int j = 1; j < n+1; j++)
            if (s.charAt(i-1) == t.charAt(j-1))
                table[i][j] = table[i-1][j-1] + 1;
            else
                table[i][j] = Math.max(table[i][j-1], table[i-1][j]);
    return table;
}

private static List<String> backTrackAll(int[][]table, String s, String t, int m, int n){
    List<String> result = new ArrayList<>();
    if (m == 0 || n == 0) {
        result.add("");
        return result;
    }
    else
        if (s.charAt(m-1) == t.charAt(n-1)) {
            for (String sub : backTrackAll(table, s, t, m - 1, n - 1))
                result.add(sub + s.charAt(m - 1));
            return result;
        }
        else {
            if (table[m][n - 1] >= table[m - 1][n])
                result.addAll(backTrackAll(table, s, t, m, n - 1));
            else
                result.addAll(backTrackAll(table, s, t, m - 1, n));
            return result;
        }
}

private List<String> getAllSubsequences(String s, String t){
    return backTrackAll(lcs(s, t), s, t, s.length(), t.length());
}



Calling getAllSubsequences on these two strings:

String s1 = "while (x1 < 5)"
String s2 = "while (j < 5)"

I receive this string: ["while ( < 5)"] not ["while (", " < 5)"] as I would like to obtain. I am not understanding where I am doing wrong.

delca85
  • 1,176
  • 2
  • 10
  • 17
  • 1
    @downvoter: why downvote? Because of a not optimal formatting...? – delca85 Mar 29 '17 at 11:45
  • 2
    Don't understand the close votes either, the question is quite clear. – Tobb Mar 29 '17 at 11:45
  • Possible duplicate of [Compare strings in java and remove the part of string where they are identical](http://stackoverflow.com/questions/12620178/compare-strings-in-java-and-remove-the-part-of-string-where-they-are-identical) – aUserHimself Mar 29 '17 at 11:50
  • 1
    I dont understand the question. What is x4, what is sb? What is y1? I have no clue what the **string** values are that you intend to compare. So @Tobb please enlighten me. – GhostCat Mar 29 '17 at 11:51
  • 1
    @GhostCat They are part of the two strings that are being compared. – Tobb Mar 29 '17 at 11:52
  • 2
    @GhostCat `x4` is the first part of the String that is different, then `.printString("Bianca.()").` is the same so it will be skipped, then `y1` is different again, and `();` is the same again and skipped - for first `String`. Then `sb` is different, and `length` for the second `String`. – aUserHimself Mar 29 '17 at 11:53
  • Upvote. Interesting question then. – GhostCat Mar 29 '17 at 11:55
  • what is the result of `hello` and `wordoo` you want to equal ? – Youcef LAIDANI Mar 29 '17 at 11:55
  • Find the longest common subsequence between two strings. Once you have that you can find the different letters easily. – rohitanand Mar 29 '17 at 11:57
  • Perhaps [Apache Commons Text](https://commons.apache.org/proper/commons-text/) library helps you out. I've used it to tell the difference between two strings, so maybe there's something you can use. – joninx Mar 29 '17 at 12:01
  • Your expected output of "length" makes things difficult. The simple approach would be comparing character to character, but then the output for the end of the line would be `y1();` and `length();`. So one needs to take into account that the differing portions might be of different length.. – Tobb Mar 29 '17 at 12:03
  • Check this [link](http://stackoverflow.com/a/18345060/828551) out. – joninx Mar 29 '17 at 12:06
  • Possible duplicate of [Extract the difference between two strings in Java](http://stackoverflow.com/questions/18344721/extract-the-difference-between-two-strings-in-java) – Tom Mar 29 '17 at 12:07
  • @delca85 The answer for your first version of the implementation is actually correct, as the standard `LCS` will search even for discontinuous matches. You have to modify a little bit the algorithm to search only for contiguous substrings. – aUserHimself Mar 29 '17 at 14:47
  • @delca85 Tip: increase the current sequence size only if the current path is diagonal (which would mean elements were equal one by one contiguously), else reset value to 0. – aUserHimself Mar 29 '17 at 14:57
  • @aUserHimself: You are really helping me a lot. If I increase the current sequence only if `i == j` and if `s[i-1] == s[j-1]`, else, always in the case of `s[i-1] == s[j-1]` I obtain only the longest common subsequences, not all the common ones. – delca85 Mar 29 '17 at 15:13
  • @delca85 If you still have problems with this, check my updated answer: I have added my full implementation. – aUserHimself Mar 30 '17 at 07:29
  • @aUserHimself I have solved with the link with python code you posted me, but thank you very much anyway. – delca85 Mar 30 '17 at 07:48

4 Answers4

1

Find the longest common subsequence between two strings. After that you can use indexOf to get index of this common string in between both strings and fetch uncommon values from both.

example :

CICROSOFK
WOCROSFGT

Common letter is

CROS

Find different string from 0 to index of SOFT and from index+'SOFT'.length to str.length

rohitanand
  • 710
  • 1
  • 4
  • 26
  • Maybe you are right: I could find the longest common subsequence, then remove it from both the strings. Iterating this process, I could obtain only the portion where the strings differ. – delca85 Mar 29 '17 at 12:04
1

I already flagged a duplicate question above whose answer uses Longest Common Subsequence for 2 Strings.

So you can apply it recursively and on each new recursion, use a placeholder where this LCS was found so that you can mark the parts that differ. In the end, when no more common sequences exist, you will have to split each String by the placeholder and get the required parts.

UPDATE 1: If I think better now, this recursion part might not lead to an optimal solution (from the total execution time point of view), since you will iterate over the Strings multiple times. But there might be a way to retrieve all sequences from one iteration by reusing (a reduced version of) the memoization table, check this implementation and this more detailed one.

UPDATE 2: I have managed to implement the recursive version (not optimal), based on this code:

public class LongestCommonSequence {

    private final char[] firstStr;
    private final char[] secondStr;
    private int[][] LCS;
    private String[][] solution;
    private int max = -1, maxI = -1, maxJ = -1;
    private static final Character SEPARATOR = '|';

    public LongestCommonSequence(char[] firstStr, char[] secondStr) {
        this.firstStr = firstStr;
        this.secondStr = secondStr;
        LCS = new int[firstStr.length + 1][secondStr.length + 1];
        solution = new String[firstStr.length + 1][secondStr.length + 1];
    }

    public String find() {

        for (int i = 0; i <= secondStr.length; i++) {
            LCS[0][i] = 0;
            if(i > 0) {
                solution[0][i] = "   " + secondStr[i - 1];
            }
        }

        for (int i = 0; i <= firstStr.length; i++) {
            LCS[i][0] = 0;
            if(i > 0) {
                solution[i][0] = "   " + firstStr[i - 1];
            }
        }

        solution[0][0] = "NONE";

        for (int i = 1; i <= firstStr.length; i++) {
            for (int j = 1; j <= secondStr.length; j++) {
                if (firstStr[i - 1] == secondStr[j - 1] && firstStr[i - 1] != SEPARATOR) {
                    LCS[i][j] = LCS[i - 1][j - 1] + 1;
                    solution[i][j] = "diag";
                } else {
                    LCS[i][j] = 0;
                    solution[i][j] = "none";
                }
                if(LCS[i][j] > max) {
                    max = LCS[i][j];
                    maxI = i;
                    maxJ = j;
                }
            }
        }

        System.out.println("Path values:");
        for (int i = 0; i <= firstStr.length; i++) {
            for (int j = 0; j <= secondStr.length; j++) {
                System.out.print(" " + LCS[i][j]);
            }
            System.out.println();
        }

        System.out.println();
        System.out.println("Path recovery:");
        for (int i = 0; i <= firstStr.length; i++) {
            for (int j = 0; j <= secondStr.length; j++) {
                System.out.print(" " + solution[i][j]);
            }
            System.out.println();
        }
        System.out.println();
        System.out.println("max:" + max + " maxI:" + maxI + " maxJ:" + maxJ);

        return printSolution(maxI, maxJ);
    }

    public String printSolution(int i, int j) {
        String answer = "";
        while(i - 1 >= 0 && j - 1 >= 0 && LCS[i][j] != 0) {
            answer = firstStr[i - 1] + answer;
            i--;
            j--;
        }
        System.out.println("Current max solution: " + answer);
        return answer;
    }

    public static void main(String[] args) {
        String firstStr = "x4.printString(\\\"Bianca.()\\\").y1();";
        String secondStr = "sb.printString(\\\"Bianca.()\\\").length();";
        String maxSubstr;
        LongestCommonSequence lcs;
        do {
            lcs = new LongestCommonSequence(firstStr.toCharArray(), secondStr.toCharArray());
            maxSubstr = lcs.find();
            if(maxSubstr.length() != 0) {
                firstStr = firstStr.replace(maxSubstr, "" + LongestCommonSequence.SEPARATOR);
                secondStr = secondStr.replace(maxSubstr, "" + LongestCommonSequence.SEPARATOR);
            }
        }
        while(maxSubstr.length() != 0);

        System.out.println();
        System.out.println("first:" + firstStr + " second: " + secondStr);

        System.out.println("First array: ");
        String[] firstArray = firstStr.split("\\" + SEPARATOR);
        String[] secondArray = secondStr.split("\\" + SEPARATOR);
        for(String s: firstArray) {
            System.out.println(s);
        }
        System.out.println();
        System.out.println("Second array: ");
        for(String s: secondArray) {
            System.out.println(s);
        }
    }
}
Community
  • 1
  • 1
aUserHimself
  • 1,589
  • 2
  • 17
  • 26
0

My code might not be the most compact but I've written it like that for clarity :

public static void main(String[] args) throws InterruptedException, FileNotFoundException, ExecutionException {

    String s1 = "x4.printString(\"Bianca.()\").y1();";
    String s2 = "sb.printString(\"Bianca.()\").length();";

    List<String> result = new ArrayList<>();
    result.addAll(getDifferences(s1, s2));
    result.addAll(getDifferences(s2, s1));

    System.out.println(result);
}

public static List<String> getDifferences(String s1, String s2){
    if(s1 == null){
        return Collections.singletonList(s2);
    }
    if(s2 == null){
        return Collections.singletonList(s1);
    }
    int minimalLength = Math.min(s1.length(),s2.length());
    List<String> result = new ArrayList<>();
    StringBuilder buffer = new StringBuilder(); // keep the consecutive differences
    for(int i = 0; i<minimalLength; i++ ){
        char c = s1.charAt(i);
        if(c == s2.charAt(i)){
            if( buffer.length() > 0){
                result.add(buffer.toString());
                buffer = new StringBuilder();
            }
        } else {
            buffer.append(c);
        }
    }
    if(s1.length() > minimalLength){
        buffer.append(s1.substring(minimalLength)); // add the rest
    }
    if(buffer.length() > 0){
        result.add(buffer.toString()); //flush buffer
    }
    return result;
}

However, note that also returns the non-words characters as you didn't specified you wanted to remove them (but they don't figure in your expected output).

Jeremy Grand
  • 2,300
  • 12
  • 18
  • Ah I might have misunderstood. you consider the "();" to still be equals without regard to their index. That might change the implementation. – Jeremy Grand Mar 29 '17 at 12:26
0

This is the solution I found, thanks to this link posted by @aUserHimself.

private static int[][] lcs(String s, String t) {
        int m, n;
        m = s.length();
        n = t.length();
        int[][] table = new int[m+1][n+1];
        for (int i=0; i < m+1; i++)
            for (int j=0; j<n+1; j++)
                table[i][j] = 0;
        for (int i = 1; i < m+1; i++)
            for (int j = 1; j < n+1; j++)
                if (s.charAt(i-1) == t.charAt(j-1))
                        table[i][j] = table[i-1][j-1] + 1;
                else
                    table[i][j] = Math.max(table[i][j-1], table[i-1][j]);
        return table;
    }

private static List<List<String>> getDiffs(int[][] table, String s, String t, int i, int j,
                                           int indexS, int indexT, List<List<String>> diffs){
    List<String> sList, tList;
    sList = diffs.get(0);
    tList = diffs.get(1);
    if (i > 0 && j > 0 && (s.charAt(i-1) == t.charAt(j-1)))
        return getDiffs(table, s, t, i-1, j-1, indexS, indexT, diffs);
    else if (i > 0 || j > 0) {
            if (i > 0 && (j == 0 || table[i][j-1] < table[i-1][j])){
                if (i == indexS)
                    sList.set(sList.size()-1, String.valueOf(s.charAt(i-1)) + sList.get(sList.size() - 1));
                else
                    sList.add(String.valueOf(s.charAt(i-1)));
                diffs.set(0, sList);
                return getDiffs(table, s, t, i-1, j, i-1, indexT, diffs);
            }
            else if (j > 0 && (i == 0 || table[i][j-1] >= table[i-1][j])){
                if (j == indexT)
                    tList.set(tList.size() - 1, String.valueOf(t.charAt(j-1)) + tList.get(tList.size()-1));
                else
                    tList.add(String.valueOf(t.charAt(j-1)));
                diffs.set(1, tList);
                return getDiffs(table, s, t, i, j-1, indexS, j-1, diffs);
            }
        }
    return diffs;
}

private static List<List<String>> getAllDiffs(String s, String t){
    List<List<String>> diffs = new ArrayList<List<String>>();
    List<String> l1, l2;
    l1 = new ArrayList<>();
    l2 = new ArrayList<>();
    diffs.add(l1);
    diffs.add(l2);
    return getDiffs(lcs(s, t), s, t, s.length(), t.length(), 0,  0, diffs);
}

I posted because maybe it could be interesting for someone.

delca85
  • 1,176
  • 2
  • 10
  • 17