31

Suppose I have two long strings. They are almost same.

String a = "this is a example"
String b = "this is a examp"

Above code is just for example. Actual strings are quite long.

Problem is one string have 2 more characters than the other.

How can I check which are those two character?

Baz
  • 36,440
  • 11
  • 68
  • 94
user17526
  • 327
  • 1
  • 4
  • 4

13 Answers13

38

You can use StringUtils.difference(String first, String second).

This is how they implemented it:

public static String difference(String str1, String str2) {
    if (str1 == null) {
        return str2;
    }
    if (str2 == null) {
        return str1;
    }
    int at = indexOfDifference(str1, str2);
    if (at == INDEX_NOT_FOUND) {
        return EMPTY;
    }
    return str2.substring(at);
}

public static int indexOfDifference(CharSequence cs1, CharSequence cs2) {
    if (cs1 == cs2) {
        return INDEX_NOT_FOUND;
    }
    if (cs1 == null || cs2 == null) {
        return 0;
    }
    int i;
    for (i = 0; i < cs1.length() && i < cs2.length(); ++i) {
        if (cs1.charAt(i) != cs2.charAt(i)) {
            break;
        }
    }
    if (i < cs2.length() || i < cs1.length()) {
        return i;
    }
    return INDEX_NOT_FOUND;
}
EmmanuelMess
  • 1,025
  • 2
  • 17
  • 31
JRL
  • 76,767
  • 18
  • 98
  • 146
  • 14
    As far as I see, this does not return characters that are different, but just the whole string from the point where the strings don't match anymore... – brimborium Aug 23 '12 at 11:00
  • 2
    @brimborium so maybe you should clarify your question. I think this answer is really appropriate to your original question. – alcor Aug 23 '12 at 15:05
  • 1
    @alcor I am not the OP. ;) And I disagree. The original title was a bit misleading, but the question itself always stated clearly that he wants to extract the odd characters... – brimborium Aug 23 '12 at 15:10
22

To find the difference between 2 Strings you can use the StringUtils class and the difference method. It compares the two Strings, and returns the portion where they differ.

 StringUtils.difference(null, null) = null
 StringUtils.difference("", "") = ""
 StringUtils.difference("", "abc") = "abc"
 StringUtils.difference("abc", "") = ""
 StringUtils.difference("abc", "abc") = ""
 StringUtils.difference("ab", "abxyz") = "xyz"
 StringUtils.difference("abcde", "abxyz") = "xyz"
 StringUtils.difference("abcde", "xyz") = "xyz"
Jasperan
  • 2,154
  • 1
  • 16
  • 40
ccu
  • 462
  • 6
  • 8
14

Without iterating through the strings you can only know that they are different, not where - and that only if they are of different length. If you really need to know what the different characters are, you must step through both strings in tandem and compare characters at the corresponding places.

Kilian Foth
  • 13,904
  • 5
  • 39
  • 57
12

The following Java snippet efficiently computes a minimal set of characters that have to be removed from (or added to) the respective strings in order to make the strings equal. It's an example of dynamic programming.

import java.util.HashMap;
import java.util.Map;

public class StringUtils {

    /**
     * Examples
     */
    public static void main(String[] args) {
        System.out.println(diff("this is a example", "this is a examp")); // prints (le,)
        System.out.println(diff("Honda", "Hyundai")); // prints (o,yui)
        System.out.println(diff("Toyota", "Coyote")); // prints (Ta,Ce)
        System.out.println(diff("Flomax", "Volmax")); // prints (Fo,Vo)
    }

    /**
     * Returns a minimal set of characters that have to be removed from (or added to) the respective
     * strings to make the strings equal.
     */
    public static Pair<String> diff(String a, String b) {
        return diffHelper(a, b, new HashMap<>());
    }

    /**
     * Recursively compute a minimal set of characters while remembering already computed substrings.
     * Runs in O(n^2).
     */
    private static Pair<String> diffHelper(String a, String b, Map<Long, Pair<String>> lookup) {
        long key = ((long) a.length()) << 32 | b.length();
        if (!lookup.containsKey(key)) {
            Pair<String> value;
            if (a.isEmpty() || b.isEmpty()) {
                value = new Pair<>(a, b);
            } else if (a.charAt(0) == b.charAt(0)) {
                value = diffHelper(a.substring(1), b.substring(1), lookup);
            } else {
                Pair<String> aa = diffHelper(a.substring(1), b, lookup);
                Pair<String> bb = diffHelper(a, b.substring(1), lookup);
                if (aa.first.length() + aa.second.length() < bb.first.length() + bb.second.length()) {
                    value = new Pair<>(a.charAt(0) + aa.first, aa.second);
                } else {
                    value = new Pair<>(bb.first, b.charAt(0) + bb.second);
                }
            }
            lookup.put(key, value);
        }
        return lookup.get(key);
    }

    public static class Pair<T> {
        public Pair(T first, T second) {
            this.first = first;
            this.second = second;
        }

        public final T first, second;

        public String toString() {
            return "(" + first + "," + second + ")";
        }
    }
}
jjoller
  • 661
  • 9
  • 17
  • this solves my problem of spellchecking perfectly! this is the most generic solution. – bersling Oct 24 '18 at 12:10
  • 1
    In case you need to know _where_ the characters that do not match are located: this code can easily be modified to return whatever kind of result you like instead of a pair of string -- just change the return result, the IDE will highlight what you have to do next. And here is some theory about the method: https://dzone.com/articles/the-levenshtein-algorithm-1 – 18446744073709551615 Mar 27 '20 at 07:23
10

To directly get only the changed section, and not just the end, you can use Google's Diff Match Patch.

List<Diff> diffs = new DiffMatchPatch().diffMain("stringend", "stringdiffend");
for (Diff diff : diffs) {
  if (diff.operation == Operation.INSERT) {
    return diff.text; // Return only single diff, can also find multiple based on use case
  }
}

For Android, add: implementation 'org.bitbucket.cowwoc:diff-match-patch:1.2'

This package is far more powerful than just this feature, it is mainly used for creating diff related tools.

Gibolt
  • 42,564
  • 15
  • 187
  • 127
2
String strDiffChop(String s1, String s2) {
    if (s1.length > s2.length) {
        return s1.substring(s2.length - 1);
    } else if (s2.length > s1.length) {
        return s2.substring(s1.length - 1);
    } else {
        return null;
    }
}
GlenPeterson
  • 4,866
  • 5
  • 41
  • 49
  • What if the diff is not after the string and is in the middle? – brunoais Aug 24 '13 at 08:25
  • @brunoais If you need to find a mid-string difference, see JRL's answer. This was intended to be a simpler answer based on a different interpretation of the question. – GlenPeterson Mar 13 '15 at 21:03
2

Google's Diff Match Patch is good, but it was a pain to install into my Java maven project. Just adding a maven dependency did not work; eclipse just created the directory and added the lastUpdated info files. Finally, on the third try, I added the following to my pom:

<dependency>
    <groupId>fun.mike</groupId>
     <artifactId>diff-match-patch</artifactId>
    <version>0.0.2</version>
</dependency>

Then I manually placed the jar and source jar files into my .m2 repo from https://search.maven.org/search?q=g:fun.mike%20AND%20a:diff-match-patch%20AND%20v:0.0.2

After all that, the following code worked:

import fun.mike.dmp.Diff;
import fun.mike.dmp.DiffMatchPatch;

DiffMatchPatch dmp = new DiffMatchPatch();
LinkedList<Diff> diffs = dmp.diff_main("Hello World.", "Goodbye World.");
System.out.println(diffs);

The result:

[Diff(DELETE,"Hell"), Diff(INSERT,"G"), Diff(EQUAL,"o"), Diff(INSERT,"odbye"), Diff(EQUAL," World.")]

Obviously, this was not originally written (or even ported fully) into Java. (diff_main? I can feel the C burning into my eyes :-) ) Still, it works. And for people working with long and complex strings, it can be a valuable tool.

Tihamer
  • 935
  • 10
  • 8
1

To find the words that are different in the two lines, one can use the following code.

    String[] strList1 = str1.split(" ");
    String[] strList2 = str2.split(" ");

    List<String> list1 = Arrays.asList(strList1);
    List<String> list2 = Arrays.asList(strList2);

    // Prepare a union
    List<String> union = new ArrayList<>(list1);
    union.addAll(list2);

    // Prepare an intersection
    List<String> intersection = new ArrayList<>(list1);
    intersection.retainAll(list2);

    // Subtract the intersection from the union
    union.removeAll(intersection);

    for (String s : union) {
        System.out.println(s);
    }

In the end, you will have a list of words that are different in both the lists. One can modify it easily to simply have the different words in the first list or the second list and not simultaneously. This can be done by removing the intersection from only from list1 or list2 instead of the union.

Computing the exact location can be done by adding up the lengths of each word in the split list (along with the splitting regex) or by simply doing String.indexOf("subStr").

stolen_leaves
  • 1,441
  • 11
  • 19
  • 1
    So "this is a test" and "a test this is" are equal? – Tim Sep 27 '17 at 18:54
  • Yes. However the op seems to be interested only in the difference of charecter rather than the order. Once you have the different words, they can be compared in a similar ways at the charecter Level to find the extra charecters... – stolen_leaves Feb 22 '18 at 11:32
  • Google's Diff Match Patch is good, but it was a pain to use in my maven project. Just adding a dependency did not work. Finally, to my pom I added the following: – Tihamer Oct 16 '19 at 16:40
1

On top of using StringUtils.difference(String first, String second) as seen in other answers, you can also use StringUtils.indexOfDifference(String first, String second) to get the index of where the strings start to differ. Ex:

StringUtils.indexOfDifference("abc", "dabc") = 0
StringUtils.indexOfDifference("abc", "abcd") = 3

where 0 is used as the starting index.

Jasperan
  • 2,154
  • 1
  • 16
  • 40
0

Another great library for discovering the difference between strings is DiffUtils at https://github.com/java-diff-utils. I used Dmitry Naumenko's fork:

public void testDiffChange() {
    final List<String> changeTestFrom = Arrays.asList("aaa", "bbb", "ccc");
    final List<String> changeTestTo = Arrays.asList("aaa", "zzz", "ccc");
    System.out.println("changeTestFrom=" + changeTestFrom);
    System.out.println("changeTestTo=" + changeTestTo);
    final Patch<String> patch0 = DiffUtils.diff(changeTestFrom, changeTestTo);
    System.out.println("patch=" + Arrays.toString(patch0.getDeltas().toArray()));

    String original = "abcdefghijk";
    String badCopy =  "abmdefghink";
    List<Character> originalList = original
            .chars() // Convert to an IntStream
            .mapToObj(i -> (char) i) // Convert int to char, which gets boxed to Character
            .collect(Collectors.toList()); // Collect in a List<Character>
    List<Character> badCopyList = badCopy.chars().mapToObj(i -> (char) i).collect(Collectors.toList());
    System.out.println("original=" + original);
    System.out.println("badCopy=" + badCopy);
    final Patch<Character> patch = DiffUtils.diff(originalList, badCopyList);
    System.out.println("patch=" + Arrays.toString(patch.getDeltas().toArray()));
}

The results show exactly what changed where (zero based counting):

changeTestFrom=[aaa, bbb, ccc]
changeTestTo=[aaa, zzz, ccc]
patch=[[ChangeDelta, position: 1, lines: [bbb] to [zzz]]]
original=abcdefghijk
badCopy=abmdefghink
patch=[[ChangeDelta, position: 2, lines: [c] to [m]], [ChangeDelta, position: 9, lines: [j] to [n]]]
Tihamer
  • 935
  • 10
  • 8
0

For a simple use case like this. You can check the sizes of the string and use the split function. For your example

a.split(b)[1]
Prakash
  • 7,794
  • 4
  • 48
  • 44
0

I think the Levenshtein algorithm and the 3rd party libraries brought out for this very simple (and perhaps poorly stated?) test case are WAY overblown.

Assuming your example does not suggest the two bytes are always different at the end, I'd suggest the JDK's Arrays.mismatch( byte[], byte[] ) to find the first index where the two bytes differ.

    String longer  = "this is a example";
    String shorter = "this is a examp";
    int differencePoint = Arrays.mismatch( longer.toCharArray(), shorter.toCharArray() );
    System.out.println( differencePoint );

You could now repeat the process if you suspect the second character is further along in the String.

Or, if as you suggest in your example the two characters are together, there is nothing further to do. Your answer then would be:

    System.out.println( longer.charAt( differencePoint ) );
    System.out.println( longer.charAt( differencePoint + 1 ) );

If your string contains characters outside of the Basic Multilingual Plane - for example emoji - then you have to use a different technique. For example,

    String a = "a  is cuter than a .";
    String b = "a  is cuter than a .";
    int firstDifferentChar      = Arrays.mismatch( a.toCharArray(), b.toCharArray() );
    int firstDifferentCodepoint = Arrays.mismatch( a.codePoints().toArray(), b.codePoints().toArray() );
    System.out.println( firstDifferentChar );       // prints 22!
    System.out.println( firstDifferentCodepoint );  // prints 20, which is correct.
    System.out.println( a.codePoints().toArray()[ firstDifferentCodepoint ] ); // prints out 128007
    System.out.println( new String( Character.toChars( 128007 ) ) ); // this prints the rabbit glyph.
Douglas Held
  • 1,452
  • 11
  • 25
-1

You may try this

String a = "this is a example";
String b = "this is a examp";

String ans= a.replace(b, "");

System.out.print(ans);      
//ans=le