0

I'm new to Java.

Can anybody tell me the easiest way to check whether two strings are equal except for one character?

like:

'test' 'text'  //only one character different

should return true

==============================

like input:

'test' 'txxt' //two characters different

should return false

I know we can compare with a for loop. Is there any other way to do that? Thx for your help. : )

Tyler.z.yang
  • How should we behave if the strings are different lengths? (test vs. testy) – Jeff Ferland Nov 27 '14 at 16:03
  • If they have the same length then simply loop through each character and stop when you find the 2nd difference. If the lengths can differ then it may be a little bit more tricky. – Adriano Repetti Nov 27 '14 at 16:04
  • I only know how to do this by using [Levenshtein distance](http://en.wikipedia.org/wiki/Levenshtein_distance) but it may be kind of overkill for what you're looking for. – Luiggi Mendoza Nov 27 '14 at 16:04
  • @JeffFerland the lengths of the strings will be the same. – Tyler.z.yang Nov 28 '14 at 01:02
  • Hi @AdrianoRepetti Thx for your answer. Like I said, I know we can compare with a for loop. What I want to know is, is there any other easier way to do that? – Tyler.z.yang Nov 28 '14 at 01:03
  • @Tyler.z.yang without a loop? No: as for _normal_ string comparison, each character obviously must be evaluated. One last point: this code is (kind of) fast but it's **WRONG** because it assumes each string element is a character. Java's strings are sequences of UTF-16 code units, and UTF-16 is not a fixed-length encoding. You're Chinese and most of your characters will be encoded in 2 bytes (1 code unit, 1 Java `char`), but it's not always true (for example, rarer ideographs outside the Basic Multilingual Plane take two `char`s). You _may_ (!!!) not worry about this but you should be aware. – Adriano Repetti Nov 28 '14 at 08:19
  • Suggested reading: http://stackoverflow.com/a/27229590/1207195. You may also enjoy reading about [Hamming distance](http://en.wikipedia.org/wiki/Hamming_distance) (in your case it may be optimized). – Adriano Repetti Dec 02 '14 at 17:01
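
For future readers: the explicit for loop can be hidden behind a stream, although every character is still visited until a second mismatch turns up. This is only a minimal sketch, assuming Java 8+ and the same-length case from the question. The class and method names are made up for illustration, the comparison is `char` by `char` (so the Unicode caveats from the comments still apply), and identical strings are treated as a match.

```java
import java.util.stream.IntStream;

class AtMostOneDifference {

    // Hypothetical helper: true when both strings have the same length
    // and differ in at most one char position (a bounded Hamming distance).
    static boolean differsByAtMostOne(String a, String b) {
        if (a.length() != b.length()) {
            return false;
        }
        long mismatches = IntStream.range(0, a.length())
                .filter(i -> a.charAt(i) != b.charAt(i))
                .limit(2) // no need to count past the second mismatch
                .count();
        return mismatches <= 1;
    }

    public static void main(String[] args) {
        System.out.println(differsByAtMostOne("test", "text")); // true
        System.out.println(differsByAtMostOne("test", "txxt")); // false
    }
}
```

This is not faster than the plain loop; it only changes how the iteration is written.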

2 Answers

4

Assuming the Strings are the same length, here is a solution. It will need to be altered slightly to handle Strings of different lengths.

boolean compareStrings(String str1, String str2) {
    if (str1.length() != str2.length())
        return false;
    int differences = 0;
    for (int i = 0; i < str1.length(); i++) {
        if(str1.charAt(i) != str2.charAt(i))
            if(++differences > 1)
                return false;
    }
    //if the execution is here, then there are 0, or 1 differences, so return true
    return true;
}
David.Jones
  • Thx for your answer. But I want to know whether there is a way to avoid something like for loop? – Tyler.z.yang Nov 28 '14 at 01:05
  • In order to solve the problem you specified you HAVE to iterate through all of the characters, unless the loop can stop early because more than one difference has already been found. The QUICKEST this problem can be solved is O(str.length) time. The answer I have provided does this. – David.Jones Nov 28 '14 at 01:13
  • Here is my question: "I know we can compare with a for loop. Is there any other way to do that? Thx for your help. : )" – Tyler.z.yang Nov 28 '14 at 01:16
  • The quick answer is no, you can't. You have to look at every character in the string in order to compare the two. Whether you use a for loop, a while loop, an iterator, etc. doesn't matter. What matters is the execution time and memory efficiency. – David.Jones Nov 28 '14 at 01:20
  • This is absolutely a viable solution and many people will live happily with it (but the code should be changed: if the lengths differ by 1 then the strings may still be considered equal). That said, future readers must be aware this isn't Unicode-aware. It may not be an issue (if the strings contain only ASCII characters) but every programmer working with strings should be aware of the other issues. – Adriano Repetti Nov 28 '14 at 08:42
  • Let me quickly summarize what you should be aware of: 1) in Unicode there are duplicates, for example **à** may be a single `char` (the precomposed U+00E0) or two code points (**a** followed by the combining grave accent U+0300). If you compare `char` by `char` then you won't handle this. – Adriano Repetti Nov 28 '14 at 08:48
  • 2) Java's `String` is UTF-16 encoded: each code unit is 2 bytes, but UTF-16 is not a fixed-length encoding, so one _character_ (code point) may be encoded as two `char`s, a surrogate pair (the type name is absolutely misleading: they're not characters but _code units_). It's not an issue for most western languages but you may see the difference with far-east languages. – Adriano Repetti Nov 28 '14 at 08:50
  • 3) Not every culture shares the same _definition_ of character. For example "ch" in Czech is _logically_ counted as one single character; historically it is a digraph but it's often written (because of computer keyboards) as two separate characters. Even more obvious: think about Korean syllables (two or more letters are grouped into a single character, a syllable, with a specific code point in Unicode). Here character by character comparison won't work (even with a fixed-length encoding like UTF-32). – Adriano Repetti Nov 28 '14 at 08:53
  • 4) This may not be an issue, but if the comparison should be case-insensitive then more issues will arise: several lower case characters may correspond to a single upper case character, and not all Unicode implementations handle this well (for example, think about "ß" and "ss" in German). – Adriano Repetti Nov 28 '14 at 08:58
  • Does this mean most of the code running out there won't work? No. It works pretty well 99% of the time, but there is that 1%. For a slightly more structured discussion: [this post](http://stackoverflow.com/a/23370462/1207195), [this post](http://stackoverflow.com/a/17941178/1207195), [this post](http://stackoverflow.com/a/26118918/1207195) and [this article](http://en.wikipedia.org/wiki/Duplicate_characters_in_Unicode). There are many posts on SO about Unicode surrogate handling... – Adriano Repetti Nov 28 '14 at 09:02
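
For readers who do need the Unicode points 1) and 2) above, here is one possible sketch, not a complete answer: it normalizes both strings to NFC with `java.text.Normalizer` and then compares code points instead of `char`s. The class and method names are invented for illustration. It assumes Java 8+ and does not address points 3) and 4) (grapheme clusters such as Czech "ch", or case-insensitive comparison).

```java
import java.text.Normalizer;

class CodePointCompare {

    // Hypothetical helper: compares NFC-normalized code points and accepts
    // at most one differing position. Lengths are measured in code points.
    static boolean differsByAtMostOneCodePoint(String a, String b) {
        int[] x = Normalizer.normalize(a, Normalizer.Form.NFC).codePoints().toArray();
        int[] y = Normalizer.normalize(b, Normalizer.Form.NFC).codePoints().toArray();
        if (x.length != y.length) {
            return false;
        }
        int differences = 0;
        for (int i = 0; i < x.length; i++) {
            if (x[i] != y[i] && ++differences > 1) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // "a" + combining grave accent vs. precomposed "à": equal after NFC
        System.out.println(differsByAtMostOneCodePoint("a\u0300", "\u00E0"));
        // U+1F600 vs. U+20000: both chars of the surrogate pair differ,
        // but it is a single code point difference
        System.out.println(differsByAtMostOneCodePoint("x\uD83D\uDE00", "x\uD840\uDC00"));
    }
}
```

A `char` by `char` loop would count two differences in the second pair, because both halves of the surrogate pair differ.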
0

Try this method. It should work for every string combination but, depending on usage, some performance tuning may be needed.

public static boolean compare(String s1, String s2) {
    if((s1 != null && s2==null) || (s1 == null && s2!=null)){
        //exactly one is null
        return false;
    }
    if((s1 == null && s2==null) ||  s1.equals(s2)){
        //both are null or equal
        return true;
    }
    if(Math.abs(s1.length() - s2.length()) > 1){
        //A length difference of more than one means more than one difference
        return false;
    }
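    //Here you have two different strings. Maybe one is one character longer than the other.
    if(s1.length() != s2.length()) {
        //They differ in length by one, so they must be equal in the first minLen characters.
        //Note: this only accepts an extra character at the end ("test" vs "testy"),
        //not an extra character elsewhere ("test" vs "ttest").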
        int minLen = Math.min(s1.length(), s2.length());
        return s1.substring(0,minLen).equals(s2.substring(0,minLen));
    }

    //Here you have two different strings of the same length.
    int diff = 0;
    for(int i = 0; i < s1.length() && diff < 2; i++){
        if(s1.charAt(i) != s2.charAt(i)){
            diff++;
        }
    }
    return diff < 2;
}
Gren
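
A note on the length-differs-by-one case: the prefix check in the method above accepts "test" vs "testy" but rejects "test" vs "ttest", because the extra character is not at the end. If an extra character anywhere should count as a single difference, one possible variant walks both strings with two indexes. This is a sketch under that assumption, with invented names. Like the answer above, it treats two nulls as a match and compares `char` by `char`.

```java
class OneEditAway {

    // Hypothetical helper: true when the strings are equal, differ by one
    // substituted char, or differ by one inserted/removed char at any position.
    static boolean atMostOneEdit(String s1, String s2) {
        if (s1 == null || s2 == null) {
            return s1 == s2; // true only when both are null
        }
        if (Math.abs(s1.length() - s2.length()) > 1) {
            return false;
        }
        String shorter = s1.length() <= s2.length() ? s1 : s2;
        String longer = s1.length() <= s2.length() ? s2 : s1;
        boolean sameLength = shorter.length() == longer.length();

        int i = 0; // index into shorter
        int j = 0; // index into longer
        boolean edited = false;
        while (i < shorter.length() && j < longer.length()) {
            if (shorter.charAt(i) == longer.charAt(j)) {
                i++;
                j++;
                continue;
            }
            if (edited) {
                return false; // second difference
            }
            edited = true;
            if (sameLength) {
                i++; // substitution: advance both indexes
            }
            j++; // insertion: skip the extra char in the longer string
        }
        // Any leftover is at most one trailing char of the longer string.
        return true;
    }

    public static void main(String[] args) {
        System.out.println(atMostOneEdit("test", "ttest")); // true
        System.out.println(atMostOneEdit("test", "txxt"));  // false
    }
}
```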