1

The sample code I created returns how many characters are similar between both strings, but how can I get it so it does not return duplicate similar characters? Like lets say if the inputs were 1: "hello" and 2: "lend". When the code iterates it will see a total of 2 similar l's how can I get it to only return one "l" in my case?

static int compare(String input1, String input2){

            int count = 0;


        for(int i = 0; i < input1.length(); i++) 
        {
            if(input2.contains(String.valueOf(input1.charAt(i)))){
                count++;
            }
        }

        return count;
    }
Gintoki
  • 97
  • 1
  • 9
  • Use a `Set` to keep track of characters already seen, e.g. use a `HashSet`. – Andreas Apr 04 '17 at 04:02
  • Just to confirm, you're expecting to ignore case, right? The "L" and "l" actually shouldn't match at all because they are different cases. – Jeremy Maxwell Castillo Apr 04 '17 at 04:03
  • My mistake, the inputs wont have capitalization. – Gintoki Apr 04 '17 at 04:05
  • *Unrelated:* A better way to check if `input2` contains the character `input1.charAt(i)` is to use [`indexOf()`](https://docs.oracle.com/javase/8/docs/api/java/lang/String.html#indexOf-int-), i.e. `if (input2.indexOf(input1.charAt(i)) != -1)` – Andreas Apr 04 '17 at 04:06

6 Answers6

4

For best performance, if strings can be long, and you need to support all Unicode characters, use Set<Integer> and retainAll(), where the integer value is a Unicode code point.

In Java 8, that can be done with this code:

private static int countDistinctCommonChars(String s1, String s2) {
    Set<Integer> set1 = s1.codePoints().boxed().collect(Collectors.toSet());
    Set<Integer> set2 = s2.codePoints().boxed().collect(Collectors.toSet());
    set1.retainAll(set2);
    return set1.size();
}

If you instead want the common characters returned, you can do this:

private static String getDistinctCommonChars(String s1, String s2) {
    Set<Integer> set1 = s1.codePoints().boxed().collect(Collectors.toSet());
    Set<Integer> set2 = s2.codePoints().boxed().collect(Collectors.toSet());
    set1.retainAll(set2);
    int[] codePoints = set1.stream().mapToInt(Integer::intValue).toArray();
    Arrays.sort(codePoints);
    return new String(codePoints, 0, codePoints.length);
}

Test

public static void main(String[] args) {
    test("hello", "lend");
    test("lend", "hello");
    test("mississippi", "expressionless");
    test("expressionless", "comprehensible");
    test("", ""); // Extended, i.e. 2 chars per code point
}
private static void test(String s1, String s2) {
    System.out.printf("Found %d (\"%s\") common chars between \"%s\" and \"%s\"%n",
                      countDistinctCommonChars(s1, s2),
                      getDistinctCommonChars(s1, s2),
                      s1, s2);
}

Output

Found 2 ("el") common chars between "hello" and "lend"
Found 2 ("el") common chars between "lend" and "hello"
Found 3 ("ips") common chars between "mississippi" and "expressionless"
Found 8 ("eilnoprs") common chars between "expressionless" and "comprehensible"
Found 2 ("") common chars between "" and ""

Note that last test is using Unicode characters from the 'Domino Tiles' Unicode Block (U+1F030 to U+1F09F), i.e. characters that are stored in Java strings as surrogate pairs.

Community
  • 1
  • 1
Andreas
  • 154,647
  • 11
  • 152
  • 247
1

Another way to do this would be to create a copy of input2 and remove found characters. I believe this should be more efficient than using ArrayList.

static int compare(String input1, String input2){

        int count = 0;

        String check = input2;
        for(int i = 0; i < input1.length(); i++) 
        {
            if(check.contains(String.valueOf(input1.charAt(i)))){
                check = check.replace(String.valueOf(input1.charAt(i)), "");
                count++;
            }
        }
        return count;
}
  • 1
    you create a new string every time you detect a duplicate, that's not efficient at all. Of all the proposed methods my 2nd one will by far be the fastest. – maraca Apr 04 '17 at 04:22
0

I would suggest creating an ArrayList to keep track of characters already "caught" while you're iterating through the string.

You would have to initialize the ArrayList of characters before the for loop. If you find a duplicate character add it to the ArrayList. Change your if statement inside of the for loop in your current code to include if the ArrayList actually contains the letter. Therefore, it won't repeat and increment count.

Please remember characters are case-sensitive.

Raheel138
  • 147
  • 10
0

If you are dealing with all kinds of characters then it's probably easiest to use a set. If you are only dealing with a-z then you could do it like this:

int count = 0;
for (char c = 'a'; c <= 'z'; c++)
    if (input1.indexOf(c) >= 0 && input2.indexOf(c) >= 0)
        count++;
return count;
maraca
  • 8,468
  • 3
  • 23
  • 45
0

You can use this regex to remove any duplicate character from your strings.

(.)(?=.*\1)

Once all duplicates are removed, you can continue your normal execution.

static int compare(String input1, String input2){
    int count = 0;

    input1 = input1.replaceAll("(.)(?=.*\\1)", "");
    input2 = input2.replaceAll("(.)(?=.*\\1)", "");


    for(int i = 0; i < input1.length(); i++) 

    ...
Raman Sahasi
  • 30,180
  • 9
  • 58
  • 71
0
static int compare(String input1, String input2){

        int count = 0;
        String taken = "";

        for(int i = 0; i < input1.length(); i++) 
        {
            if(input2.contains(String.valueOf(input1.charAt(i)))){
                if(!taken.contains(String.valueOf(input1.charAt(i)))){
                    taken = taken.concat(String.valueOf(input1.charAt(i)));
                    count++;
                }
            }
        }

        return count;
    }
Sunil Prajapati
  • 330
  • 2
  • 6