2

I was supposed to write a code that will say if an array has duplicates. The runtime wasn't important. I think my below code will have O(n²) because I used a nested for-loop. I know there are much better and faster codes than the one I have written, but my question is if the break statement I made inside the if-statement will make my code (a bit) faster? It should make it faster because the program knows "hey, we found a duplicate and we can stop searching for more". I once heard from a fellow student that the code is better / more stable if I avoid statements like return or break. Too bad I didn't care enough back then to ask why. Maybe you can tell me if this is true?

And if he is right and these statements "hurt" my code, there are any better workarounds?

public class FindDuplicate{
    public static void main(String[] args){
        int[] A={1,2,3,4,5,6,7,8,4};
        boolean bool=false;
        for(int i=0; i<A.length; i++){
            for(int j=0; j<A.length; j++){
                if(A[i]==A[j] && i!=j){
                    bool=true;
                    break;
                }
            }
        }
        if(bool==true){
            System.out.print("Duplicate found");
        }else{
            System.out.print("No duplicate found");
        }
    }
}
cnmesr
  • 345
  • 3
  • 11

2 Answers2

3

my question is if the break statement I made inside the if-statement will make my code (a bit) faster?

not in every case, however, in most cases, it does make your code faster considering that you don't have to keep iterating when you've found what you're after.


The algorithm below contains two nested loops. The outer loop iterates over all the array’s N items, so it takes O(N) steps. For each trip through the outer loop, the inner loop also iterates over the N items in the array, so it also takes O(N) steps. Because one loop is nested inside the other, the combined performance is O(N × N) = O(N2).

for(int i = 0; i < A.length; i++){
    for(int j=0; j < A.length; j++){
       if(A[i] == A[j] && i != j){
           bool = true;
           break;
       }
    }
}

We can make your algorithm a bit faster by not going back to the j = 0 at each iteration of the outer loop.

for(int i = 0; i < A.length; i++){
  for(int j = i+1; j < A.length; j++){
     if(A[i] == A[j]){
         bool = true;
         break;
     }
  }
}

note in this case we don't need to check && i != j because they will never be equal.


I once heard from a fellow student that the code is better / more stable if I avoid statements like return or break

The JVM specification does not state either an existence or absence of a performance loss when using break. To put it simply, there is no proof whatsoever that using break or return makes your code unstable(not that I know of anyway). The only scenario in which I would say "oh this is not a good practise" is when you overuse the word break. However, in many cases break is the only possibility to accomplish a task faster, an example is your current solution. Basically, why carry on iterating when you've found what you was after?. I consider return as also not being "a bad practise" because similar to break, why carry on executing code when you don't need to, surely this makes your code faster.


Can we make the find duplicate algorithm faster?

Sure we can, consider the Set interface in java which doesn't allow duplicates and it's based upon hash table data structure so insertion take O(1) time in average case. By using HashSet, a general purpose Set implementation, we can find duplicates in O(n) time. Since HashSet allows only unique elements, the add() method will fail and return false when you try to add duplicates.

Solution:

public static boolean hasDuplicate(int[] array) {
      Set<Integer> dupes = new HashSet<Integer>();
      for (Integer i : array) {
          if (!dupes.add(i)) {
             return true; // we have found a duplicate
          }
      }
      return false; // no duplicate
}
Ousmane D.
  • 54,915
  • 8
  • 91
  • 126
1

Actually you don't need that bool flag variable nor use break. return will stop iterating, and if no duplicated is found just return false:

private static boolean findDuplicateOriginal(int[] A) {
    for(int i=0; i<A.length; i++){
        for(int j=0; j<A.length; j++){
            if(A[i]==A[j] && i!=j){
                return true;
            }
        }
    }
    return false;
}

Just point that performance should not be your only goal when coding. You should be as worried for maintability or writing less/clean code as worried by performance. It depends on the context (how often that function will be called, how many iterations should it do, will it run using paralelStream?...) your app runs for choosing one or another way of doing things.

There are many posts talking about loop performance vs stream performance and opinions for and against:

I just want to show you how clean (1 line!) is using Java8 syntax for the same purpouse:

import java.util.Arrays;

public class test {

public static void main(String[] args) {
    int[] A = {1,2,3,4,5,6,7,8,9};
    System.out.println(Arrays.toString(A) + " using findDuplicate >> " + findDuplicate(A));
    System.out.println(Arrays.toString(A) + " using findDuplicateOriginal >>" + findDuplicateOriginal(A));

    int[] B = {1,1,3,4,5,6,7,8,4};
    System.out.println(Arrays.toString(B) + " using findDuplicate >> " + findDuplicate(B));
    System.out.println(Arrays.toString(B) + " using findDuplicateOriginal >> " + findDuplicateOriginal(B));
}

// using streams
private static boolean findDuplicate(int[] items) {
    return !(Arrays.stream(items).distinct().count() == items.length);
}

// refactored original version
private static boolean findDuplicateOriginal(int[] A) {
    for(int i=0; i<A.length; i++){
        for(int j=0; j<A.length; j++){
            if(A[i]==A[j] && i!=j){
                return true;
            }
        }
    }
    return false;
}
}

Output:

[1, 2, 3, 4, 5, 6, 7, 8, 9] using findDuplicate >> false
[1, 2, 3, 4, 5, 6, 7, 8, 9] using findDuplicateOriginal >>false
[1, 1, 3, 4, 5, 6, 7, 8, 4] using findDuplicate >> true
[1, 1, 3, 4, 5, 6, 7, 8, 4] using findDuplicateOriginal >> true
Community
  • 1
  • 1
exoddus
  • 2,230
  • 17
  • 27