0

I'm taking an introductory course to Java and one of my latest projects involve making sure an array doesn't contain any duplicate elements (has distinct elements). I used a for loop with an inner for loop, and it works, but I've heard that you should try to avoid using many iterations in a program (and other methods in my classes have a fair number of iterations as well). Is there any efficient alternative to this code? I'm not asking for code of course, just "concepts." Would there potentially be a recursive way to do this? Thanks!

The array sizes are generally <= 10.

/** Iterates through a String array ARRAY to see if each element in ARRAY is
 *  distinct. Returns false if ARRAY contains duplicates. */
boolean distinctElements(String[] array) { //Efficient?
    for (int i = 0; i < array.length; i += 1) {
        for (int j = i + 1; j < array.length; j += 1) {
            if (array[i] == array[j]) {
                return false;
            }
        }
    } return true;
}
lewis.k
  • 23
  • 1
  • 3
  • 1
    Do you know how to describe the efficiency of these loops? i.e. what do you think the big-O time complexity is here? – Andy Turner Oct 19 '15 at 21:34
  • 4
    You should not compare `Strings` with `==` by the way. To answer your question, an alternative would be to use a Set and check that its size is the same as the array length after you inserted all the array's elements into it (or check on the fly). By efficienty you change the complexity of the algorithm from O(n^2) to O(n) but you use (at most) O(n) extra space. – Alexis C. Oct 19 '15 at 21:35
  • Check [How do I compare strings in Java?](http://stackoverflow.com/questions/513832/how-do-i-compare-strings-in-java) – sam Oct 19 '15 at 21:37
  • Yeah, I knew how to describe the runtime of the method. But I've never used a Set before. So, Sets by default ignore duplicate items? Also, why shouldn't I compare Strings with ==? Thanks! – lewis.k Oct 19 '15 at 21:47
  • 1
    @AlexisC. here's an important piece of information: *The array sizes are generally <= 10*. In the worst case, this will be O(100) which is O(1) in the end, so using the `for` loop maybe is the best alternative for such small arrays. If the array is really big, then it would be better to use a `Set`. It's better to test and measure the times rather than going all theoretical. – Luiggi Mendoza Oct 19 '15 at 21:48
  • @LuiggiMendoza I know, the OP asked for concepts though; so I give him what is the complexity time - even if the size of the array is small. If you have maximum 10 000 elements you don't say it will be O(100000000) in the worst case so it's O(1). With this reasoning, everything is O(1). In my opinion it's important to know what's the complexity of your algorithm is, even if you will always have small inputs. – Alexis C. Oct 19 '15 at 21:56
  • @AlexisC. I like Andy's answer, where all concepts are used rather than plain Big O analysis to theoretically measure the speed of the algorithm. – Luiggi Mendoza Oct 19 '15 at 21:58
  • @LuiggiMendoza We all know the _"measure first then optimize"_ idiom. Nevertheless, in my opinion, it's important to know how to do time complexity analysis on your algorithm (and this is only what I've done here). When it comes into practical implementation, other factors are taken into considerations of course - but that was not the point of my comment. Of course you can let this implementation for small inputs but you have to know that when it grows it can run quadratically slowly. – Alexis C. Oct 19 '15 at 22:11

2 Answers2

2

"Efficiency" is almost always a trade-off. Occasionally, there are algorithms that are simply better than others, but often they are only better in certain circumstances.

For example, this code above: it's got time complexity O(n^2).

One improvement might be to sort the strings: you can then compare the strings by comparing if an element is equal to its neighbours. The time complexity here is reduced to O(n log n), because of the sorting, which dominates the linear comparison of elements.

However - what if you don't want to change the elements of the array - for instance, some other bit of your code relies on them being in their original order - now you also have to copy the array and then sort it, and then look for duplicates. This doesn't increase the overall time or storage complexity, but it does increase the overall time and storage, since more work is being done and more memory is required.

Big-oh notation only gives you a bound on the time ignoring multiplicative factors. Maybe you only have access to a really slow sorting algorithm: actually, it turns out to be faster just to use your O(n^2) loops, because then you don't have to invoke the very slow sort.

This could be the case when you have very small inputs. An oft-cited example of an algorithm that has poor time complexity but actually is useful in practice is Bubble Sort: it's O(n^2) in the worst case, but if you have a small and/or nearly-sorted array, it can actually be pretty darn fast, and pretty darn simple to implement - never forget the inefficiency of you having to write and debug the code, and to have to ask questions on SO when it doesn't work as you expect.

What if you know that the elements are already sorted, because you know something about their source. Now you can simply iterate through the array, comparing neighbours, and the time complexity is now O(n). I can't remember where I read it, but I once saw a blog post saying (I paraphrase):

A given computer can never be made to go quicker; it can only ever do less work.

If you can exploit some property to do less work, that improves your efficiency.

So, efficiency is a subjective criterion:

  • Whenever you ask "is this efficient", you have to be able to answer the question: "efficient with respect to what?". It might be space; it might be time; it might be how long it takes you to write the code.
  • You have to know the constraints of the hardware that you're going to run it on - memory, disk, network requirements etc may influence your choices.
  • You need to know the requirements of the user on whose behalf you are running it. One user might want the results as soon as possible; another user might want the results tomorrow. There is never a need to find a solution better than "good enough" (although that can be a moving goal once the user sees what is possible).
  • You also have to know what inputs you want it to be efficient for, and what properties of that input you can exploit to avoid unnecessary work.
Andy Turner
  • 137,514
  • 11
  • 162
  • 243
1

First, array[i] == array[j] tests reference equality. That's not how you test String(s) for value equality. I would add each element to a Set. If any element isn't successfully added (because it's a duplicate), Set.add(E) returns false. Something like,

static boolean distinctElements(String[] array) {
    Set<String> set = new HashSet<>();
    for (String str : array) {
        if (!set.add(str)) {
            return false;
        }
    }
    return true;
}

You could render the above without a short-circuit like

static boolean distinctElements(String[] array) {
    Set<String> set = new HashSet<>(Arrays.asList(array));
    return set.size() == array.length;
}
Elliott Frisch
  • 198,278
  • 20
  • 158
  • 249
  • Why doesn't == test Strings for equality? I figured if String a = 'a', String b = 'b', a =/= b since they're different and strings are literals... or am I terribly misinformed here? – lewis.k Oct 19 '15 at 21:49
  • @lewis.k try this: `String a = "a"; String ab = a + "b"; System.out.println(ab == "ab"); System.out.println(ab.equals("ab"));` – Luiggi Mendoza Oct 19 '15 at 21:51