1

I am working on an assignment and i have everything working well, I have a list that I have randomly created, and now I have to call a method that removes any duplicated integers in the list. My list size is 100, and when I call my removeDuplicates method it removes most of the duplicates but does not remove all of them. I am wondering if someone can take a look at my method and see why it is doing this. Thanks. Here is my code:

    int size = A.size();
    for (int i = 0; i < size -1; i++) {
        for (int j = i + 1; j < size;) {
            if (A.get(i) == A.get(j)) {
                A.remove(j);
                size--;
            }
            else
                j++;
        }
    }

Here is my full class:

public class Utility {

public static void selectionSort(List<Integer> A) {
    int smIndex, sm;

    for (int i = 0; i < A.size(); i++) {
        sm = A.get(i);
        smIndex = i;

        for (int k = i; k < A.size(); k++) {
            if (sm > A.get(k)) {
                sm = A.get(k);
                smIndex = k;
            }
        }

        if (smIndex == i)
            ;
        else {
            int temp = A.get(i);
            A.set(i, A.get(smIndex));
            A.set(smIndex,  temp);
        }
    }
}

public static void removeDuplicates(List<Integer> A) {
    /*for (int i = 0; i < A.size() -1; i++) {
        for (int j = i + 1; j < A.size();) {
            if (A.get(i) == A.get(j)) {
                A.remove(j);
            }
            else
                j++;
        }
    }*/
        for (int i = 0; i < A.size(); i++) {
            for (int j = 0; j < A.size(); j++) {
                if (i == j) 
                    ;
                else if (A.get(i) == A.get(j)) {
                    A.remove(A.get(j));
                    j = j-1;
                }
                else
                    ;
            }
        }
}

}

This is the test program ( I am just hard coding the 100 for now until i get the remove duplicates figured out):

public class PA1Test extends StopWatch {

public static void main(String[] args) {

    int n = 100;
    List<Integer> list = new Vector<Integer>();

    for (int i = 0; i < n; i++) {
        list.add((int) (Math.random()*2*n));
    }

    start();
    selectionSort(list);
    stop();
    System.out.println("Execution Time: " + getElapsedTime() + " milliseconds");

    if (n > 100) {
        removeDuplicates(list);
        System.out.println(list);
    }
    else {
        System.out.println(list);
        removeDuplicates(list);
        System.out.println(list);
    }


}

}

Hooter
  • 11
  • 2
  • 2
    When you are removing `i`'th element, your `i+1`'th element becomes `i`'th and then you proceed to `i+1`'th element, which is a former `i+2`'th element. See where you miss it? You never check `i+1`'th. – sashkello Jan 23 '14 at 02:05
  • possible duplicate of [Java - Removing duplicates in an ArrayList](http://stackoverflow.com/questions/2435156/java-removing-duplicates-in-an-arraylist) (see the third answer for a solution with loops) – sashkello Jan 23 '14 at 02:09
  • Yeah, iterating over a list while editing it at the same time is tricky business. – Edward Falk Jan 23 '14 at 02:16
  • Also, A.remove(A.get(j)) is wrong unless there's some sort of weird indirection going on that I don't know about. – Edward Falk Jan 23 '14 at 02:21
  • From looking at the other post I have cleaned up the code a little bit but it still seems to stop removing the duplicate numbers around the 60th integer out of 100: – Hooter Jan 23 '14 at 02:40
  • The code looks good to me. Maybe it's removed 40 integers so far, and the length of the array is now only 60? Have you confirmed that there are still duplicates in the list? – Edward Falk Jan 23 '14 at 03:18
  • With the list being sorted i have 100 random integers between [0,200] in the list and when i remove the duplicated integers it does fine up until around 130 or so, then it quits and from what i have been counting it seems to be the 60th integer in list or somewhere around that. – Hooter Jan 23 '14 at 03:31
  • 1
    change the == to equal() operator – nightograph Jan 23 '14 at 04:43

3 Answers3

2

The problem you've observed is due to a bad == comparison. You mentioned in a comment that the comparison is good up until around 130 and that is correct. == works there because values -128 through 127 are cached so unless they are created explicitly like new Integer(56) they are reference equal. You can do something like the following:

for(int i = 0; i < A.size() - 1; i++) {
    for(int k = i + 1; k < A.size();) {
        if(A.get(i).intValue() == A.get(k).intValue()) {
            A.remove(k);
        } else {
            k++;
        }
    }
}

Note removing items from a list while iterating over it is not considered good practice in Java but you should be OK here. A safer solution is to add the non-duplicates to a new list.

It's also true that if you're looking for an optimization, you will see a solution around here that looks like this:

Set<Integer> set = new LinkedHashSet<Integer>(list);
list.clear();
list.addAll(set);

I have run benchmarks before and it's about 5x faster than iteration despite creating a helper object.

Radiodef
  • 37,180
  • 14
  • 90
  • 125
  • @Hooter When you are considering accepting an answer, please do note that the `==` comparison is the reason your code in the OP did not work. Here is a longer explanation: http://stackoverflow.com/a/1515811/2891664 – Radiodef Jan 23 '14 at 04:57
  • +1 for the optimization using sets. This is a better answer than mine. – Edward Falk Jan 24 '14 at 19:10
1

First, you don't need to start your inner loop at 0; start at i+1 instead. Otherwise, you're wasting a lot of time comparing items that have already been compared.

Also, I'm assuming the list isn't sorted.

So:

// Note that A.size() is subject to change, so don't
// cache its value in the loop conditional.
for (int i=0; i<A.size()-1; ++i) {
    // In this loop, we don't always increment j, because
    // removing an item from the list will bring a new item
    // into the j'th position
    for (int j=i+1; j<A.size();) {
        if (A.get(j).equals(A.get(i))) {
            A.remove(j)
        } else {
            ++j;
        }
    }
}
Edward Falk
  • 9,991
  • 11
  • 77
  • 112
  • Actually i am using selection sort before I call my removeDuplicates method, I tried this code and it seems to be doing the same thing with the list, around the 60th number in the list it stops removing the duplicates. – Hooter Jan 23 '14 at 02:38
  • If A is sorted, there are other optimizations you can do, but they're not important now. – Edward Falk Jan 23 '14 at 03:16
  • Should be `A.get(j).equals(A.get(i))` or `A.get(j).intValue() == A.get(i).intValue()`...right? – Radiodef Jan 23 '14 at 04:12
  • D'oh! You're right. I was thinking list of int, but it's a list of Integer. I don't think == works with Integer. I'll change my answer. – Edward Falk Jan 23 '14 at 16:21
0

i actually wonder why it would work at all, because you assume that the size of your array is fixed by you are dynamically removing items from it, when you remove items from the List, your List shrinks and what used to be item i is now item i-1, so you keep "skipping" and that's why you can't remove all duplicates the way you are doing it.

loop through the array and add the items to a set, the set by default does not take duplicates so at the end you end-up with unique numbers.

    Set<Integer> hashset = new HashSet<Integer>();
    for (int i = 0; i < A.size(); i++) {
        hashset.add(A.get(i));
    }
    A.clear();
    A.addAll(hashset);
nightograph
  • 2,179
  • 5
  • 31
  • 49
  • by the way, java convention suggests that you name your variables lower case and also those ;s after if/else look strange – nightograph Jan 23 '14 at 04:20
  • My prof had the method heading in the HW and it used "A' in the parameter of the method so that's what i used it, I tried your changes and it does the same thing, removes most of the duplicates but stops towards the end of the list. I will edit and add my full class plus my test class to see if that is any help. – Hooter Jan 23 '14 at 04:30
  • @Hooter, I just added code for solution #2 - try that, it's much more efficient anyways - also if you can give me a sample of 10 items which doesn't work, i have become curious! – nightograph Jan 23 '14 at 04:34
  • This works, and much more efficient. I just had to sort the list again and it displayed correctly. Thank you. – Hooter Jan 23 '14 at 04:43