3

I have an ArrayList<MyObject> that may (or may not) contain duplicates of MyObject I need to remove from the List. How can I do this in a way that I don't have to check duplication twice as I would do if I were to iterate the list in two for-loops and cross checking every item with every other item.

I just need to check every item once, so comparing A:B is enough - I don't want to compare B:A again, as I already did that.

Furthermore; can I just remove duplicates from the list while looping? Or will that somehow break the list and my loop?

Edit: Okay, I forgot an important part looking through the first answers: A duplicate of MyObject is not just meant in the Java way meaning Object.equals(Object), but I need to be able to compare objects using my own algorithm, as the equality of MyObjects is calculated using an algorithm that checks the Object's fields in a special way that I need to implement!

Furthermore, I can't just override euqals in MyObject as there are several, different Algorithms that implement different strategies for checking the equality of two MyObjects - e.g. there is a simple HashComparer and a more complex EuclidDistanceComparer, both being AbstractComparers implementing different algorithms for the public abstract boolean isEqual(MyObject obj1, MyObject obj2);

F.P
  • 17,421
  • 34
  • 123
  • 189

5 Answers5

4

Sort the list, and the duplicates will be adjacent to each other, making them easy to identify and remove. Just go through the list remembering the value of the previous item so you can compare it with the current one. If they are the same, remove the current item.

And if you use an ordinary for-loop to go through the list, you control the current position. That means that when you remove an item, you can decrement the position (n--) so that the next time around the loop will visit the same position (which will now be the next item).

You need to provide a custom comparison in your sort? That's not so hard:

Collections.sort(myArrayList, new Comparator<MyObject>() {

    public int compare(MyObject o1, MyObject o2) {
        return o1.getThing().compareTo(o2.getThing());
    }
});

I've written this example so that getThing().compareTo() stands in for whatever you want to do to compare the two objects. You must return an integer that is zero if they are the same, greater than 1 if o1 is greater than o2 and -1 if o1 is less than o2. If getThing() returned a String or a Date, you'd be all set because those classes have a compareTo method already. But you can put whatever code you need to in your custom Comparator.

Daniel Earwicker
  • 114,894
  • 38
  • 205
  • 284
  • That looks nice - but I can only decide wether the objects are the same or not - what should I return if they are not? -1 or 1? – F.P Jan 17 '12 at 10:20
  • What is the comparison you are performing between the objects? – Daniel Earwicker Jan 17 '12 at 10:25
  • I depends on if o1 should be "before" o2 or not. See the javadoc for Comparator. – ante Jan 17 '12 at 10:25
  • @ante - all that is required here is for equal objects to be next to each other. See the question! :) – Daniel Earwicker Jan 17 '12 at 10:28
  • @ante I do understand the concept - I just can't decide wether an object should be placed before or after anoter because the question of sorting is invalid in my case. There is only equality or not - I can't answer the question "before or after?". – F.P Jan 17 '12 at 10:31
  • @Florian Peschka - what comparison are you performing between the objects? – Daniel Earwicker Jan 17 '12 at 10:34
  • Then you need to decide on some stable ordering just for this sort operation. System.identityHashCode might be a option. – ante Jan 17 '12 at 10:35
  • @DanielEarwicker I compare them using different algorithms that are only able to decide wether objects are considered equal or not - As I want to be able to extend these algorithms later, I can't really say what exactly the will be using to calculate the equality. May be distances, may be hashes... – F.P Jan 17 '12 at 10:44
  • @Florian - then you can compare the coordinates to get a stable comparison, right? If they are equal by your fuzzy comparison, return zero. Otherwise if `x1 != x2` then return `x1 - x2`. Otherwise return `y1 - y2`. – Daniel Earwicker Jan 17 '12 at 10:47
  • If you have a framework already written such that you only have a boolean `isEqual` and you don't want to rewrite it, you can write a custom comparison that first checks `isEqual` and returns 0 if it is true, but otherwise falls back to comparing the two objects' `System.identityHashCode` values (as suggested by @ante above). – Daniel Earwicker Jan 17 '12 at 10:49
  • NB. My comment about coordinates only makes sense based on the original version of your comment above... The last one still applies though. – Daniel Earwicker Jan 17 '12 at 10:51
  • That's what I will stick to I guess - after all, sorting doesn't matter - so I'll just sort after the identityHash. Nice ideas guys! – F.P Jan 17 '12 at 11:51
4

Create a set and it will remove the duplicates automatically for you if the ordering is not important.

Set<MyObject> mySet = new HashSet<MyObject>(yourList);
fmucar
  • 14,361
  • 2
  • 45
  • 50
2

Instantiate a new set-based collection HashSet. Don't forget to implement equals and hashcode for MyObject.

Good Luck!

aviad
  • 8,229
  • 9
  • 50
  • 98
2

If object order is insignificant

If the order is not important, you can put the elements of the list into a Set:

Set<MyObject> mySet = new HashSet<MyObject>(yourList);

The duplicates will be removed automatically.

If object order is significant

If ordering is significant, then you can manually check for duplicates, e.g. using this snippet:

// Copy the list.
ArrayList<String> newList = (ArrayList<String>) list.clone();

// Iterate
for (int i = 0; i < list.size(); i++) {
    for (int j = list.size() - 1; j >= i; j--) {
        // If i is j, then it's the same object and don't need to be compared.
        if (i == j) {
            continue;
        }
        // If the compared objects are equal, remove them from the copy and break
        // to the next loop
        if (list.get(i).equals(list.get(j))) {
            newList.remove(list.get(i));
            break;
        }
        System.out.println("" + i + "," + j + ": " + list.get(i) + "-" + list.get(j));
    }
}

This will remove all duplicates, leaving the last duplicate value as original entry. In addition, it will check each combination only once.

Using Java 8

Java Streams makes it even more elegant:

List<Integer> newList = oldList.stream()
    .distinct()
    .collect(Collectors.toList());

If you need to consider two of your objects equal based on your own definition, you could do the following:

public static <T, U> Predicate<T> distinctByProperty(Function<? super T, ?> propertyExtractor) {
    Set<Object> seen = ConcurrentHashMap.newKeySet();
    return t -> seen.add(propertyExtractor.apply(t));
}

(by Stuart Marks)

And then you could do this:

List<MyObject> newList = oldList.stream()
    .filter(distinctByProperty(t -> {
        // Your custom property to use when determining whether two objects
        // are equal. For example, consider two object equal if their name
        // starts with the same character.
        return t.getName().charAt(0);
    }))
    .collect(Collectors.toList());

Futhermore

You cannot modify a list while an Iterator (which is usually used in a for-each loop) is looping through an array. This will throw a ConcurrentModificationException. You can modify the array if you are looping it using a for loop. Then you must control the iterator position (decrementing it while removing an entry).

MC Emperor
  • 22,334
  • 15
  • 80
  • 130
0

Or http://docs.oracle.com/javase/6/docs/api/java/util/SortedSet.html if you need sort-order..

EDIT: What about deriving from http://docs.oracle.com/javase/6/docs/api/java/util/TreeSet.html, it will allow you to pass in a Comparator at construction time. You override add() to use your Comparator instead of equals() - this will give you the flexibility of creating different sets that are ordered according to your Comparator and they will implement your "Equality"-Strategy.

Dont forget about equals() and hashCode() though...

quaylar
  • 2,617
  • 1
  • 17
  • 31