0

I would like to eliminate duplicate key combinations stored in a list of lists and wanted suggestions on an efficient way to remove or mark duplicates. Let me explain the problem with an easy example. I have a list containing parts of name as separate elements in a list. A person can have 2 - n number of parts in his name.

Elements in a basic list contains parts of person's name and can appear in any order, in this case it has three parts { "Rajesh", "Kumar", "Singh" }. Similarly there can be a list of people names with their names appearing in any order as below

0 = { "Rajesh", "Kumar", "Singh" }
1 = { "William", "Robert" }
2 = { "John", "Anderson", "Jr" }
3 = { "Kumar", "Rajesh", "Singh" }

Item number 3 in above list needs to be eliminated as it has exactly 3 items in it and the parts match with item 0 though their order of appearance is different.

Thank you

  • 4
    If you just used sets both for the inner collections and the outer one, you'd deal with both the equality check and the "order doesn't matter" problems at the same time. And that would also take care of eliminating duplicates... – ernest_k May 29 '18 at 19:35
  • Possible duplicate of [Java ArrayList - how can I tell if two lists are equal, order not mattering?](https://stackoverflow.com/questions/13501142/java-arraylist-how-can-i-tell-if-two-lists-are-equal-order-not-mattering) – Turamarth May 29 '18 at 19:35

4 Answers4

1

Store the elements in a Set<Set<String>>. Sets are unordered, so lookups don't care about the original order within a group. Set.equals:

Compares the specified object with this set for equality. Returns true if the specified object is also a set, the two sets have the same size, and every member of the specified set is contained in this set (or equivalently, every member of this set is contained in the specified set). This definition ensures that the equals method works properly across different implementations of the set interface.

jspcal
  • 50,847
  • 7
  • 72
  • 76
0

One approach to doing this efficiently would be writing a wrapper class for your collection, and defining a custom hashing strategy for it:

class KeyCombinationWrapper {
    private final List<String> keys;
    public KeyCombinationWrapper(List<String> keys) {
        this.keys = new ArrayList<>(keys);
    }
    public List<String> getKeys() {
        return keys;
    }
    @Override
    public boolean equals(Object obj) {
       // Compare lists of keys in an order-independent way, for example,
       // by calling containsAll on both sides.
       ...
    }
    @Override
    public int hashCode() {
        // Hash code must be computed in an order-independent way,
        // for example by adding up or XOR-ing hash codes of individual keys.
        ...
    }
}

Add KeyCombinationWrapper objects for each key set into HashSet<KeyCombinationWrapper>, then harvest the unique keys from the wrappers that remain in the final set.

Implementation details of the two methods is up to you, as long as you follow the guidelines.

Sergey Kalinichenko
  • 714,442
  • 84
  • 1,110
  • 1,523
0

You can use the String.hashCode() method to summarize each part as a long, and then xor the value for each part to get a summary that represents all the parts without regard to order.

Then sweep through your list, keeping a HashMap of the combined hash and value as you encounter each element. If the combined hash key is already present, decide whether to keep the new or old one.

Dump the values from the HashMap when done.

Bob Jacobsen
  • 1,150
  • 6
  • 9
0

I would suggest that you make a reverse map of words and their indexes.

Map<String, Integer[]> should work.

For every index, you check if all the words exist in your map with the same values for the index(es). If they do, then this is a repeated name and you ignore this. If not, you update the map.

This will be of the order O(numNames*totalWordsinNames)

Yuvraj Jaiswal
  • 1,605
  • 13
  • 20