62

I have an array of objects and I want to concatenate it with another array of objects, except that objects that have same id's. That objects are used in many places in the system and don't have hash code or equals implemented. So I don't want to implement hashCode() and equals(), cause I'm afraid to break something somewhere in the system where that objects are used and I don't know about that.

I want to put all that objects in a set, but somehow make the objects use custom hashCode() and equals(). Something like custom Comparator, but for equals.

jediz
  • 4,459
  • 5
  • 36
  • 41
dhblah
  • 9,751
  • 12
  • 56
  • 92
  • 7
    perhaps you need to create a wrapper class of some sort that does implement equals and hashCode. – Hovercraft Full Of Eels Mar 05 '11 at 13:29
  • I can't do that, there are too many places to edit – dhblah Mar 05 '11 at 13:43
  • 1
    if you're afraid of refactoring then I'd suggest a) writing a bunch of unit tests and b) learning to use your IDE's refactoring tools. If you do that, it's not hard to update all references relatively quickly and safely. This is usually a better approach than trying to force something into your design that won't fit..... – mikera Mar 05 '11 at 14:07
  • you'd only use that wrapper class for the purposes of your Set, not everywhere else. – nos Mar 05 '11 at 14:11
  • please specify what type of elements your array has. It is not obvious from the question. For example "each element has type User with two attributes `id` and `name`".It is important to know, because all primitive wrapper classes (Long, String) in java have comparators already implemented, and usually they are enough for most of the cases of sorting. – Andreas Bakurov Mar 05 '11 at 14:33

8 Answers8

39

Yes it is possible to do such a thing. (And people have done it.) But it won't allow you to put your objects into a HashMap, HashSet, etc. That is because the standard collection classes expect the key objects themselves to provide the equals and hashCode methods. (That is the way they are designed to work ...)

Alternatives:

  1. Implement a wrapper class that holds an instance of the real class, and provides its own implementation of equals and hashCode.

  2. Implement your own hashtable-based classes which can use a "hashable" object to provide equals and hashcode functionality.

  3. Bite the bullet and implement equals and hashCode overrides on the relevant classes.

In fact, the 3rd option is probably the best, because your codebase most likely needs to to be using a consistent notion of what it means for these objects to be equal. There are other things that suggest that your code needs an overhaul. For instance, the fact that it is currently using an array of objects instead of a Set implementation to represent what is apparently supposed to be a set.

On the other hand, maybe there was/is some real (or imagined) performance reason for the current implementation; e.g. reduction of memory usage. In that case, you should probably write a bunch of helper methods for doing operations like concatenating 2 sets represented as arrays.

Stephen C
  • 698,415
  • 94
  • 811
  • 1,216
  • 2
    The trove libraries contain something like option #2, they define a HashingStrategy interface which can be used in certain HashMap instances. They don't completely generalize this, but only implement it for HashMaps based on certain key classes. http://trove4j.sourceforge.net/javadocs/gnu/trove/strategy/HashingStrategy.html – Enwired May 03 '13 at 17:53
  • I have taken the liberty to create an implementation for an EqualityHashSet which uses a normal HashSet as backend and wraps all elements to allow flexible overriding of equals and hashCode. See https://github.com/maiers/equality-collection – pintxo Jun 03 '16 at 22:04
16

90% of the time when a user wants an equivalence relation there is already a more straightforward solution. You want to de-duplicate a bunch of things based on ids only? Can you just put them all into a Map with the ids as keys, then get the values() collection of that?

Kevin Bourrillion
  • 40,336
  • 12
  • 74
  • 87
  • 9
    that doesnt work if you want to pick which properties of the key define equality/hashing and you want to put something else as the value. Your solution only works if the important property of some key is singular, it doesnt work if you want to pick 2+ but not all important properties used by the key in its hashing/equals. – mP. Mar 28 '11 at 10:00
  • if someone writes one up is Guice interested ? – mP. Mar 28 '11 at 10:01
11

HashingStrategy is the concept you're looking for. It's a strategy interface that allows you to define custom implementations of equals and hashcode.

public interface HashingStrategy<E>
{
    int computeHashCode(E object);
    boolean equals(E object1, E object2);
}

As others have pointed out, you can't use a HashingStrategy with the built in HashSet or HashMap. Eclipse Collections includes a set called UnifiedSetWithHashingStrategy and a map called UnifiedMapWithHashingStrategy.

Let's look at an example. Here's a simple Data class we can use in a UnifiedSetWithHashingStrategy.

public class Data
{
    private final int id;

    public Data(int id)
    {
        this.id = id;
    }

    public int getId()
    {
        return id;
    }

    // No equals or hashcode
}

Here's how you might set up a UnifiedSetWithHashingStrategy and use it.

java.util.Set<Data> set =
  new UnifiedSetWithHashingStrategy<>(HashingStrategies.fromFunction(Data::getId));
Assert.assertTrue(set.add(new Data(1)));

// contains returns true even without hashcode and equals
Assert.assertTrue(set.contains(new Data(1)));

// Second call to add() doesn't do anything and returns false
Assert.assertFalse(set.add(new Data(1)));

Why not just use a Map? UnifiedSetWithHashingStrategy uses half the memory of a UnifiedMap, and one quarter the memory of a HashMap. And sometimes you don't have a convenient key and have to create a synthetic one, like a tuple. That can waste more memory.

How do we perform lookups? Remember that Sets have contains(), but not get(). UnifiedSetWithHashingStrategy implements Pool in addition to MutableSet, so it also implements a form of get().

Note: I am a committer for Eclipse Collections.

Craig P. Motlin
  • 26,452
  • 17
  • 99
  • 126
4

Of course you can create some external object providing an equality comparison and a HashCode. But the build-in collections of Java do not use such an object for their comparisons/lookup.

I once did create an interface like this in my package-collection (just freshly translated to english):

public interface HashableEquivalenceRelation {

    /**
     * Returns true if two objects are considered equal.
     *
     * This should form an equivalence relation, meaning it
     * should fulfill these properties:
     *  <ul>
     *    <li>Reflexivity:  {@code areEqual(o, o)}
     *            should always return true.</li>
     *    <li>Symmetry: {@code areEqual(o1,o2) == areEqual(o2,o1)}
     *            for all objects o1 and o2</li>
     *    <li>Transitivity: If {@code areEqual(o1, o2)} and {@code areEqual(o2,o3)},
     *            then {@code areEqual(o1,o3}} should hold too.</li>
     *  </ul>
     * Additionally, the relation should be temporary consistent, i.e. the
     * result of this method for the same two objects should not change as
     * long as the objects do not change significantly (the precise meaning of
     * <em>change significantly</em> is dependent on the implementation).
     *
     * Also, if {@code areEqual(o1, o2)} holds true, then {@code hashCode(o1) == hashCode(o2)}
     * must be true too.
     */
    public boolean areEqual(Object o1, Object o2);

    /**
     * Returns a hashCode for an arbitrary object.
     *
     * This should be temporary consistent, i.e. the result for the same
     * objects should not change as long as the object does not change significantly
     * (with change significantly having the same meaning as for {@link areEqual}).
     *
     * Also, if {@code areEqual(o1, o2)} holds true, then {@code hashCode(o1) == hashCode(o2)}
     * must be true too.
     */
    public int hashCode(Object o);

}

Than I had a group of interfaces CustomCollection, CustomSet, CustomList, CustomMap, etc. defined like the interfaces in java.util, but using such an equivalence relation for all the methods instead of the build-in relation given by Object.equals. I had some default implementations, too:

/**
 * The equivalence relation induced by Object#equals.
 */
public final static EquivalenceRelation DEFAULT =
    new EquivalenceRelation() {
        public boolean areEqual(Object o1, Object o2)
        {
            return
                o1 == o2 ||
                o1 != null &&
                o1.equals(o2);
        }
        public int hashCode(Object ob)
        {
            return
                ob == null?
                0 :
                ob.hashCode();
        }
        public String toString() { return "<DEFAULT>"; }
    };

/**
 * The equivalence relation induced by {@code ==}.
 * (The hashCode used is {@link System#identityHashCode}.)
 */
public final static EquivalenceRelation IDENTITY =
    new EquivalenceRelation() {
        public boolean areEqual(Object o1, Object o2) { return o1 == o2; }
        public int hashCode(Object ob) { return System.identityHashCode(ob); }
        public String toString() { return "<IDENTITY>"; }
    };

/**
 * The all-relation: every object is equivalent to every other one.
 */
public final static EquivalenceRelation ALL =
    new EquivalenceRelation() {
        public boolean areEqual(Object o1, Object o2) { return true; }
        public int hashCode(Object ob) { return 0; }
        public String toString() { return "<ALL>"; }
    };

/**
 * An equivalence relation partitioning the references
 * in two groups: the null reference and any other reference.
 */
public final static EquivalenceRelation NULL_OR_NOT_NULL =
    new EquivalenceRelation() {
        public boolean areEqual(Object o1, Object o2)
        {
            return (o1 == null && o2 == null) ||
                (o1 != null && o2 != null);
        }
        public int hashCode(Object o) { return o == null ? 0 : 1; }
        public String toString() { return "<NULL_OR_NOT_NULL>"; }
    };

/**
 * Two objects are equivalent if they are of the same (actual) class.
 */
public final static EquivalenceRelation SAME_CLASS =
    new EquivalenceRelation() {
        public boolean areEqual(Object o1, Object o2)
        {
            return o1 == o2 || o1 != null && o2 != null &&
                o1.getClass() == o2.getClass();
        }
        public int hashCode(Object o) { return o == null ? 0 : o.getClass().hashCode(); }
        public String toString() { return "<SAME_CLASS>"; }
    };


/**
 * Compares strings ignoring case.
 * Other objects give a {@link ClassCastException}.
 */
public final static EquivalenceRelation STRINGS_IGNORE_CASE =
    new EquivalenceRelation() {
        public boolean areEqual(Object o1, Object o2)
        {
            return o1 == null ?
                o2 == null :
                ((String)o1).equalsIgnoreCase((String)o2);
        }
        public int hashCode(Object o)
        {
            return o == null ? -12345 : ((String)o).toUpperCase().hashCode();
        }
        public String toString() { return "<STRINGS_IGNORE_CASE>"; }
    };


/**
 * Compares {@link CharSequence} implementations by content.
 * Other object give a {@link ClassCastException}.
 */
public final static EquivalenceRelation CHAR_SEQUENCE_CONTENT =
    new EquivalenceRelation() {
        public boolean areEqual(Object o1, Object o2) 
        {
            CharSequence seq1 = (CharSequence)o1;
            CharSequence seq2 = (CharSequence)o2;
            if (seq1 == null ^ seq2 == null) // nur eins von beiden null
                return false;
            if (seq1 == seq2)   // umfasst auch den Fall null == null
                return true;
            int size = seq1.length();
            if (seq2.length() != size)
                return false;
            for (int i = 0; i < size; i++)
                {
                    if (seq1.charAt(i) != seq2.charAt(i))
                        return false;
                }
            return true;
        }
        /**
         * Entrspricht String.hashCode
         */
        public int hashCode(Object o)
        {
            CharSequence sequence = (CharSequence)o;
            if (sequence == null)
                return 0;
            int hash = 0;
            int size = sequence.length();
            for (int i = 0; i < size; i++)
                {
                    hash = hash * 31 + sequence.charAt(i);
                }
            return hash;
        }
    };
Paŭlo Ebermann
  • 73,284
  • 20
  • 146
  • 210
  • @Paulo, the intf you have designed solves the problem , now we need a Map that uses it! – mP. Mar 28 '11 at 10:01
  • @mP: Yeah, I reimplemented most of the collection library to use these interfaces instead of `equals`/`hashCode`. – Paŭlo Ebermann Jun 02 '11 at 00:12
  • 1
    I have done a similar thing, but called it `EqualitySemantic`, because an implementation gives a new definition of the equality of type `T`. Also note that it is good practice to use the type parameter `T` for the first parameter of `equals` and `hashCode` methods, to make clear that this equality semantic only applies for instances of type `T`. – SpaceTrucker Oct 17 '12 at 08:31
  • @SpaceTrucker: Good point. I invented this thing before the introduction of generics into Java, but it would be better with a type parameter. – Paŭlo Ebermann Oct 17 '12 at 08:49
3

Use Guava Equivalence:

Equivalence<T> equivalence = new Equivalence<T>() {
    @Override
    protected boolean doEquivalent(T a, T b) {
        return CustomComparator.equals(a, b);
    }

    @Override
    protected int doHash(T item) {
        return CustomHashCodeGenerator.hashCode(item);
    }
};
List<T> items = getItems();
Set<Equivalence.Wrapper<T>> setWithWrappedObjects = items.stream()
    .map(item -> equivalence.wrap(item))
    .collect(Collectors.toSet());
izogfif
  • 6,000
  • 2
  • 35
  • 25
  • Assuming the custom ID comparison is an equality comparison on an `id` property of an object of type `TheObjectType`, the equivalence can be simplified to `Equivalence equivalence = Equivalence.equals().onResultOf(TheObjectType::getId);` – M. Justin Jan 04 '21 at 07:55
1

Just had this problem and worked up a simple solution. Not sure how memory-intensive it is; I'm sure people can refine it down the line.

When the Comparator returns 0, the elements match.

public static <E> Set<E> filterSet(Set<E> set, Comparator<E> comparator){
    Set<E> output = new HashSet<E>();
    for(E eIn : set){
        boolean add = true;
        for(E eOut : output){
            if(comparator.compare(eIn, eOut) == 0){
                add = false;
                break;
            }
        }
        if(add) output.add(eIn);
    }
    return output;
}

My use case was that I needed to filter out duplicate URLs, as in URLs that point to the same document. The URL object has a samePage() method that will return true if everything except the fragment are the same.

filtered = Misc.filterSet(filtered, (a, b) -> a.sameFile(b) ? 0 : 1);
ndm13
  • 1,189
  • 13
  • 19
1

Would using a TreeSet help here? A TreeSet actually performs ordering and Set based behavior using compare/compareTo and allows you to define a custom Comparator for use in one of the constructors.

whaley
  • 16,075
  • 10
  • 57
  • 68
0

You will not succeed doing your de-duplicating concatenation with a Comparator. Presumably you are looking to do something like this:

List<Object> list = new ArrayList<Object>();
list.addAll( a );
list.addAll( b );
Collections.sort( list, new MyCustomComparator() );

The problem is that Comparator needs to compare not just for equals/not-equals, but also for relative order. Given objects x and y that are not equal, you have to answer if one is greater than the other. You will not be able to do that, since you aren't actually trying to compare the objects. If you don't give a consistent answer, you will send the sorting algorithm into an infinite loop.

I do have a solution for you. Java has a class called LinkedHashSet, whose virtue is that it doesn't allow duplicates to be inserted, but maintains insertion order. Rather than implementing a comparator, implement a wrapper class to hold the actual object and implement hashCode/equals.

Konstantin Komissarchik
  • 28,879
  • 6
  • 61
  • 61