Implementation of equals() and hashCode() when no natural key is available?

Question

This question is basically a follow-up to questions:

Should I write equals() methods in JPA entities? and What is the best practice when implementing equals() for entities with generated ids

Some background first...

You can regularly encounter the following primary key constellations:

1. Natural keys (business keys): usually a set of real, multi-column attributes of the entity
2. Artificial keys (surrogate keys): meaningless, usually auto-increment (IDENTITY, AUTO_INCREMENT, AUTOINCREMENT, SEQUENCE, SERIAL, ...) IDs
3. Hybrid keys (semi-natural/semi-artificial keys): usually consisting of an artificial ID and one or more additional natural columns, e.g. any table that references another table which uses an ID and extends that key (entity_id, ordinal_nbr) or similar.

Frequent scenario: many-to-one references to a root, branch, or leaf inheritance table, which all share a common, "stupid" ID via identifying relationship/dependent key. Root (and branch) tables often make sense when another table needs to reference all entity types, e.g. PostAddresses -> Contacts, where Contacts has sub tables Persons, Clubs, and Facilities, which have nothing in common but being "contactable".

Now to JPA:

In Java, we can create new entity objects whose PK is incomplete (null or partly null), i.e. an entity (row) that a DBMS would ultimately refuse to insert into the DB.

However, when working with application code, it's often handy to have new (or detached) entities that can be compared to existing (managed) entities, even if the new entity objects don't have a PK value yet. To achieve this for any entities that have natural key columns, use them for implementing equals() and hashCode() (as suggested by the other two SO postings).
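As a rough illustration of the natural-key case described above, here is a minimal sketch. The `ZipCode` class and its columns are hypothetical (they do not come from the question), JPA annotations are left out so the snippet compiles on its own, and `java.util.Objects` stands in for the Apache builders.

```java
import java.util.Objects;

// Hypothetical entity with a two-column natural key (countryCode, code).
// In a real mapping both fields would carry @Id; annotations are omitted here.
final class ZipCode {
    private final String countryCode; // natural key, part 1
    private final String code;        // natural key, part 2
    private final String cityName;    // non-key attribute, ignored by equals/hashCode

    ZipCode(String countryCode, String code, String cityName) {
        this.countryCode = countryCode;
        this.code = code;
        this.cityName = cityName;
    }

    String getCityName() {
        return cityName;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof ZipCode)) {
            return false;
        }
        ZipCode other = (ZipCode) o;
        // Only the natural key columns take part in the comparison.
        return Objects.equals(countryCode, other.countryCode)
            && Objects.equals(code, other.code);
    }

    @Override
    public int hashCode() {
        // Must be built from the same columns as equals().
        return Objects.hash(countryCode, code);
    }
}
```

Because the key values are known before the row is inserted, a new instance and a managed instance with the same key compare as equal, which is the "transient comparability" the question asks about.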

Question:

But what do you do when no natural/business key can be determined, as in the case of the Contacts table, which is basically just an ID (plus a discriminator)? What would be a good column selection policy for basing equals() and hashCode() implementations on? (artificial keys 2. and 3. above)

There's obviously not much of a choice...

One (naive) goal would be to achieve the same "transient comparability". Can it be done? If not, what does the general approach look like for artificial ID equals() and hashCode() implementations?

Note: I'm already using Apache EqualsBuilder and HashCodeBuilder... I have intentionally "naivified" my question.

score 3 · Accepted Answer · answered Jul 08 '11 at 23:41

I think the subject is simpler than the discussions suggest.

Take the database id(s) if present; otherwise use Object#equals, i.e. object identity.

Why? When you put a new entity into the database, JPA does nothing more than map a newly generated database id onto the entity object's identity. Conversely, this means that the object identity already serves as a primary key beforehand.

The discussion often seems to rest on the assumption that two business objects with the same properties are equal. But they are not. E.g. two addresses with the same street and city are only equal if you don't want duplicate address values. But then you would make those columns a primary key in the database too, which means you always have the primary key available for your business objects. If you allow duplicate addresses, the object's identity is the primary key, since it is the only distinction between two addresses.

After persisting an entity, the database id takes over the job completely, since you can now have clones of the same entity that share only the database id (but occupy several memory locations / object identities).
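One way to read this rule in code is sketched below. The `Contact` class and the `assignId` method are hypothetical stand-ins (the latter for what the persistence provider does on insert), and JPA annotations are omitted so the snippet is self-contained. The constant `hashCode()` is an assumption that goes beyond the answer: it keeps the hash stable when the id is assigned later, at the cost of poor hash distribution.

```java
// Hypothetical entity that has only a generated id, like the Contacts table in the question.
class Contact {
    private Long id; // null until persisted; would carry @Id @GeneratedValue in JPA

    Long getId() {
        return id;
    }

    // Stand-in for the id assignment the persistence provider performs on insert.
    void assignId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true; // object identity: the "primary key" before persisting
        }
        if (!(o instanceof Contact)) {
            return false;
        }
        Contact other = (Contact) o;
        // Distinct instances are equal only if both carry the same database id.
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        // Assumption not taken from the answer: a per-class constant, so the hash
        // does not change when the id is assigned after the object is in a HashSet.
        return Contact.class.hashCode();
    }
}
```

With this sketch, two new `Contact` objects are never equal to each other, while a detached copy with the same id equals the managed instance.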

score 1 · Answer 2 · answered Jul 08 '11 at 23:16

If you can't find a set of properties on the object that will distinguish it from other objects of the same kind, then you can't compare those objects, can you? If you provide a detailed use case there may be more to it, but in the case of a contact with an id and a discriminator, in the absence of the id you can only compare groups of objects that have the same discriminator. And if groups are guaranteed to have only one element, then the discriminator is your key.

Alex Gitelman


Note the discriminator column can't be used as it's not identifying. It is set to the type of sub entity that references super. There can be many entities for each sub type Persons, Clubs, and Facilities sharing the same value. To me it appears that the answer can only be to use the contact ID then. **Maybe** the sub entities provide something, I haven't checked that yet. – Kawu Jul 09 '11 at 00:01
Also note Contacts.id **is** available. Is there really no more to it, but throw this ID into the Apache builders? – Kawu Jul 09 '11 at 00:21

score 1 · Answer 3 · answered Jul 08 '11 at 23:21

One of the commonly suggested techniques is to use UUIDs for identifiers, which have a couple of downsides.

They make for ugly urls, and supposedly there are performance implications of querying entities based on such a long identifier. The long UUIDs also cause your database indexes to become too large.

The advantage of UUIDs is that you don't have to implement separate hashCode() and equals() methods for every entity.

The solution I've decided to use in my own projects is to keep a traditional assigned identifier and also use a UUID internally for the hashCode() and equals() methods. It looks something like this:

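import java.io.Serializable;

import javax.persistence.Column;
import javax.persistence.EntityListeners;
import javax.persistence.GeneratedValue;
import javax.persistence.GenerationType;
import javax.persistence.Id;
import javax.persistence.MappedSuperclass;
import javax.persistence.Version;

import org.springframework.beans.factory.annotation.Configurable;
import org.springframework.core.style.ToStringCreator;

// ModelListener is the answerer's own entity listener class (not shown here).
@Configurable
@MappedSuperclass
@EntityListeners({ModelListener.class})
@SuppressWarnings("serial")
public abstract class ModelBase implements Serializable {

    //~ Instance Fields ========================================

    // Database-generated surrogate key, used for queries in application code.
    @Id
    @GeneratedValue(strategy = GenerationType.IDENTITY)
    @Column(name = "id", nullable = false, updatable = false, unique = true)
    protected Long id;

    // Assigned at construction time, so it is available before the entity is persisted.
    @Column(name = "__UUID__", unique = true, nullable = false, updatable = false, length = 36)
    private String uuid = java.util.UUID.randomUUID().toString();

    // Assumption: the original snippet calls getVersion() without showing it;
    // an optimistic-locking @Version field is assumed here so the class compiles.
    @Version
    private Long version;

    //~ Business Methods =======================================

    @Override
    public String toString() {
        return new ToStringCreator(this)
            .append("id", getId())
            .append("uuid", uuid())
            .append("version", getVersion())
            .toString();
    }

    // hashCode() and equals() depend only on the UUID, never on the generated id.
    @Override
    public int hashCode() {
        return uuid().hashCode();
    }

    @Override
    public boolean equals(Object o) {
        return (o == this || (o instanceof ModelBase && uuid().equals(((ModelBase) o).uuid())));
    }

    /**
     * Returns this object's UUID.
     *
     * @return this object's UUID
     */
    public String uuid() {
        return uuid;
    }

    //~ Accessor Methods =======================================

    public Long getId() {
        return id;
    }

    // Private accessors below exist only for the persistence provider.
    @SuppressWarnings("unused")
    private void setId(Long id) {
        this.id = id;
    }

    @SuppressWarnings("unused")
    private String getUuid() {
        return uuid;
    }

    @SuppressWarnings("unused")
    private void setUuid(String uuid) {
        this.uuid = uuid;
    }

    // Assumed accessor for the assumed version field (see above).
    public Long getVersion() {
        return version;
    }
}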

Just extend ModelBase for all of your entities. The advantage of this technique is that the uuid is assigned as soon as the object is created. But we still have an assigned id we can use in our application code to query specific objects. Basically, the uuid field is never used or even thought about in our application code except for comparison purposes. Works like a charm.

I think the issue is `'it's often handy to have new (or detached) entities that can be compared to existing (managed) entities'`. So how would you compare those new UUIDs to existing objects? — Alex Gitelman, Jul 08 '11 at 23:23
I see your point, but in practice I haven't run into any use-cases for this. I have based my answer on the original question: "But what do you do when no natural/business key can be determined, as in the case of the Contacts table, which is basically just an ID (plus a discriminator)? What would be a good column selection policy for basing equals() and hashCode() implementations on? (artificial keys 2. and 3. above) There's obviously not much of a choice...". My solution solves this issue. — Nobody, Jul 08 '11 at 23:28
I didn't see good use case either, that's why I want it to be clarified. — Alex Gitelman, Jul 08 '11 at 23:28
I don't want to have all my entities extend another class, also because not all of my entities use dumb (UU)IDs. Most PKs are in fact natural. I just can't help to introduce IDs for inheritance relationships. This has the consequence of mixed-type keys as described in 3. "hybrid keys". — Kawu, Jul 09 '11 at 00:15
I think you can have the uuid better marked as transient. if you don't want entities to extends a base class you can use aspectj ITD to add any wanted behaviour. — gpilotino, Feb 02 '12 at 09:48
@gpilotino: That's a great idea and I hadn't thought of making the UUID transient. If the database generated id is present, use that. Otherwise, use the transient UUID. That way you don't have to store the UUID. I think I'm off to change my model objects. =) — Nobody, Jul 17 '12 at 18:17
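The idea from the last two comments (use the database id when it is present, otherwise fall back to a UUID that is never stored) could be sketched roughly as follows. This is only one possible reading of those comments, not code from the thread: the class names `IdOrUuidBase` and `Club` and the `assignId` method are hypothetical, and JPA annotations are omitted so the snippet compiles without a persistence provider.

```java
import java.util.UUID;

// Hypothetical base class: equality by database id if present, else by an in-memory UUID.
abstract class IdOrUuidBase {
    private Long id; // would carry @Id @GeneratedValue in JPA; null until persisted

    // Never written to the database; in a JPA mapping this would be annotated @Transient.
    private final UUID uuid = UUID.randomUUID();

    Long getId() {
        return id;
    }

    // Stand-in for the id assignment done by the persistence provider on insert.
    void assignId(Long id) {
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (!(o instanceof IdOrUuidBase)) {
            return false;
        }
        IdOrUuidBase other = (IdOrUuidBase) o;
        if (id != null || other.id != null) {
            // At least one side is persisted: only matching ids count as equal.
            return id != null && id.equals(other.id);
        }
        // Neither side is persisted: compare the in-memory UUIDs.
        return uuid.equals(other.uuid);
    }

    @Override
    public int hashCode() {
        return id != null ? id.hashCode() : uuid.hashCode();
    }
}

// Hypothetical concrete entity, named after one of the sub tables in the question.
class Club extends IdOrUuidBase {
}
```

One caveat with this variant: the hash code changes at the moment the id is assigned, so an instance that was put into a HashSet or used as a HashMap key before being persisted may no longer be found there afterwards. It also does not make a new object comparable to an existing row, which is the concern raised in the first comment.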
