
I am really confused by the equals() and hashCode() methods after reading lots of documentation and articles. Mainly, the many different kinds of examples and usages are what confuse me.

So, could you clarify the following points for me?

1. If there is no unique field in an entity (apart from the id field), should we use the getClass() method or only the id field in the equals() method, as shown below?

@Override
public boolean equals(Object o) {
   if (this == o) return true;
   if (o == null || getClass() != o.getClass()) return false;
   
   // code omitted
}

2. If there is a unique key, e.g. private String isbn;, should we use only this field? Or should we combine it with getClass() as shown below?

@Override
public boolean equals(Object o) {
   if (this == o) return true;
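   if (o == null || getClass() != o.getClass()) return false;
   Book book = (Book) o;
   return Objects.equals(isbn, book.isbn); // compare Strings by value, not with ==; requires import java.util.Objects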
}

3. What about NaturalId? As far as I understand, it is used for unique fields, e.g. private String isbn;. What is the purpose of using it? Is it related to the equals() and hashCode() methods?

Alexander Ivanchenko

3 Answers


It all boils down to what your class actually represents, what its identity is, and when the JVM should consider two objects to be the same. The context in which the class is used determines its behavior (in this case, equality to another object).

By default, Java considers two given objects "the same" only if they are actually the same instance of a class (comparison using ==). While that makes sense for a strictly technical check, Java applications usually represent a business domain, where multiple objects may be constructed but should still be considered the same. A book (as in your question) is an example of that. But what does it mean for a book to be the same as another?
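A minimal sketch of that difference: a class that does not override equals() inherits Object.equals(), which compares references, so two instances with identical fields are not equal; overriding equals() (together with hashCode()) makes them equal by value. The class names PlainBook and ValueBook are hypothetical, chosen only for this illustration.

```java
import java.util.Objects;

// Inherits Object.equals(), which compares references (==).
class PlainBook {
    final String title;
    final String author;

    PlainBook(String title, String author) {
        this.title = title;
        this.author = author;
    }
}

// Overrides equals()/hashCode(): equal field values mean equal objects.
class ValueBook {
    final String title;
    final String author;

    ValueBook(String title, String author) {
        this.title = title;
        this.author = author;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        ValueBook other = (ValueBook) o;
        return Objects.equals(title, other.title) && Objects.equals(author, other.author);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, author);
    }
}

class DefaultEqualityDemo {
    public static void main(String[] args) {
        PlainBook a = new PlainBook("The Hobbit", "J.R.R. Tolkien");
        PlainBook b = new PlainBook("The Hobbit", "J.R.R. Tolkien");
        System.out.println(a.equals(b)); // false: two distinct instances

        ValueBook c = new ValueBook("The Hobbit", "J.R.R. Tolkien");
        ValueBook d = new ValueBook("The Hobbit", "J.R.R. Tolkien");
        System.out.println(c.equals(d)); // true: same title and author
    }
}
```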

See - it depends.

When you ask someone whether they have read a certain book, you give them a title and an author; they try to "match" it against the books they've read and see whether any of them meets the criteria you provided. So equals in this case would check whether the title and the author of a given book are the same as those of the other. Simple.

Now imagine that you're a Tolkien fan. If you were Polish (like me), you could have multiple "Lord of the Rings" translations available to read, but (as a fan) you would know about some translators who went a bit too far and you would like to avoid them. The title and the author are not enough; you're looking for a book with a certain ISBN, which lets you find a particular edition of the book. Since an ISBN already identifies a single edition (and therefore a single title and author), the title and the author do not need to be compared in the equals method in this case.

The third (and final) book-related example concerns a library. Both situations described above could easily happen at a library, but from the librarian's point of view a book is also something else: an "item". Each book in the library (this is just an assumption, I've never worked with such a system) has its own identifier, which can be completely separate from the ISBN (but could also be an ISBN plus something extra). When you return a book to the library, it's the library identifier that matters, and it should be used in this case.

To sum up: a Book as an abstraction does not have a single "equality definition"; it depends on the context. Let's say we create the following set of classes (most likely in more than one context):

  • Book
  • BookEdition
  • BookItem
  • BookOrder (not yet in the library)

Book and BookEdition are more like value objects, while BookItem and BookOrder are entities. Value objects are represented only by their values: even though they do not have an identifier, they can be equal to other ones. Entities, on the other hand, can include values or even consist of value objects (e.g. BookItem could contain a BookEdition field next to its libraryId field), but they have an identifier that defines whether they are the same as another entity (even if their values change). Books are not a good example here (unless we imagine reassigning a library identifier to another book), but a user who changed their username is still the same user, identified by their ID.
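A sketch of the value object vs. entity distinction, assuming Java 16+ for records: BookEdition is a value object whose record-generated equals()/hashCode() cover all components, while BookItem is an entity that composes a BookEdition but is identified only by libraryId. The field layout and the placeholder ISBN strings are illustrative assumptions, not part of any real library system.

```java
// Value object: identified only by its values. A record (Java 16+)
// derives equals()/hashCode() from all of its components.
record BookEdition(String isbn, String title, String author) {}

// Entity: identified by libraryId, whatever its other values are.
class BookItem {
    private final long libraryId;
    private final BookEdition edition; // composition, not inheritance

    BookItem(long libraryId, BookEdition edition) {
        this.libraryId = libraryId;
        this.edition = edition;
    }

    BookEdition edition() {
        return edition;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return libraryId == ((BookItem) o).libraryId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(libraryId);
    }
}

class ValueVsEntityDemo {
    public static void main(String[] args) {
        var first = new BookEdition("isbn-1", "The Lord of the Rings", "J.R.R. Tolkien");
        var second = new BookEdition("isbn-2", "The Lord of the Rings", "J.R.R. Tolkien");
        // Two value objects with the same values are equal.
        System.out.println(first.equals(new BookEdition("isbn-1", "The Lord of the Rings", "J.R.R. Tolkien"))); // true
        // Same libraryId, different edition data: still the same item.
        System.out.println(new BookItem(7, first).equals(new BookItem(7, second))); // true
        System.out.println(new BookItem(7, first).equals(new BookItem(8, first)));  // false
    }
}
```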


Regarding checking the class of the object passed to the equals method: it is highly advisable (though not enforced by the compiler in any way) to verify that the object is of the expected type before casting it, to avoid a ClassCastException. instanceof or getClass() should be used for that (with getClass(), an explicit null check is needed as well, since null instanceof X is simply false but null.getClass() throws). If the object is of the expected type, you can cast it (e.g. Book other = (Book) object;) and only then can you access the properties of the book (libraryId, isbn, title, author); an object of type Object doesn't have such fields or accessors.
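To illustrate why the type check matters, here is a sketch comparing an equals() that casts blindly with one that checks the class first. UnsafeBook and SafeBook are hypothetical names used only for this comparison.

```java
class UnsafeBook {
    final String isbn;

    UnsafeBook(String isbn) {
        this.isbn = isbn;
    }

    @Override
    public boolean equals(Object o) {
        // Casts without checking: throws ClassCastException for other types
        // (and NullPointerException for null, when other.isbn is read).
        UnsafeBook other = (UnsafeBook) o;
        return isbn.equals(other.isbn);
    }

    @Override
    public int hashCode() {
        return isbn.hashCode();
    }
}

class SafeBook {
    final String isbn;

    SafeBook(String isbn) {
        this.isbn = isbn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // Null and foreign types are simply "not equal".
        if (o == null || getClass() != o.getClass()) return false;
        SafeBook other = (SafeBook) o;
        return isbn.equals(other.isbn);
    }

    @Override
    public int hashCode() {
        return isbn.hashCode();
    }
}

class CastGuardDemo {
    public static void main(String[] args) {
        System.out.println(new SafeBook("isbn-1").equals("isbn-1")); // false
        try {
            new UnsafeBook("isbn-1").equals("isbn-1");
        } catch (ClassCastException e) {
            System.out.println("ClassCastException"); // this branch runs
        }
    }
}
```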

You're not explicitly asking about it in your question, but the choice between instanceof and getClass() can be just as unclear. A rule of thumb would be: use getClass(), as it helps to avoid problems with symmetry.
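A sketch of the symmetry problem, assuming a hypothetical subclass that adds a field to equality: with instanceof, the parent accepts the subclass instance but the subclass rejects the parent, so a.equals(b) and b.equals(a) disagree. With a getClass() comparison in both classes, both calls would simply return false. (As noted in the comments below, entity hierarchies like this are themselves a warning sign.)

```java
import java.util.Objects;

// equals() based on instanceof
class Book {
    final String title;

    Book(String title) {
        this.title = title;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false; // accepts any subclass too
        return title.equals(((Book) o).title);
    }

    @Override
    public int hashCode() {
        return title.hashCode();
    }
}

// Hypothetical subclass that also compares its extra field.
class BookEdition extends Book {
    final String isbn;

    BookEdition(String title, String isbn) {
        super(title);
        this.isbn = isbn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof BookEdition)) return false; // rejects a plain Book
        BookEdition other = (BookEdition) o;
        return title.equals(other.title) && isbn.equals(other.isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, isbn);
    }
}

class SymmetryDemo {
    public static void main(String[] args) {
        Book book = new Book("The Hobbit");
        BookEdition edition = new BookEdition("The Hobbit", "isbn-1");
        System.out.println(book.equals(edition)); // true
        System.out.println(edition.equals(book)); // false: symmetry is broken
    }
}
```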


Natural IDs can vary depending on the context. In the case of a BookEdition, an ISBN is a natural ID, but in the case of just a Book it would be the pair of the title and the author (as a separate class). You can read more about the concept of a natural ID in Hibernate in the docs.
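A minimal sketch of "the pair of the title and the author as a separate class", assuming Java 16+: BookKey is a hypothetical record whose generated equals()/hashCode() cover both components, so distinct instances with the same values act as the same key in a HashMap.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical natural key of a Book: the pair (title, author).
// A record derives equals()/hashCode() from both components.
record BookKey(String title, String author) {}

class NaturalKeyDemo {
    public static void main(String[] args) {
        Map<BookKey, Integer> timesRead = new HashMap<>();
        timesRead.merge(new BookKey("The Hobbit", "J.R.R. Tolkien"), 1, Integer::sum);
        // A different instance with the same values hits the same entry.
        timesRead.merge(new BookKey("The Hobbit", "J.R.R. Tolkien"), 1, Integer::sum);
        System.out.println(timesRead.size()); // 1
        System.out.println(timesRead.get(new BookKey("The Hobbit", "J.R.R. Tolkien"))); // 2
    }
}
```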

It is important to understand that, in a more complex domain, a single database table can be mapped to different types of objects. ORM tools should help us manage and map data, but the objects defined as a data representation are (or rather: usually should be) a different layer of abstraction than the domain model.

Yet if you were forced to use, for example, BookItem as your data-modeling class, libraryId could probably be the ID in the database context, but isbn would not be a natural ID, since it does not uniquely identify a BookItem. If BookEdition were the data-modeling class, it could contain an ID autogenerated by the database (the ID in the database context) and an ISBN, which in this case would be the natural ID, as it uniquely identifies a BookEdition in the book editions context.

To avoid such problems and make the code more flexible and descriptive, I'd suggest treating data as data and domain as domain, in the spirit of domain-driven design. A natural ID (as a concept) exists only at the domain level of the code, because it can vary and evolve, while you can still use the same database table to map the data into those various objects, depending on the business context.


Here's a code snippet with the classes described above and a class representing a table row from the database.

Data model (might be managed by an ORM like Hibernate):

import java.sql.Timestamp;

// database table representation (represents data, is not a domain object)
// getters omitted in all classes for simplicity; fields are package-private
// so that the domain classes' static factory methods can read them

class BookRow {

    long id;
    String isbn;
    String title;
    // author should be a separate table joined by FK - done this way for simplification
    String authorName;
    String authorSurname;
    // could have other fields as well - e.g. date of addition to the library
    Timestamp addedDate;

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        BookRow book = (BookRow) object;
        // id identifies the ORM entity (a row in the database table represented as a Java object)
        return id == book.id;
    }

    @Override
    public int hashCode() {
        // consistent with equals(): based on the same field
        return Long.hashCode(id);
    }
}

Domain model:

import java.util.Objects;

// getters omitted in all classes for simplicity;
// hashCode() always uses the same fields as equals()

class Book {

    private String title;
    private String author;

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        Book book = (Book) object;
        // title and author identify the book
        return Objects.equals(title, book.title)
               && Objects.equals(author, book.author);
    }

    @Override
    public int hashCode() {
        return Objects.hash(title, author);
    }

    static Book fromDatabaseRow(BookRow bookRow) {
        var book = new Book();
        book.title = bookRow.title;
        book.author = bookRow.authorName + " " + bookRow.authorSurname;
        return book;
    }
}

class BookEdition {

    private String title;
    private String author;
    private String isbn;

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        BookEdition book = (BookEdition) object;
        // isbn identifies the book edition
        return Objects.equals(isbn, book.isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(isbn);
    }

    static BookEdition fromDatabaseRow(BookRow bookRow) {
        var edition = new BookEdition();
        edition.title = bookRow.title;
        edition.author = bookRow.authorName + " " + bookRow.authorSurname;
        edition.isbn = bookRow.isbn;
        return edition;
    }
}

class BookItem {

    private long libraryId;
    private String title;
    private String author;
    private String isbn;

    @Override
    public boolean equals(Object object) {
        if (this == object) {
            return true;
        }
        if (object == null || getClass() != object.getClass()) {
            return false;
        }
        BookItem book = (BookItem) object;
        // libraryId identifies the book item in the library system
        return libraryId == book.libraryId;
    }

    @Override
    public int hashCode() {
        return Long.hashCode(libraryId);
    }

    static BookItem fromDatabaseRow(BookRow bookRow) {
        var item = new BookItem();
        item.libraryId = bookRow.id;
        item.title = bookRow.title;
        item.author = bookRow.authorName + " " + bookRow.authorSurname;
        item.isbn = bookRow.isbn;
        return item;
    }
}
Jonasz
  • I think that if an ORM class represents a db table and an instance represents a row in that table, it simplifies everything. – Andrey B. Panfilov Jul 05 '22 at 14:11
  • ORM classes are only a representation of the data saved in the store and instances of such classes are usually wrapped in appropriate proxies handling the IDs, equality etc. on the data level. The domain code is based on the domain model, where it's not that simple, as there are various contexts in which the same data can be interpreted differently. – Jonasz Jul 05 '22 at 14:23
  • Thanks for the detailed explanations, voted up. But I would prefer that you just replied to my 3 questions instead of giving other examples. –  Jul 05 '22 at 15:30
  • I like this answer, but I believe it's an error for `BookItem` and `BookOrder` to extend `BookEdition`. The `BookEdition` is an attribute of a `BookItem`. Can you provide realistic examples where entities usefully form an inheritance hierarchy? I would suggest that this is atypical, and that if entities did use inheritance, but subclasses had different notions of equality than their parent, something is fishy and you probably have a problem in your model. In other words, if `getClass()` seems necessary, it's symptomatic of another problem. – erickson Jul 05 '22 at 20:29
  • @erickson thank you. You are right. I tried to make a connection to `getClass()` and used inheritance at first, but later decided it did not make any sense, hence the use of composition in the next paragraph; yet I did not fix the description above. Fixed it now, thank you for pointing that out! – Jonasz Jul 06 '22 at 05:39
  • @Jonathan In the edit, I tried to connect the answer more closely to the questions you've asked. Please see whether that makes it clearer. It's not an easy concept and is often treated differently. Certain shortcuts are unfortunately taken, one of which could be an annotation like `NaturalId` in an ORM, which may work in some contexts, but in my opinion it represents a concept of a different level (domain, not data). – Jonasz Jul 06 '22 at 05:42
  • @Jonasz You're right, but why don't you post examples as an update to your answer for the following scenarios? –  Jul 06 '22 at 08:38
  • **1.** When we don't use Hibernate and there is only id field as unique (we also have title and author fields). –  Jul 06 '22 at 08:38
  • **2.** When we use Hibernate and there is only id field as unique (we also have title and author fields). –  Jul 06 '22 at 08:38
  • **3.** When we don't use Hibernate and there is a unique field e.g. isbn besides pk field (id). –  Jul 06 '22 at 08:38
  • @Jonathan I've added a code snippet, please see whether it's clearer now. – Jonasz Jul 06 '22 at 09:10
  • Thanks a lot, it is clear, but the difference between entities used with Hibernate and without it is still not obvious. –  Jul 06 '22 at 11:03
  • Only `BookRow` would be managed by an ORM (for example Hibernate) as it's the only strictly data-related class, the rest are domain classes and could be mapped from the data class - in the snippet it is done in the static factory methods in each domain class. – Jonasz Jul 06 '22 at 11:13
  1. If there is no unique field in an entity (apart from the id field), should we use the getClass() method or only the id field in the equals() method, as shown below?
@Override
public boolean equals(Object o) {
  if (this == o) return true;
  if (o == null || getClass() != o.getClass()) return false;
  
  // code omitted
}

By comparing classes in the #equals implementation, we achieve the following two goals:

  1. we make sure that we do not compare apples with oranges (although doing so could be correct in some cases)
  2. the code you omitted must cast Object o to some known class, otherwise we would be unable to extract the required information from Object o; checking the class first makes the #equals method safe (nobody expects to get a ClassCastException when calling Set#add, for example). Using instanceof there does not seem to be a good idea, because it can violate the symmetry and transitivity requirements of the equals contract once subclasses override equals.

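It is also worth noting that calling o.getClass() can cause unexpected behaviour when Object o is a Hibernate proxy (a proxy is a generated subclass of the entity, so getClass() returns the proxy class, not the entity class); that is why some people prefer to call Hibernate.getClass(o) instead or use other tricks.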
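A plain-Java sketch of the proxy pitfall, with no Hibernate involved: BookProxy is a hand-written, hypothetical stand-in for a runtime-generated lazy-loading proxy subclass. Because the strict getClass() check compares Book with BookProxy, an entity and a proxy for the same row are reported as unequal. In real Hibernate code, the class comparison would use Hibernate.getClass(...) instead, as mentioned above.

```java
import java.util.Objects;

class Book {
    private final Long id;

    Book(Long id) {
        this.id = id;
    }

    Long getId() {
        return id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        // Strict class check: any subclass instance, such as a proxy, fails here.
        if (o == null || getClass() != o.getClass()) return false;
        Book other = (Book) o;
        // Use the getter: a real lazy proxy typically leaves its own fields unset.
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id);
    }
}

// Hypothetical stand-in for a runtime-generated lazy-loading proxy subclass.
class BookProxy extends Book {
    BookProxy(Long id) {
        super(id);
    }
}

class ProxyPitfallDemo {
    public static void main(String[] args) {
        Book loaded = new Book(1L);
        Book proxy = new BookProxy(1L); // represents the same database row
        System.out.println(loaded.equals(proxy)); // false, because the classes differ
    }
}
```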


  2. If there is a unique key, e.g. private String isbn;, should we use only this field? Or should we combine it with getClass() as shown below?
@Override
public boolean equals(Object o) {
   if (this == o) return true;
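   if (o == null || getClass() != o.getClass()) return false;
   Book book = (Book) o;
   return Objects.equals(isbn, book.isbn); // compare Strings by value, not with ==; requires import java.util.Objects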
}

That is a very controversial topic; below are some thoughts on the problem:

  1. it is a good idea to maintain a PK column for each DB table: it costs almost nothing but simplifies a lot of things. Imagine someone asked you to delete some rows, and instead of delete from tbl where id=... you had to write delete from tbl where field1=... and field2=... and ...
  2. PKs should not be composite, otherwise you might be surprised by queries like select count(distinct field1, field2) from tbl
  3. the argument that entities get their IDs only when they are stored in the DB, and that this is why we can't rely on surrogate ids in equals and hashCode, is just wrong. Yes, it is common behaviour in most JPA projects, but you always have the option to generate and assign IDs manually; some examples below:
    • EclipseLink UserGuide: "By default, the entities Id must be set by the application, normally before the persist is called. A @GeneratedValue can be used to have EclipseLink generate the Id value." - I believe it is clear enough that @GeneratedValue is just an extra feature and nothing prevents you from creating your own object factory.
    • Hibernate User Guide: "Values for simple identifiers can be assigned, which simply means that the application itself will assign the value to the identifier attribute prior to persisting the entity."
    • some popular persistent storages (Cassandra, MongoDB) do not have out-of-the-box auto-increment functionality, yet nobody would say that those storages prevent you from implementing high-level ideas like DDD, etc.
  4. in such discussions examples make sense, but book/author/isbn is not a good one; here is something more practical: my DB contains about 1000 tables, and just 3 of them contain something similar to a natural id. Please give me a reason why I should not use surrogate ids there.
  5. it is not always possible to use natural ids even when they exist; some examples:
    • bank card PAN: it seems to be unique, however you may not even be allowed to store it in the DB (I believe SSN and VIN are also security sensitive)
    • no matter what anyone says, thinking that natural ids never change is naive; surrogate ids never change
    • they may have a bad format: too long, case-insensitive, containing unsafe symbols, etc.
  6. it is not possible to implement a soft-delete feature when using natural ids (a soft-deleted row still holds the unique natural key, so the same key cannot be inserted again)
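A minimal sketch of point 3 (application-assigned IDs), assuming the application generates a random UUID when the object is constructed: equals()/hashCode() can then rely on the surrogate id from the start, and the hash never changes, even when other state (here, a hypothetical isbn assigned when a draft is published) changes later. Draft, publish() and the UUID strategy are illustrative assumptions, not a prescribed design.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;
import java.util.UUID;

// Hypothetical entity whose surrogate id is generated by the application,
// so equals()/hashCode() are usable before the entity is ever persisted.
class Draft {
    private final UUID id;
    private String isbn; // null until the draft is published as a book

    Draft() {
        this(UUID.randomUUID()); // new drafts get their id immediately
    }

    Draft(UUID id) { // e.g. when re-creating a draft loaded from storage
        this.id = Objects.requireNonNull(id);
    }

    UUID getId() {
        return id;
    }

    String getIsbn() {
        return isbn;
    }

    void publish(String isbn) {
        this.isbn = isbn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return id.equals(((Draft) o).id);
    }

    @Override
    public int hashCode() {
        return id.hashCode(); // never changes: id is final and set in the constructor
    }
}

class AssignedIdDemo {
    public static void main(String[] args) {
        Set<Draft> drafts = new HashSet<>();
        Draft draft = new Draft();
        drafts.add(draft);
        draft.publish("isbn-placeholder"); // becoming a book does not change the hash
        System.out.println(drafts.contains(draft)); // true
    }
}
```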

PS. Vlad Mihalcea has provided an amusing implementation of hashCode:

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
 
        if (!(o instanceof Book))
            return false;
 
        Book other = (Book) o;
 
        return id != null &&
               id.equals(other.getId());
    }
 
    @Override
    public int hashCode() {
        return getClass().hashCode();
    }
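Why does a constant hashCode() work? Because the hash does not depend on the id, an entity added to a HashSet while it is still transient can be found after the id is assigned; the price is that all instances of the class land in the same bucket, so lookups degrade to linear scans. A plain-Java sketch, where setId() is a hypothetical stand-in for the id assignment Hibernate performs on persist:

```java
import java.util.HashSet;
import java.util.Set;

class Book {
    private Long id;

    Long getId() {
        return id;
    }

    void setId(Long id) { // stands in for the id assignment done on persist
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (!(o instanceof Book)) return false;
        Book other = (Book) o;
        // Transient instances (id == null) are equal only to themselves.
        return id != null && id.equals(other.getId());
    }

    @Override
    public int hashCode() {
        // Constant per class: unaffected by the id being assigned later.
        return getClass().hashCode();
    }
}

class ConstantHashCodeDemo {
    public static void main(String[] args) {
        Set<Book> books = new HashSet<>();
        Book book = new Book();
        books.add(book);   // added while transient (id == null)
        book.setId(42L);   // id assigned later, hash code unchanged
        System.out.println(books.contains(book)); // true
    }
}
```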

Regarding the Hibernate (HBN) documentation, the problem is that their synthetic cases have nothing in common with the real world. Let's take their dummy author/book model and try to extend it. Imagine I'm a publisher and I want to keep records of my authors, their books and drafts. What is the difference between a book and a draft? A book has an ISBN assigned, a draft does not, but a draft may one day become a book (or may not). How do we keep the Java equals/hashCode contracts for drafts in such a case?

Andrey B. Panfilov
  • You're showing only one side, I see mention of drawbacks of using surrogate `id` in the `equals/hashCode` implementation. The same author Vlad Mihalcea that you're referring to, has a recent article about the [*usage of Natural id with Hibernate*](https://vladmihalcea.com/the-best-way-to-map-a-naturalid-business-key-with-jpa-and-hibernate/). His conclusion: `The @NaturalId annotation is a very useful Hibernate feature that allows you to retrieve entities by their natural business key without even hitting the database.` – Alexander Ivanchenko Jul 05 '22 at 22:21
  • application-generated ids are more powerful than you might think; if you don't like that idea, it does not mean it is wrong. – Andrey B. Panfilov Jul 06 '22 at 00:44
  • Actually, I had read some articles by that author and was also confused by them, because he gives examples as if they were a general rule, but as far as I can see, it depends on the situation, e.g. whether Hibernate is used or not. So, could you please post an update considering 3 situations? –  Jul 06 '22 at 08:17
  • **1.** When we don't use Hibernate and there is only id field as unique (we also have title and author fields). –  Jul 06 '22 at 08:20
  • **2.** When we use Hibernate and there is only id field as unique (we also have title and author fields). –  Jul 06 '22 at 08:21
  • **3.** When we don't use Hibernate and there is a unique field e.g. isbn besides pk field (id). –  Jul 06 '22 at 08:21
  • By the way, I upvoted your answer for the help. –  Jul 06 '22 at 08:21

getClass()

Regarding the usage of getClass(), everything is straightforward.

The equals() method expects an argument of type Object.

It's important to ensure that you're dealing with an instance of the same class before casting and comparing attributes, otherwise you can end up with a ClassCastException. getClass() can be used for that purpose: if the objects do not belong to the same class, they are clearly not equal.

Natural Id vs Surrogate Id

When you're talking about "NaturalId", like the ISBN of a book, versus "id", I guess you are referring to the natural key of a persistent entity versus the surrogate key used in a relational database.

There are different opinions on that point. The generally recommended approach (see the link to the Hibernate user guide and the other references below) is to use the natural id (a set of unique properties, also called a business key) in your application, and to use the ID that an entity obtains after being persisted only in the database.

You may encounter hashCode() and equals() implemented on the basis of the surrogate id, with a defensive null check guarding against the case when an entity is in the transient state and its id is null. With such implementations, a transient entity would not be equal to an entity in the persistent state that has the same properties (apart from the non-null id). Personally, I don't think this approach is correct.
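A sketch of the behaviour described above, with two hypothetical classes that differ only in how equality is defined: IdBook compares surrogate ids with the defensive null check, IsbnBook compares the business key (ISBN). The id and ISBN values are made up for the example.

```java
import java.util.Objects;

// Equality by surrogate id, with the defensive null check.
class IdBook {
    Long id;            // null while transient, set once persisted
    final String isbn;

    IdBook(Long id, String isbn) {
        this.id = id;
        this.isbn = isbn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        IdBook other = (IdBook) o;
        return id != null && id.equals(other.id);
    }

    @Override
    public int hashCode() {
        return getClass().hashCode(); // constant, so it survives id assignment
    }
}

// Equality by business key (ISBN).
class IsbnBook {
    Long id;
    final String isbn;

    IsbnBook(Long id, String isbn) {
        this.id = id;
        this.isbn = isbn;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(isbn, ((IsbnBook) o).isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(isbn);
    }
}

class TransientVsPersistentDemo {
    public static void main(String[] args) {
        // Transient (id == null) vs. persistent (id == 1) copy of the same book:
        System.out.println(new IdBook(null, "isbn-1").equals(new IdBook(1L, "isbn-1")));     // false
        System.out.println(new IsbnBook(null, "isbn-1").equals(new IsbnBook(1L, "isbn-1"))); // true
    }
}
```

The business-key version only stays correct as long as the ISBN itself does not change, which is exactly the trade-off debated in the other answer.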

The following code sample has been taken from the most recent official Hibernate 6.1 User Guide:

Example 142. Natural Id equals/hashCode

@Entity(name = "Book")
public static class Book {

    @Id
    @GeneratedValue
    private Long id;
    private String title;
    private String author;

    @NaturalId
    private String isbn;

    //Getters and setters are omitted for brevity

    @Override
    public boolean equals(Object o) {
        if (this == o) {
            return true;
        }
        if (o == null || getClass() != o.getClass()) {
            return false;
        }
        Book book = (Book) o;
        return Objects.equals(isbn, book.isbn);
    }

    @Override
    public int hashCode() {
        return Objects.hash(isbn);
    }
}

The code above, which makes use of business keys, is presented in the guide as the final approach, in contrast to the implementation based on surrogate keys, which is called a naive implementation (see Example 139 and further).

The same reasoning for the choice between ID and natural key has been described here:

You have to override the equals() and hashCode() methods if you

  • intend to put instances of persistent classes in a Set (the recommended way to represent many-valued associations) and

  • intend to use reattachment of detached instances

Hibernate guarantees equivalence of persistent identity (database row) and Java identity only inside a particular session scope. So as soon as we mix instances retrieved in different sessions, we must implement equals() and hashCode() if we wish to have meaningful semantics for Sets.

The most obvious way is to implement equals()/hashCode() by comparing the identifier value of both objects. If the value is the same, both must be the same database row, they are therefore equal (if both are added to a Set, we will only have one element in the Set). Unfortunately, we can't use that approach with generated identifiers! Hibernate will only assign identifier values to objects that are persistent, a newly created instance will not have any identifier value! Furthermore, if an instance is unsaved and currently in a Set, saving it will assign an identifier value to the object. If equals() and hashCode() are based on the identifier value, the hash code would change, breaking the contract of the Set. See the Hibernate website for a full discussion of this problem. Note that this is not a Hibernate issue, but normal Java semantics of object identity and equality.
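The broken Set contract described in the quote can be reproduced in plain Java. In this sketch, assignId() is a hypothetical stand-in for the identifier assignment on save/persist; equals()/hashCode() follow the "naive" identifier-based approach.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// "Naive" entity: equals()/hashCode() based on a generated identifier.
class Book {
    private Long id; // assigned only when the entity is persisted

    void assignId(Long id) { // stands in for the id assignment on save/persist
        this.id = id;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(id, ((Book) o).id);
    }

    @Override
    public int hashCode() {
        return Objects.hashCode(id); // 0 while id is null, then the id's hash
    }
}

class ChangingHashDemo {
    public static void main(String[] args) {
        Set<Book> books = new HashSet<>();
        Book book = new Book();
        books.add(book);    // stored under hashCode() == 0 (id is null)
        book.assignId(42L); // hashCode() is now Long.hashCode(42) == 42
        System.out.println(books.contains(book)); // false: looked up in the wrong bucket
        System.out.println(books.size());         // 1: the element is still in there
    }
}
```

Note that this naive version also treats any two transient instances as equal, since Objects.equals(null, null) is true.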

We recommend implementing equals() and hashCode() using Business key equality.

For more information, have a look at this recent (Sep 15, 2021) article by Vlad Mihalcea, "The best way to map a @NaturalId business key with JPA and Hibernate", on how to improve the caching of query results with natural keys, and at these questions:

Alexander Ivanchenko