39

While reading through the Hibernate documentation, I keep seeing references to the concept of a natural identifier.

Does this just mean the id an entity has due to the nature of the data it holds?

E.g. a user's name + password + age + something else, used together as a compound identifier?

Vlad Mihalcea
benstpierre

7 Answers

43

In Hibernate, natural keys are often used for lookups. You will have an auto-generated surrogate id in most cases. But this id is rather useless for lookups, as you'll always query by fields like name, social security number or anything else from the real world.

When using Hibernate's caching features, this difference is very important: if the cache is indexed only by your primary key (the surrogate id), lookups by any other field gain nothing from it. That's why you can declare the set of fields you are going to query the database with as the natural id. Hibernate can then index the data by your natural key and improve lookup performance.

See this excellent blog post for a more detailed explanation or this RedHat page for an example Hibernate mapping file.

Emil Sierżęga
chris
26

In a relational database system, typically, you can have two types of simple identifiers:

  • Natural keys, which are assigned by external systems and guaranteed to be unique
  • Surrogate keys, like IDENTITY or SEQUENCE which are assigned by the database.

Surrogate Keys are popular because they are compact (4 or 8 bytes), whereas a Natural Key can be much longer (e.g. a VIN takes 17 alphanumeric characters, and a book ISBN is 13 digits long). If the Surrogate Key becomes the Primary Key, you can map it using the JPA @Id annotation.
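To make the size comparison concrete, here is a minimal sketch that counts the bytes each kind of key would take if natural keys were stored as plain ASCII text. The class name `KeySizeSketch` and both sample values are made up for illustration; actual storage cost depends on the database column type and encoding.

```java
import java.nio.charset.StandardCharsets;

class KeySizeSketch {

    // Bytes needed to store a key as ASCII text, one byte per character.
    static int textKeyBytes(String key) {
        return key.getBytes(StandardCharsets.US_ASCII).length;
    }

    public static void main(String[] args) {
        String vin = "1HGBH41JXMN109186"; // illustrative 17-character VIN-shaped value
        String isbn = "9780000000002";    // illustrative 13-digit ISBN-shaped value

        System.out.println("int surrogate key:  " + Integer.BYTES + " bytes");
        System.out.println("long surrogate key: " + Long.BYTES + " bytes");
        System.out.println("VIN natural key:    " + textKeyBytes(vin) + " bytes");
        System.out.println("ISBN natural key:   " + textKeyBytes(isbn) + " bytes");
    }
}
```

The gap matters most when the key is repeated in every foreign key column and index that references the row.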

Now, let's assume we have the following Post entity:

Post entity with natural id

Since the Post entity also has a Natural Key besides the Surrogate one, you can map it with the Hibernate-specific @NaturalId annotation:

@Entity(name = "Post")
@Table(name = "post")
public class Post {
 
    @Id
    @GeneratedValue
    private Long id;
 
    private String title;
 
    @NaturalId
    @Column(nullable = false, unique = true)
    private String slug;
 
    //Getters and setters omitted for brevity
 
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) 
            return false;
        Post post = (Post) o;
        return Objects.equals(slug, post.slug);
    }
 
    @Override
    public int hashCode() {
        return Objects.hash(slug);
    }
}
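The equals and hashCode methods above rely only on the slug. One practical consequence can be sketched in plain Java, without JPA or Hibernate: because the hash code does not depend on the generated id, an entity placed in a HashSet is still found after the id is assigned. `PostSketch` and `EqualityDemo` are hypothetical stand-ins for illustration, not the mapped entity itself.

```java
import java.util.HashSet;
import java.util.Objects;
import java.util.Set;

// Plain-Java stand-in for the Post entity: no annotations, no persistence.
class PostSketch {
    Long id;            // surrogate key, null until the row is "inserted"
    final String slug;  // natural id, known as soon as the object is created

    PostSketch(String slug) {
        this.slug = slug;
    }

    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) return false;
        return Objects.equals(slug, ((PostSketch) o).slug);
    }

    @Override
    public int hashCode() {
        return Objects.hash(slug);
    }
}

class EqualityDemo {

    // Returns true if the set still finds the post after its id is assigned.
    static boolean survivesIdAssignment() {
        PostSketch post = new PostSketch("high-performance-java-persistence");
        Set<PostSketch> posts = new HashSet<>();
        posts.add(post);

        post.id = 1L; // simulate the database generating the surrogate key

        // Both the same instance and a fresh instance with the same slug match.
        return posts.contains(post)
            && posts.contains(new PostSketch("high-performance-java-persistence"));
    }

    public static void main(String[] args) {
        System.out.println(survivesIdAssignment());
    }
}
```

Had hashCode been based on the id instead, the value would change from the "not yet assigned" state to the assigned one, and the set lookup could miss the entry.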

Now, considering the entity above, the user might have bookmarked a Post article and now they want to read it. However, the bookmarked URL contains the slug Natural Identifier, not the Primary Key.

So, we can fetch it like this using Hibernate:

Post post = entityManager.unwrap(Session.class)
    .bySimpleNaturalId(Post.class)
    .load(slug);

Hibernate 5.5 or newer

When fetching the entity by its natural key on Hibernate 5.5 or newer, the following SQL query is generated:

SELECT p.id AS id1_0_0_,
       p.slug AS slug2_0_0_,
       p.title AS title3_0_0_
FROM post p
WHERE p.slug = 'high-performance-java-persistence'

So, since Hibernate 5.5, the entity is fetched by its natural identifier directly from the database.

Hibernate 5.4 or older

When fetching the entity by its natural key on Hibernate 5.4 or older, two SQL queries are generated:

SELECT p.id AS id1_0_
FROM post p
WHERE p.slug = 'high-performance-java-persistence'
 
SELECT p.id AS id1_0_0_,
       p.slug AS slug2_0_0_,
       p.title AS title3_0_0_
FROM post p
WHERE p.id = 1

The first query is needed to resolve the entity identifier associated with the provided natural identifier.

The second query is optional if the entity is already loaded in the first or the second-level cache.

The first query exists because Hibernate already has well-established logic for loading entities by their identifier and associating them with the Persistence Context.

Now, if you want to skip the entity identifier query, you can easily annotate the entity using the @NaturalIdCache annotation:

@Entity(name = "Post")
@Table(name = "post")
@org.hibernate.annotations.Cache(
    usage = CacheConcurrencyStrategy.READ_WRITE
)
@NaturalIdCache
public class Post {
 
    @Id
    @GeneratedValue
    private Long id;
 
    private String title;
 
    @NaturalId
    @Column(nullable = false, unique = true)
    private String slug;
 
    //Getters and setters omitted for brevity
 
    @Override
    public boolean equals(Object o) {
        if (this == o) return true;
        if (o == null || getClass() != o.getClass()) 
            return false;
        Post post = (Post) o;
        return Objects.equals(slug, post.slug);
    }
 
    @Override
    public int hashCode() {
        return Objects.hash(slug);
    }
}

This way, you can fetch the Post entity without even hitting the database. Cool, right?
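The two-step lookup and the effect of caching both steps can be modelled with a toy in-memory sketch. This is not Hibernate's implementation: `NaturalIdLookupSketch`, its maps, and `queryCount` are hypothetical names, and the "database" is just a HashMap. It only illustrates why a second lookup by the same slug needs no queries once the natural id to identifier mapping and the entity are both cached.

```java
import java.util.HashMap;
import java.util.Map;

class NaturalIdLookupSketch {

    static final class Post {
        final long id;
        final String slug;
        final String title;

        Post(long id, String slug, String title) {
            this.id = id;
            this.slug = slug;
            this.title = title;
        }
    }

    // Stand-in for the post table, keyed by surrogate id.
    private final Map<Long, Post> table = new HashMap<>();
    // Toy caches: natural id -> surrogate id, and surrogate id -> entity.
    private final Map<String, Long> naturalIdCache = new HashMap<>();
    private final Map<Long, Post> entityCache = new HashMap<>();

    int queryCount = 0; // number of simulated SQL queries

    void insert(Post post) {
        table.put(post.id, post);
    }

    // Step 1: resolve the surrogate id for a natural id.
    private Long resolveId(String slug) {
        Long cached = naturalIdCache.get(slug);
        if (cached != null) return cached;
        queryCount++; // simulates: SELECT id FROM post WHERE slug = ?
        for (Post p : table.values()) {
            if (p.slug.equals(slug)) {
                naturalIdCache.put(slug, p.id);
                return p.id;
            }
        }
        return null;
    }

    // Step 2: load the entity by its surrogate id.
    private Post loadById(long id) {
        Post cached = entityCache.get(id);
        if (cached != null) return cached;
        queryCount++; // simulates: SELECT ... FROM post WHERE id = ?
        Post p = table.get(id);
        if (p != null) entityCache.put(id, p);
        return p;
    }

    Post loadByNaturalId(String slug) {
        Long id = resolveId(slug);
        return id == null ? null : loadById(id);
    }
}
```

In this model the first call to `loadByNaturalId` costs two simulated queries and a repeated call with the same slug costs none, which mirrors the behaviour described above for a cached natural id and a cached entity.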

Vlad Mihalcea
  • Sorry for commenting on an old question. But, I didn't understand what does it mean by `skip the entity identifier query` ? – Amir Choubani Jan 06 '19 at 15:27
  • 1
    It means skipping this query: SELECT p.id AS id1_0_ FROM post p WHERE p.slug = 'high-performance-java-persistence' which aims to get the entity identifier for this particular natural-id value. – Vlad Mihalcea Jan 06 '19 at 16:03
  • 1
    Hi, is it necessary to use the "id" field? Can't we just have slug as both Id and NaturalId? – Khatri Mar 13 '19 at 12:03
  • Good example, thanks. **1.** Do we have to add the `@NaturalIdCache` annotation whenever we use `@NaturalId`? **2.** As far as I can see, `@NaturalId` is used as an index, but I am not sure if I should use it whenever I have a unique field in my entity. Any idea? **3.** Can I use `@NaturalId` both when the property is updatable and when it is not? –  Aug 04 '22 at 06:20
10

A natural identifier is something that is used in the real world as an identifier. An example is a social security number, or a passport number.

It is usually a bad idea to use natural identifiers as keys in a persistence layer because a) they can be changed outside of your control, and b) they can end up not being unique due to a mistake elsewhere, and then your data model can't handle it so your application blows up.

Mark Byers
  • One hopes that the key is constrained, eg primary key constraint to reduce such risks – gbn Dec 15 '09 at 20:46
  • 2
    You can constrain it in your data model, but you can't constrain real-life - mistakes do happen, and your data model doesn't need to break when they do. If you need to correct someone's SSN because for example it was entered incorrectly, it should be a single UPDATE. If you've used it as a key throughout your system... serialized it, stored it in backups, and possibly even sent it to external systems, you're completely screwed. There's no way you are going to be able to update that person's SSN without breaking something. PS: don't store SSNs at all unless you have to. – Mark Byers Dec 15 '09 at 20:59
  • 1
    True, it still needs constrained and there should be a difference between logical model and implementation. SSN aint unique either... http://www.computerworld.com/s/article/300161/Not_So_Unique – gbn Dec 16 '09 at 05:47
8

What naturally identifies an entity. For example, my email address.

However, a long variable-length string is not an ideal key, so you may want to define a surrogate id.

AKA Natural key in relational design

gbn
2

A social security number might be a natural identity, or as you've said a hash of the User's information. The alternative is a surrogate key, for example a Guid/UID.

Chris S
  • The hash (and it doesn't need to be a hash since a key can be multi-column) would only be a valid natural key if the data cannot change (e-mail is fine, name is iffy, password is unlikely and age is wrong). – Tordek Dec 15 '09 at 20:42
  • @Chris S: Not opposite: "surrogate" – gbn Dec 15 '09 at 20:44
  • @Tordek: good point. @Gbn Updated the text a little. The wikipedia articles actually have good explanations of both – Chris S Dec 16 '09 at 11:55
2

In relational database theory, a relation can have multiple candidate keys. A candidate key is a set of attributes whose combined values are never duplicated across two rows of the relation, and which cannot be reduced by removing one of the attributes while still guaranteeing uniqueness.

A natural ID is essentially a candidate key, where "natural" means it is in the nature of the data you hold in that relation, not something you add like an autogenerated key. A natural ID can be composed of a single attribute. In general, any attribute of a relation that is unique and not null is a candidate key, and can be considered a natural id.
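The candidate key definition can be checked mechanically against a set of sample rows. The sketch below is only an illustration under one big assumption: it tests uniqueness and minimality on the rows it is given, whereas a real candidate key is a constraint that must hold for every possible row. `CandidateKeySketch` and its method names are hypothetical.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

class CandidateKeySketch {

    // True if no two rows share the same values for the given attributes.
    static boolean isUnique(List<Map<String, String>> rows, List<String> attrs) {
        Set<List<String>> seen = new HashSet<>();
        for (Map<String, String> row : rows) {
            List<String> projection = new ArrayList<>();
            for (String attr : attrs) {
                projection.add(row.get(attr));
            }
            if (!seen.add(projection)) return false; // duplicate combination
        }
        return true;
    }

    // True if attrs is unique and removing any single attribute breaks uniqueness.
    static boolean isCandidateKey(List<Map<String, String>> rows, List<String> attrs) {
        if (attrs.isEmpty() || !isUnique(rows, attrs)) return false;
        for (int i = 0; i < attrs.size(); i++) {
            List<String> reduced = new ArrayList<>(attrs);
            reduced.remove(i); // drop the attribute at index i
            if (!reduced.isEmpty() && isUnique(rows, reduced)) return false;
        }
        return true;
    }
}
```

Checking single-attribute removals is enough for minimality, because if any smaller subset were unique, every larger set containing it would be unique too.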

In Hibernate, the @NaturalId annotation can be used simply to denote that an attribute can be used for searches that return unique results without going through the primary key. This is useful when the attribute you mark as the natural id is more natural for you to work with, e.g. when the actual key is autogenerated and you don't want to use it in searches.

user1708042
1

A natural identifier (also known as a business key) is an identifier that means or represents something in real life, for example:

  • Email or national ID for a person
  • ISBN for a book
  • IBAN for a bank account

The @NaturalId annotation is used to specify the natural identifier.

Ahmad Al-Kurdi