
I have several mapped objects in my JPA / Hibernate application. On the network I receive packets that represent updates to these objects, or may in fact represent new objects entirely.

I'd like to write a method like

<T> T getOrCreate(Class<T> klass, Object primaryKey)

that returns an object of the provided class if one exists in the database with pk primaryKey, and otherwise creates a new object of that class, persists it and returns it.

The very next thing I'll do with the object will be to update all its fields, within a transaction.

Is there an idiomatic way to do this in JPA, or is there a better way to solve my problem?

HenryR

6 Answers


I'd like to write a method like <T> T getOrCreate(Class<T> klass, Object primaryKey)

This won't be easy.

A naive approach would be to do something like this (assuming the method is running inside a transaction):

public <T> T findOrCreate(Class<T> entityClass, Object primaryKey) {
    T entity = em.find(entityClass, primaryKey);
    if ( entity != null ) {
        return entity;
    } else {
        try {
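            // Class.newInstance() is deprecated; go through the no-arg constructor
            entity = entityClass.getDeclaredConstructor().newInstance();
            /* use more reflection to set the pk (probably need a base entity) */
            em.persist(entity); // schedule the INSERT so the new row is actually created
            return entity;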
        } catch ( Exception e ) {
            throw new RuntimeException(e);
        }
    }
}

But in a concurrent environment, this code could fail due to some race condition:

T1: BEGIN TX;
T2: BEGIN TX;

T1: SELECT w/ id = 123; //returns null
T2: SELECT w/ id = 123; //returns null

T1: INSERT w/ id = 123;
T1: COMMIT; //row inserted

T2: INSERT w/ id = 123;
T2: COMMIT; //constraint violation

If you are running multiple JVMs, synchronization in Java won't help. And without acquiring a table lock (which is pretty horrible), I don't really see how you could solve this.

In that case, I wonder if it wouldn't be better to systematically insert first, and handle a possible exception by performing a subsequent select (in a new transaction).
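
To make the insert-first idea concrete, here is a minimal sketch of the control flow. It does not touch JPA at all: InsertFirstSketch, DuplicateKeyException, insert and select are hypothetical stand-ins for a table with a unique primary key, so only the ordering (insert, then select on a duplicate) is the point.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.function.Function;

/** Hypothetical in-memory stand-in for a table with a unique primary key. */
final class InsertFirstSketch {

    /** Thrown by the stand-in table when the key already exists. */
    static final class DuplicateKeyException extends RuntimeException {
        DuplicateKeyException(Object key) {
            super("duplicate key: " + key);
        }
    }

    private final Map<Object, Object> rows = new ConcurrentHashMap<>();

    /** Emulates an INSERT that is checked against a unique constraint. */
    void insert(Object key, Object row) {
        if (rows.putIfAbsent(key, row) != null) {
            throw new DuplicateKeyException(key);
        }
    }

    /** Emulates a SELECT by primary key; returns null when no row exists. */
    Object select(Object key) {
        return rows.get(key);
    }

    /** Insert first; if the key is already taken, read the row that won the race. */
    <T> T getOrCreate(Class<T> type, Object key, Function<Object, T> factory) {
        T candidate = factory.apply(key);
        try {
            insert(key, candidate);
            return candidate;
        } catch (DuplicateKeyException e) {
            // Against a real database this select would run in a new transaction.
            return type.cast(select(key));
        }
    }
}
```

With a real database the duplicate would surface as a constraint violation from the driver or the JPA provider rather than as this made-up exception.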

You should probably add some details regarding the mentioned constraints (multi-threading? distributed environment?).

Pascal Thivent
  • One answer I've had suggested to me for the multi-JVM case is to use cooperative advisory locks on the DB (e.g. http://www.postgresql.org/docs/9.1/static/explicit-locking.html). You lock the desired primary key, then do the check/update, then release the lock. – Matt R Mar 10 '14 at 10:49
  • That's interesting, @MattR, though I don't know if other databases provide a similar mechanism. It doesn't look like Oracle does. – DavidS Apr 12 '17 at 23:17

Using pure JPA, one can solve this optimistically in a multi-threaded environment with nested entity managers (really we just need nested transactions, but I don't think that is possible with pure JPA). Essentially, one needs to create a micro-transaction that encapsulates the find-or-create operation. The performance won't be fantastic and this isn't suitable for large batched creates, but it should be sufficient for most cases.

Prerequisites:

  • The entity must have a unique constraint that will be violated if two instances are created.
  • You have some kind of finder to find the entity (by primary key with EntityManager.find, or by some query). We will refer to this as finder.
  • You have some kind of factory method to create a new entity if the one you are looking for does not exist. We will refer to this as factory.
  • I'm assuming that the findOrCreate method exists on some repository object and is called in the context of an existing entity manager and an existing transaction.
  • If the transaction isolation level is serializable or snapshot, this won't work. If it is repeatable read, you must not have attempted to read the entity in the current transaction.
  • I'd recommend breaking the logic below into multiple methods for maintainability.

Code:

public <T> T findOrCreate(Supplier<T> finder, Supplier<T> factory) {
    EntityManager innerEntityManager = entityManagerFactory.createEntityManager();
    innerEntityManager.getTransaction().begin();
    try {
        //Try the naive find-or-create in our inner entity manager
        if(finder.get() == null) {
            T newInstance = factory.get();
            innerEntityManager.persist(newInstance);
        }
        innerEntityManager.getTransaction().commit();
    } catch (PersistenceException ex) {
        //This may be a unique constraint violation or it could be some
        //other issue.  We will attempt to determine which it is by trying
        //to find the entity.  Either way, our attempt failed and we
        //roll back the tx.
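        //A failed commit may already have ended the tx, and rollback()
        //throws IllegalStateException on an inactive tx.
        if(innerEntityManager.getTransaction().isActive()) {
            innerEntityManager.getTransaction().rollback();
        }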
        T entity = finder.get();
        if(entity == null) {
            //Must have been some other issue
            throw ex;
        } else {
            //Either it was a unique constraint violation or we don't
            //care because someone else has succeeded
            return entity;
        }
    } catch (Throwable t) {
        if(innerEntityManager.getTransaction().isActive()) {
            innerEntityManager.getTransaction().rollback();
        }
        throw t;
    } finally {
        innerEntityManager.close();
    }
    //If we didn't hit an exception then we successfully created it
    //in the inner transaction.  We now need to find the entity in
    //our outer transaction.
    return finder.get();
}
Pace
  • To reduce `finder` calls by 50%: `public static <T> T findOrCreate( EntityManagerFactory emf, Supplier<T> finder, Supplier<T> factory) { EntityManager em=emf.createEntityManager(); T attempt1=finder.get(); if(attempt1 != null) return attempt1; T created=factory.get(); try { em.getTransaction().begin(); em.persist( created ); em.getTransaction().commit(); return finder.get(); } catch(Exception ex) { em.getTransaction().rollback(); T attempt2=finder.get(); if(attempt2!=null ) return attempt2; throw ex; } catch(Throwable t) { em.getTransaction().rollback(); throw t; } finally { em.close(); } }` – krevelen Sep 15 '16 at 13:55
  • @krevelen That is a good idea (it avoids more than just a read: it avoids opening a transaction, which can yield significant performance benefits). It will work if your isolation level is read committed or read uncommitted, but it will fail with repeatable read (the default in many instances). I believe even my approach would fail if the isolation level is serializable. – Pace Sep 15 '16 at 15:37

I must point out a flaw in @gus an's answer: it breaks in a concurrent situation. If two threads read the count at the same time, both get 0 and both perform the insertion, so duplicate rows are created.

My suggestion here is to write your native query like the one below:

insert into af_label (content,previous_level_id,interval_begin,interval_end)
    select 'test',32,9,13
    from dual
    where not exists (select * from af_label where previous_level_id=32 and interval_begin=9 and interval_end=13)

It works like an optimistic lock in the program, except that we let the DB engine detect the duplicates, based on the attributes you choose.

Eason Du

How about using the orElse function after findByKeyword? It lets you return a new instance if no record is found.

SearchCount searchCount = searchCountRepository.findByKeyword(keyword)
        .orElse(SearchCount.builder()
                .keyword(keyword)
                .count(0)
                .build());
gus an
  • This does not make sense. Where do findByKeyword and orElse come from? Also, how does this protect from concurrent inserts? – sleske Jun 14 '23 at 13:57

The other answers give good solutions to the problem. I'll try to summarise the various approaches, to give a good overview.

Support in JPA

I'd like to write a method [...] that returns an object of the provided class if one exists in the database with pk primaryKey, and otherwise creates a new object of that class, persists it and returns it.

The very next thing I'll do with the object will be to update all its fields, within a transaction.

This is a reasonably common situation, and there is a name for this operation: upsert (a blend of UPDATE and INSERT). It means: insert a record if it is not present (decided by its key), update it if it is present.

Most relational database systems have built-in support for this, either via the standard SQL keyword MERGE, or some other keyword - see the Wikipedia article for MERGE for details. This allows performing an upsert with a single SQL statement.

Is there an idiomatic way to do this in JPA, or is there a better way to solve my problem?

Unfortunately: no. JPA itself does not support an upsert operation. There is no UPSERT or MERGE or similar keyword in JPQL, or in the JPA API. More precisely: EntityManager.merge() will do what you want in a single-threaded setting (find and update the entity, or insert it), but it is not thread-safe.

However, there are some workarounds (some explained in other answers).

Workarounds

Insert and catch constraint violation

Make sure that there is a unique index for the key field(s) that you want to use. Then just naively perform an insert, using EntityManager.persist(). If the record was not present, it will be inserted. If the record was already present, you will get an Exception, which you can catch. Then you can perform an UPDATE (using EntityManager.merge()) instead.

This is described in more detail in Pace's answer.

Advantage: No complicated native SQL required.

Disadvantage: The exception handling will be quite nasty, because JPA does not have a portable way to distinguish whether an exception was caused by a constraint violation (which would be OK here) or by some other problem (dead database, network error) that you would still want to handle.
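
One heuristic for telling the two cases apart is to look below the JPA layer, since providers usually keep the driver's SQLException somewhere in the cause chain. The sketch below is an assumption about how that could look, not something JPA guarantees: ConstraintViolations is a made-up helper, and it relies on the driver either throwing SQLIntegrityConstraintViolationException or reporting an SQLSTATE in class 23 (the SQL standard's class for integrity constraint violations), which not every driver does consistently.

```java
import java.sql.SQLException;
import java.sql.SQLIntegrityConstraintViolationException;

/** Hypothetical helper; treat a "false" result as "unknown", not as proof. */
final class ConstraintViolations {

    private ConstraintViolations() {
    }

    /** Walks the cause chain looking for a JDBC integrity constraint violation. */
    static boolean isConstraintViolation(Throwable error) {
        int depth = 0;
        // The depth limit guards against cyclic cause chains.
        for (Throwable t = error; t != null && depth < 50; t = t.getCause(), depth++) {
            if (t instanceof SQLIntegrityConstraintViolationException) {
                return true;
            }
            if (t instanceof SQLException) {
                String state = ((SQLException) t).getSQLState();
                // SQLSTATE class 23 covers integrity constraint violations.
                if (state != null && state.startsWith("23")) {
                    return true;
                }
            }
        }
        return false;
    }
}
```

A catch block around persist() could call isConstraintViolation(ex) and fall back to the update only when it returns true, rethrowing everything else.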

Use MERGE / UPSERT statement

You can use a native query to execute a MERGE or UPSERT statement, using the DB's built-in support for performing upserts.

Advantage: The cleanest solution from the DBMS point of view, as it uses the mechanism that the DBMS offers for this problem.

Disadvantage: Somewhat nasty native SQL required; basically "going behind JPA's back".

See Using Merge statement for single table for details.
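
As a rough illustration of what such a statement looks like, here is a sketch in plain JDBC. Everything specific in it is an assumption: the device_state table and its columns are made up, and the SQL uses PostgreSQL's INSERT ... ON CONFLICT form (9.5 and later); other databases need MERGE or their own syntax. The same SQL string could be passed to EntityManager.createNativeQuery() instead.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.SQLException;

/** Sketch only: hypothetical table device_state(id BIGINT PRIMARY KEY, payload VARCHAR). */
final class UpsertSketch {

    // PostgreSQL-specific upsert; EXCLUDED refers to the row that failed to insert.
    static final String UPSERT_SQL =
            "INSERT INTO device_state (id, payload) VALUES (?, ?) "
          + "ON CONFLICT (id) DO UPDATE SET payload = EXCLUDED.payload";

    private UpsertSketch() {
    }

    /** Inserts the row, or updates the payload if the id already exists. */
    static int upsert(Connection connection, long id, String payload) throws SQLException {
        try (PreparedStatement statement = connection.prepareStatement(UPSERT_SQL)) {
            statement.setLong(1, id);
            statement.setString(2, payload);
            return statement.executeUpdate();
        }
    }
}
```

Because the database does the existence check and the write in one statement, there is no window between a select and an insert for another transaction to slip into.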

Use DB lock

You can also use a form of pessimistic locking, by acquiring a database lock on some record (using EntityManager.lock()). Which record to lock will be application-specific. Typically, if you have a 1:n relationship, and you are inserting into the :n table, you would lock the corresponding "main" record. For example, when adding a new item to an invoice, you would lock the main invoice record. After obtaining the lock, you check for existence of the record, and update or insert it.

If all code that performs inserts respects this, the locking will make sure that once you have acquired a lock, no other process/thread can interfere.

The acquisition of the lock and the update/insert must be put inside a transaction (the end of the transaction will automatically release the lock).
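
The invoice example could be sketched like this in plain JDBC, as a lower-level equivalent of what EntityManager.lock() with a pessimistic lock mode would do. The invoice and invoice_item tables, their columns, and the saveItem method are all made up for illustration, and SELECT ... FOR UPDATE is assumed to be supported by the database. The existence check is done by attempting the UPDATE first and inserting only when it touched no rows.

```java
import java.sql.Connection;
import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;

/** Sketch only: hypothetical invoice / invoice_item tables. */
final class LockThenWriteSketch {

    static final String LOCK_INVOICE =
            "SELECT id FROM invoice WHERE id = ? FOR UPDATE";
    static final String UPDATE_ITEM =
            "UPDATE invoice_item SET description = ? WHERE invoice_id = ? AND line_no = ?";
    static final String INSERT_ITEM =
            "INSERT INTO invoice_item (invoice_id, line_no, description) VALUES (?, ?, ?)";

    private LockThenWriteSketch() {
    }

    static void saveItem(Connection c, long invoiceId, int lineNo, String description)
            throws SQLException {
        boolean previousAutoCommit = c.getAutoCommit();
        c.setAutoCommit(false); // lock and write must share one transaction
        try {
            // 1. Lock the "main" record; concurrent callers block here.
            try (PreparedStatement lock = c.prepareStatement(LOCK_INVOICE)) {
                lock.setLong(1, invoiceId);
                try (ResultSet rs = lock.executeQuery()) {
                    if (!rs.next()) {
                        throw new SQLException("no invoice with id " + invoiceId);
                    }
                }
            }
            // 2. Try to update; zero affected rows means the item does not exist yet.
            int updated;
            try (PreparedStatement update = c.prepareStatement(UPDATE_ITEM)) {
                update.setString(1, description);
                update.setLong(2, invoiceId);
                update.setInt(3, lineNo);
                updated = update.executeUpdate();
            }
            // 3. Insert only if nothing was updated.
            if (updated == 0) {
                try (PreparedStatement insert = c.prepareStatement(INSERT_ITEM)) {
                    insert.setLong(1, invoiceId);
                    insert.setInt(2, lineNo);
                    insert.setString(3, description);
                    insert.executeUpdate();
                }
            }
            c.commit(); // ending the transaction releases the row lock
        } catch (SQLException | RuntimeException e) {
            c.rollback();
            throw e;
        } finally {
            c.setAutoCommit(previousAutoCommit);
        }
    }
}
```

This only protects against writers that also lock the invoice row first, which is the "if all code respects this" condition.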

Advantages: No native SQL required. Can be faster than the other solutions in some cases (though this depends on specifics).

Disadvantages: May reduce performance, because other code has to wait for the lock. In particular, the DBMS may decide to lock the whole table instead of just one row, which will make this worse. If you get the locking wrong, you could create potential deadlocks.

Recommendation

In my experience, using MERGE / UPSERT is usually the best solution, as it optimally uses the resources provided by the DBMS. The required native SQL is a bit ugly, and you must make sure not to accidentally persist via JPA, but apart from that it's the cleanest (and usually the fastest) solution.

If that is not practical, the "insert and catch" approach can also work - but it also has ugly code (the exception handling), and again, you must make sure to always apply this exception handling.

Using locks is helpful in some situations, but I'd use it as a last resort.

Remarks

  • There is actually a bug for EclipseLink (one JPA implementation) to provide support for upsert - Bug 344329 - Add support for UPSERT/MERGE. However, the bug is from 2011 and nothing seems to have happened. Also, this would ideally be added to the JPA specification, which does not seem to be happening either.
  • A simple solution to the problem is to make sure only a single thread/process is doing the inserting/updating - then you can just naively update and insert. But that is not always practical :-).
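
For the single-writer remark, a minimal sketch of the idea within one JVM: every find-or-create goes through a single-threaded executor, so the check and the insert can never interleave. SingleWriterSketch and its in-memory map are hypothetical stand-ins for the real persistence code, and this does nothing for multiple JVMs or processes.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.concurrent.CompletableFuture;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.function.Function;

/** Sketch only: serializes all find-or-create calls on one writer thread. */
final class SingleWriterSketch implements AutoCloseable {

    private final ExecutorService writer = Executors.newSingleThreadExecutor();

    // Only ever touched from the writer thread, so a plain HashMap is enough.
    private final Map<Object, Object> table = new HashMap<>();

    /** Queues the lookup-or-insert; callers from any thread get a future. */
    <T> CompletableFuture<T> getOrCreate(Class<T> type, Object key, Function<Object, T> factory) {
        return CompletableFuture.supplyAsync(
                () -> type.cast(table.computeIfAbsent(key, factory)), writer);
    }

    @Override
    public void close() {
        writer.shutdown();
    }
}
```

In a real application the task submitted to the executor would contain the naive find-then-persist code, which is safe there because no second writer exists.
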
sleske

This is the approach I used to save the elements of a list if they do not already exist.

someList.stream()
        .filter(e -> !someRepository.existsById(e.getId()))
        .forEach(e -> someRepository.save(e));