
I have a Spring Boot service that persists several entities of two types to an Oracle DB in a single transaction. The table for the first entity type is huge (about 3 million rows/day, partitioned, ...), and I need to handle duplicates. I compute a hash over some of the fields, and there is a unique constraint on that hash column in the database. I thought it would be clever to simply saveAndFlush entity by entity and react to the ConstraintViolationException. Based on the result of saving the list of first entities, I then need to create and save the second entity, but the exception rolls back everything. My question: is this approach generally wrong, or is it OK and there is just some small issue? If it is generally wrong, how should I do the duplicate check instead (a select upfront is not an option)?

Here is some pseudo-code to give a better idea:

@Entity
public class Foo{

    public String uniqueHash;
    
    // couple of other properties that will be used to calculate the hash
}

@Entity
public class Bar{

    private List<String> goodIds;
    private List<String> badIds;
    
    public Bar(List<String> goodIds, List<String> badIds){
        this.goodIds = goodIds;
        this.badIds = badIds;
    }
}

@Repository
@Transactional(noRollbackFor = PersistenceException.class)
public interface FooRepository extends JpaRepository<Foo, String> {
  Foo saveAndFlush(Foo f) throws PersistenceException;
}

@Repository
@Transactional(noRollbackFor = PersistenceException.class)
public interface BarRepository extends JpaRepository<Bar, String> {
  Bar saveAndFlush(Bar b) throws PersistenceException;
}

SomeService

@Transactional(noRollbackFor = PersistenceException.class)
public void doSomething(List<Foo> foos){

    List<String> goodIds = new ArrayList<>();
    List<String> badIds = new ArrayList<>();
        
    for (Foo foo : foos) {
      try {     
        fooRepository.saveAndFlush(foo);
        goodIds.add(foo.getId());
      } catch (PersistenceException e) {
        if (e.getCause() instanceof ConstraintViolationException) {
          badIds.add(foo.getId());
        } else {
          throw e;
        }
      }
    }
    barRepository.saveAndFlush(new Bar(goodIds, badIds));
}
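For reference, the uniqueHash above could be derived from the constraint-relevant fields along these lines (a sketch; the field names and the choice of SHA-256 are assumptions, not from the question):

```java
import java.nio.charset.StandardCharsets;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

public class HashExample {

    // Builds a stable hex hash over the fields covered by the unique constraint.
    public static String uniqueHash(String fieldA, String fieldB) {
        try {
            MessageDigest md = MessageDigest.getInstance("SHA-256");
            // The '|' separator avoids collisions like ("ab","c") vs ("a","bc")
            byte[] digest = md.digest(
                (fieldA + "|" + fieldB).getBytes(StandardCharsets.UTF_8));
            StringBuilder hex = new StringBuilder();
            for (byte b : digest) {
                hex.append(String.format("%02x", b));
            }
            return hex.toString();
        } catch (NoSuchAlgorithmException e) {
            throw new IllegalStateException(e); // SHA-256 is always available
        }
    }

    public static void main(String[] args) {
        // SHA-256 produces 32 bytes, i.e. 64 hex characters
        System.out.println(uniqueHash("a", "b").length()); // prints 64
    }
}
```

The same inputs always yield the same hash, so the database's unique constraint on this column catches duplicates regardless of which service instance inserts first.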
hecko84
  • You might want to look into error tables and see if that satisfies your needs. One issue with error tables is that all column values are converted to VARCHAR2(4000) – Brian Leach Nov 10 '20 at 19:19
  • Why don't you check whether the entity exists before inserting? – ggr Nov 10 '20 at 20:43
  • What @ggr said, especially given that a unique index lookup should be pretty fast – crizzis Nov 10 '20 at 21:50
  • I have several instances of the service and fear there will be unfavorable conditions in which more than one instance receives the same message, again ending in duplicates. Furthermore, I don't want to double the DB interactions. The question is whether my approach is feasible at all, as I have read some threads indicating that an exception in a transaction will always roll back no matter what you do (although I thought my scenario is a pretty controlled one) – hecko84 Nov 11 '20 at 07:22
  • @hecko84 Fear is not rational, did you test it? Give it a try. If it fails because of a duplicate, you can implement a retry mechanism. – ggr Nov 11 '20 at 19:11
  • 1
    @ggr I know that several services receive the same message and this is not in my hand (I mean the client is not in our control). My main concern is doubling the DB interactions for a rare scenario. I have a duplicate Foo probably each 10K instances, so I finally decided to change the way the data is processed and rather do a retry on the transaction like you also more or less proposed. I added the code as an answer to have it as a reference. – hecko84 Nov 12 '20 at 07:53

2 Answers


Finally, I found a way to achieve the expected behavior and, even better, got rid of the "noRollbackFor" attributes. I restructured the process: try to save everything in one transaction; if it fails, the exception is caught in the calling method, the input is "cleaned", and the transactional method is called again (recursively). Duplicates are rare (roughly one per 10k Foo instances), so from a performance perspective the occasional extra transaction is fine. Here is the changed pseudo-code again:

@Entity
public class Foo{

    public String uniqueHash;
    
    // couple of other properties that will be used to calculate the hash
}

@Entity
public class Bar{

    private List<String> goodIds;
    private List<String> badIds;
    
    public Bar(List<String> goodIds, List<String> badIds){
        this.goodIds = goodIds;
        this.badIds = badIds;
    }
    
    public List<String> getGoodIds(){
      return goodIds;
    }
    
    public List<String> getBadIds(){
      return badIds;
    }
}

@Repository
public interface FooRepository extends JpaRepository<Foo, String> {
}

@Repository
public interface BarRepository extends JpaRepository<Bar, String> {
}

public class FooException extends RuntimeException {

  private final Foo foo;

  public FooException(String message, Foo foo) {
    super(message);
    this.foo = foo;
  }
  
  public Foo getFoo(){
    return foo;
  }
}

SomeService

public void doSomething(List<Foo> foos, Bar bar){
    try {
        doSomethingTransactional(foos, bar);
    } catch (FooException e) {
        bar.getBadIds().add(e.getFoo().getId());
        foos.remove(e.getFoo());
        doSomething(foos, bar);
    }
}


// Note: for @Transactional to take effect, doSomethingTransactional must be
// invoked through the Spring proxy (e.g. from another bean); a direct
// self-invocation within the same class bypasses the transaction advice.
@Transactional
public void doSomethingTransactional(List<Foo> foos, Bar bar){
        
    for (Foo foo : foos) {
      try {     
        fooRepository.saveAndFlush(foo);
        bar.getGoodIds().add(foo.getId());
      } catch(DataAccessException e) {
        if (e.getCause() instanceof ConstraintViolationException 
        && ((ConstraintViolationException) e.getCause()).getConstraintName().contains("Some DB Message")) {
          throw new FooException("Foo already exists", foo);
        } else {
          throw e;
        }
      }
    }
    barRepository.saveAndFlush(bar);
}
hecko84

You might be able to use a custom @SQLInsert to make use of Oracle's MERGE statement for this purpose. Also see https://stackoverflow.com/a/64764412/412446
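To illustrate, a minimal sketch of what such a mapping could look like, assuming a FOO table with columns ID and UNIQUE_HASH (the table/column names and the parameter order are assumptions; enable SQL logging to verify the order in which Hibernate binds the insert parameters):

```java
import javax.persistence.Entity;
import javax.persistence.Id;
import org.hibernate.annotations.ResultCheckStyle;
import org.hibernate.annotations.SQLInsert;

@Entity
// MERGE silently skips rows whose hash already exists instead of raising
// ORA-00001, so no exception handling is needed on the Java side.
// check = NONE stops Hibernate from treating the zero-rows-affected case
// (a skipped duplicate) as a failed insert.
@SQLInsert(check = ResultCheckStyle.NONE, sql =
    "MERGE INTO foo t " +
    "USING (SELECT ? AS unique_hash, ? AS id FROM dual) s " +
    "ON (t.unique_hash = s.unique_hash) " +
    "WHEN NOT MATCHED THEN INSERT (unique_hash, id) " +
    "VALUES (s.unique_hash, s.id)")
public class Foo {

    @Id
    private String id;

    private String uniqueHash;
}
```

One caveat: with this approach the service can no longer tell from the save call alone whether a row was inserted or skipped, so the goodIds/badIds bookkeeping would need a different source of truth.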

Christian Beikov