I'm working on a microservice that serves REST endpoints for saving data to and retrieving data from a database, using Spring Data. Let's call the entity class `Foo`; it has a simple `Long` for its ID field plus some other data fields. IDs for each `Foo` are not auto-generated in this service; they are supplied by an external source which knows to make them unique. The service has one POST endpoint that serves both the create and update functions of the CRUD model, which calls a corresponding function in the service layer of the code; let's call this function `AddData(Foo foo_incoming)`. The body of the POST message contains the data to save to the database and the ID of the `Foo` to save the data to. The logic of `AddData` looks like this:
```java
@Service("fooService")
public class FooServiceImpl {

    @Autowired
    FooRepository fooRepository; // Subinterface of JpaRepository

    @Transactional
    public Long AddData(Foo foo_incoming) {
        Optional<Foo> foo_check = fooRepository.findById(foo_incoming.getId());
        Foo foo_exists;
        // Exists already?
        if (foo_check.isEmpty()) {
            // New Foo
            foo_exists = fooRepository.saveAndFlush(foo_incoming);
        } else {
            // Update existing Foo
            foo_exists = foo_check.get();
            foo_exists.addToFieldA(foo_incoming.getFieldA());
            foo_exists.addToFieldB(foo_incoming.getFieldB());
        }
        return foo_exists.getId();
    }
}
```
This one function is in charge of both creating the initial record for a `Foo` and updating that record. When POST requests come in to add data to some `Foo` with ID=1 (let's call it `foo-1`) which doesn't exist yet, and they arrive with a reasonable amount of time between them, the first request creates the initial record for `foo-1` and all subsequent calls only update it. That is, enough time passes for `saveAndFlush` to actually flush to the database, so subsequent calls to `findById` find `foo-1` in the database, jump to the `else` block, and just update its fields.
The problem I'm running into is that when N POSTs for the same `Foo` (same ID) are sent to the service fast enough, all the corresponding calls to `AddData` seem to happen concurrently. So if `foo-1` doesn't exist yet, `findById(1)` returns empty in each of those calls, `saveAndFlush` gets called N times for `Foo`s with ID=1, and that raises a `DataIntegrityViolationException`.
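The failure mode is the classic check-then-act race, and it can be reproduced without Spring or a database at all. This plain-Java simulation (all names here are hypothetical stand-ins for the real service; a latch forces every thread past the existence check before any of them inserts) shows every caller deciding the record is new:

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical stand-in for the service: n threads each do the same
// "check findById, then save if absent" sequence against a shared map.
public class CheckThenActRace {

    // Returns how many of the n concurrent callers decided the record was new.
    static int simulate(int n) {
        Map<Long, String> db = new ConcurrentHashMap<>();
        CountDownLatch allChecked = new CountDownLatch(n);
        AtomicInteger createAttempts = new AtomicInteger();
        ExecutorService pool = Executors.newFixedThreadPool(n);
        for (int i = 0; i < n; i++) {
            pool.execute(() -> {
                boolean exists = db.containsKey(1L);  // findById(1) sees nothing...
                allChecked.countDown();
                try {
                    allChecked.await();               // ...in ALL n threads, before any insert
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt();
                }
                if (!exists) {
                    createAttempts.incrementAndGet(); // would call saveAndFlush here:
                    db.put(1L, "foo-1");              // n inserts of the same PK -> violation
                }
            });
        }
        pool.shutdown();
        try {
            pool.awaitTermination(5, TimeUnit.SECONDS);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
        }
        return createAttempts.get();
    }

    public static void main(String[] args) {
        System.out.println("create attempts: " + simulate(4)); // prints "create attempts: 4"
    }
}
```

In the real service, each of those N "create" decisions becomes a `saveAndFlush` insert of the same primary key, and only the first can succeed.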
I've been digging around the web for days trying to solve this.
- The method is already annotated `@Transactional`. I've tried using `@Transactional(isolation = Isolation.SERIALIZABLE)` on just the method and on the entire class; it doesn't help.
- I've tried annotating the `findById` and `saveAndFlush` methods in `FooRepository` with `@Lock(LockModeType.PESSIMISTIC_READ)` and `@Lock(LockModeType.PESSIMISTIC_WRITE)`, respectively. No luck.
- I've tried adding a `@Version` field to `Foo`. No change.
I can't figure out how to force `AddData` to happen serially; I thought that's what `@Transactional(isolation = Isolation.SERIALIZABLE)` was supposed to do.
I'm considering giving "create" and "update" their own functions, i.e. making a PUT endpoint for create. But then the PUT endpoint would have a similar issue: if I wanted to prevent primary key collisions in code, I'd have to do a similar `findById` check before performing `saveAndFlush`. And given the way this service is actually used, the PUT endpoint may not be an option anyway.
Wrapping the `saveAndFlush` call in a try/catch block does catch the exception, to my surprise. I could add some fallback logic that calls `findById` again when `saveAndFlush` fails, but if there's a way to avoid the exception being thrown in the first place, I'd prefer that.
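For illustration, that fallback would look something like the sketch below; `db`, `insertNew`, and the string-concatenation "update" are hypothetical stand-ins for the repository and the entity's fields, just to show the control flow of catching the duplicate-key failure and re-reading:

```java
import java.util.Map;
import java.util.Optional;
import java.util.concurrent.ConcurrentHashMap;

// Sketch only: db and insertNew are hypothetical stand-ins for the
// repository; string concatenation stands in for addToFieldA/addToFieldB.
public class SaveOrRecover {

    static class DuplicateKeyException extends RuntimeException {}

    static final Map<Long, String> db = new ConcurrentHashMap<>();

    // Stand-in for saveAndFlush: fails like a primary key violation on duplicates.
    static void insertNew(Long id, String value) {
        if (db.putIfAbsent(id, value) != null) {
            throw new DuplicateKeyException();
        }
    }

    static String addData(Long id, String incoming) {
        Optional<String> existing = Optional.ofNullable(db.get(id)); // findById
        if (existing.isEmpty()) {
            try {
                insertNew(id, incoming);   // saveAndFlush
                return db.get(id);
            } catch (DuplicateKeyException lostRace) {
                // Another request created the record between the check and the
                // insert; fall through and treat this call as an update instead.
            }
        }
        String merged = db.get(id) + "+" + incoming; // the "update" branch
        db.put(id, merged);
        return merged;
    }

    public static void main(String[] args) {
        System.out.println(addData(1L, "a")); // prints "a"
        System.out.println(addData(1L, "b")); // prints "a+b"
    }
}
```

The loser of the race lands in the catch block, re-reads the now-existing record, and takes the update path instead of failing.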
Any suggestions would be appreciated!
EDIT: Some more context that may be useful. This microservice runs in a Kubernetes cluster where there can potentially be many instances of this service serving requests concurrently. I'm still researching handling concurrency of multiple instances, and figuring that out isn't something I have to do on my own - my team is developing several microservices like this, we may develop a common library to address such issues for all of them.
EDIT 2: I forgot to mention that, as of now, I'm running the service against an H2 database, not a real production database. Might that have something to do with this?
And I'll reiterate what's happening here: multiple calls check the database for `foo-1` before `foo-1` exists. Because of that, I don't think database locking is going to help me, since there's no entity to lock on yet. I thought forcing `AddData` to happen serially would solve this problem, and I'm completely stumped as to why adding `@Transactional(isolation = Isolation.SERIALIZABLE)` to `AddData` isn't doing that for me.