
I have a simple scenario involving a database and a JMS broker in an application server (Glassfish):

1. An EJB inserts a row in the database and sends a message.
2. When the message is delivered to an MDB, the row is read and updated.

The problem is that the message is sometimes delivered before the insert has been committed in the database. This is understandable if we consider the two-phase commit (2PC) protocol:

1. prepare JMS
2. prepare database
3. commit JMS
4. (tiny gap where the message can be delivered before the insert has been committed)
5. commit database
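
The ordering above can be illustrated with a small in-memory simulation. This is only a sketch: `CommitOrderDemo`, `Participant`, `database`, and `broker` are hypothetical names standing in for the transaction manager, the database, and the message broker, not JTA or Glassfish APIs. It assumes the worst case, where the broker delivers the message the instant it commits.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashSet;
import java.util.List;
import java.util.Set;

// Hypothetical in-memory model of the commit phase of 2PC; not a real JTA API.
class CommitOrderDemo {
    /** For this illustration a participant only needs a commit step. */
    interface Participant {
        void commit();
    }

    final Set<Long> committedRows = new HashSet<>();    // rows visible to other transactions
    final List<String> consumerLog = new ArrayList<>(); // what the simulated MDB observed

    /** Database participant: the row becomes visible only at commit. */
    Participant database(long rowId) {
        return () -> committedRows.add(rowId);
    }

    /** Broker participant: committing delivers the message immediately (worst case). */
    Participant broker(long rowId) {
        return () -> consumerLog.add(committedRows.contains(rowId) ? "row found" : "row missing");
    }

    /** Runs the commit phase in the given order and returns what the consumer saw. */
    static String run(boolean brokerFirst) {
        CommitOrderDemo demo = new CommitOrderDemo();
        Participant db = demo.database(1L);
        Participant jms = demo.broker(1L);
        List<Participant> order = brokerFirst ? Arrays.asList(jms, db) : Arrays.asList(db, jms);
        for (Participant p : order) {
            p.commit();
        }
        return demo.consumerLog.get(0);
    }

    public static void main(String[] args) {
        System.out.println("broker first:   " + run(true));
        System.out.println("database first: " + run(false));
    }
}
```

When the broker commits first, `run(true)` reports `row missing`; when the database commits first, `run(false)` reports `row found`. In a real server the gap is only a timing window, so the failure shows up intermittently rather than every time.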

I've discussed this problem with others, but the answer was always: "Strange, it should work out of the box".

My questions are then:

  • How could it work out-of-the box?
  • My scenario sounds fairly simple, so why aren't there more people with similar troubles?
  • Am I doing something wrong? Is there a way to solve this issue correctly?

Here are a bit more details about my understanding of the problem:

This timing issue exists only if the participants are committed in this order. If the 2PC commits the participants in the reverse order (database first, then message broker), it should be fine. The problem occurred randomly, but I could reproduce it reliably.

I found no way to control the order of the participants in a distributed transaction in the JTA, JCA, or JPA specifications, nor in the Glassfish documentation. We could assume they are enlisted in the distributed transaction in the order in which they are used, but with an ORM such as JPA, it's difficult to know when the data is flushed and when the database connection is really used. Any ideas?

ewernli
  • questions: is MDB running on same server? if yes is MDB also using JPA to update the record? if yes are you using second level cache (I read you are using Hibernate in the Other post)? And finally if yes (using cache) can I know what implementation of cache you are using? – Elister Mar 11 '10 at 09:24
  • @Elister Everything runs in the same server. We used JPA everywhere. The second-level cache was disabled altogether. (The workaround we found was to use a native query `select * for update` to read the row in the MDB. It then waits until the 1st transaction is committed.) – ewernli Mar 11 '10 at 09:37
  • Could you please show some pseudo code? I'd like to know if you update the db in one EJB method and send the JMS message in another one (and maybe wrap the whole thing in a third method), if you use different EJBs, etc. – Pascal Thivent Mar 11 '10 at 16:17
  • @Pascal I created a test case that reproduces it. Here are the source code and the instructions: http://forums.java.net/jive/message.jspa?messageID=353154#391321 – ewernli Mar 11 '10 at 17:48
  • WebSphere 7 has added this support. Look at the "Commit priority for transactional resources" section http://publib.boulder.ibm.com/infocenter/wasinfo/fep/index.jsp?topic=/com.ibm.websphere.soafep.multiplatform.doc/info/ae/ae/cjta_trans.html – Aravind Yarram Jan 02 '11 at 20:18

1 Answer


You are experiencing the classic XA 2-PC race condition. It does happen in production environments.

Three options come to mind:

  1. Use last agent optimization, with JDBC as the non-XA resource (you lose recovery semantics).
  2. Set a JMS time-to-deliver (you deliberately lose real-time delivery).
  3. Build retries into the JDBC code (least effect on functionality).

WebLogic has the LLR (Logging Last Resource) optimization, which avoids this problem and gives you all XA guarantees.
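
Option 3 could be sketched roughly as follows. This is a minimal illustration, not code from the question: `RetryingLookup` and `findWithRetry` are hypothetical names, and the `lookup` supplier stands in for whatever JDBC or JPA read the MDB performs to find the row whose id was passed in the message.

```java
import java.util.Optional;
import java.util.function.Supplier;

// Hypothetical helper; the lookup would wrap the JDBC/JPA read done in the MDB.
class RetryingLookup {
    /**
     * Calls lookup until it returns a value or maxAttempts is reached,
     * sleeping delayMillis between attempts. Returns empty if the row
     * never shows up or the thread is interrupted while waiting.
     */
    static <T> Optional<T> findWithRetry(Supplier<Optional<T>> lookup,
                                         int maxAttempts,
                                         long delayMillis) {
        for (int attempt = 1; attempt <= maxAttempts; attempt++) {
            Optional<T> result = lookup.get();
            if (result.isPresent()) {
                return result;
            }
            if (attempt < maxAttempts) {
                try {
                    Thread.sleep(delayMillis);
                } catch (InterruptedException e) {
                    Thread.currentThread().interrupt(); // preserve the interrupt flag
                    return Optional.empty();
                }
            }
        }
        return Optional.empty();
    }
}
```

If the result is still empty after the last attempt, one option is to mark the MDB's transaction for rollback so that the broker redelivers the message later. The attempt count and delay are tuning choices that depend on how long the commit gap can last in a given setup.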

BenMorel
satks
  • +1 thanks for the answer. So there is no way to implement that simply without relying on advanced app server optimizations? (The spec doesn't require the app server to provide time-to-deliver or last agent optimization.) – ewernli Mar 29 '10 at 17:43
  • Btw, I was indeed told about last agent optimization on the Glassfish forum. I didn't know about the advanced LLR variant, though. Regarding retry logic, we actually worked around the problem with `select * for update`, which is much simpler. I'm happy to hear it's a "classic" issue. Still, what I don't get is that the specs themselves don't address this issue, e.g. by letting us specify a preferred order for the participants. – ewernli Mar 29 '10 at 17:45
  • @ewernli I'm facing the same issue right now. How did you solve it in the end? What do you mean by `select * for update` w.r.t. retries? – Theo Jun 05 '11 at 12:16
  • @Theo `select * for update` acquires a lock on the row that is read. The 1st tx, which inserts the row, holds a lock on that row until it is committed. We pass the id of the row that was inserted (but not yet committed) in the message. If the MDB is fired before the insert is committed, `select * where id='id of the row inserted' for update` pauses the 2nd tx until the 1st tx is committed and the lock can be acquired. – ewernli Jun 06 '11 at 10:10
  • BTW, this also happens in MSDTC - we experienced it with distributed transactions involving MSMQ and MSSQL. The closest reference I could find (from the "MS world"), was [this](https://ayende.com/blog/167362/the-fallacy-of-distributed-transactions) article from Ayende Rahien. I could find no reports on the usual MS forums, nor any warnings about this in the MSDN docs. Needless to say, I could find no way to tell MSDTC to ensure a certain order of operations either. We solved this by retrying until we could see the change saved to the DB. – Paul Oct 09 '16 at 23:57