How to simulate database failure to test 2-phase commit in Java

Question

I am implementing a 2-phase commit involving distributed resources. How do I simulate the failure of a participating database? Pulling out the network cable doesn't work, as it causes a table deadlock. I am currently using hooks in my application code which throw StaleConnectionException at different points, such as before query execution and after query execution. My concerns with this approach are:

Is there a better way to simulate the DB failure?
What happens to the connection object when DB connection goes bad? Does it retain its value or does it become null?
What actually happens when the application tries to reconnect to the DB? What value does the connection object get? Does it use an existing value from the connection pool?

I would also like to test at intermediate points, such as during query execution and during commit (after prepare is sent, etc.). Right now I put the application into debug mode, step into the function call and pull the plug in between. But this approach is manual and won't work for testing at scale.
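One way to make the application-side hooks described above reusable, rather than scattering them through application code, is to wrap the Connection handle in a fault-injecting proxy. This is a minimal sketch under stated assumptions: `FaultInjectingConnection` and `failNext` are hypothetical names, and `SQLNonTransientConnectionException` with SQLState `08006` (connection failure) stands in for WebSphere's vendor-specific `StaleConnectionException`, which is not used here.

```java
import java.lang.reflect.InvocationHandler;
import java.lang.reflect.InvocationTargetException;
import java.lang.reflect.Proxy;
import java.sql.Connection;
import java.sql.SQLNonTransientConnectionException;
import java.util.Set;
import java.util.concurrent.ConcurrentHashMap;

/**
 * Hypothetical test helper: wraps a JDBC Connection and makes selected
 * method calls fail once, as if the database had gone away.
 */
final class FaultInjectingConnection {
    // Connection method names armed to fail on their next call, e.g. "commit"
    private final Set<String> armed = ConcurrentHashMap.newKeySet();

    /** Arms a one-shot failure for the next call to the named Connection method. */
    void failNext(String methodName) {
        armed.add(methodName);
    }

    Connection wrap(Connection target) {
        InvocationHandler handler = (proxy, method, methodArgs) -> {
            if (armed.remove(method.getName())) {
                // SQLState class "08" = connection exception; 08006 = connection failure
                throw new SQLNonTransientConnectionException(
                        "Injected failure in " + method.getName(), "08006");
            }
            try {
                return method.invoke(target, methodArgs);
            } catch (InvocationTargetException e) {
                throw e.getCause(); // rethrow the real driver exception unchanged
            }
        };
        return (Connection) Proxy.newProxyInstance(
                FaultInjectingConnection.class.getClassLoader(),
                new Class<?>[] {Connection.class},
                handler);
    }
}
```

Arm only methods that declare SQLException (commit, rollback, prepareStatement, ...). Note that the proxy sees only the calls the application makes on its handle; the prepare and commit calls issued by the transaction manager go to the XAResource and never pass through it.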

Is there a simulator/emulator or tool which can help me do this?

Are you targeting any particular database, or does this need to be a generalized solution for any JDBC-connected database? — kgrittn, Apr 11 '12 at 15:27
Andy, which method to simulate database failure did you choose? — Dmitry D, Sep 11 '13 at 14:26
@dmiandre: this was some time back, so I don't really recollect, but I think one method I used was to ensure that the query on the 2nd DB came back with an error (incorrect table name or something). Thus the 2nd query failed while the 1st reached the commit phase. My primary goal was to make the 2-phase commit fail, so this worked! I will try to dig up the old project and see if I can find any other methods I used. — Andy, Sep 11 '13 at 15:49
I've been searching the internet to find best practices for testing such cases, but it seems that no one tests it. The only thing I found is Byteman from JBoss. It's a tool for injecting failures without changing your code. — Dmitry D, Sep 11 '13 at 16:24

score 5 · Answer 1 · edited Aug 10 '17 at 16:29

That's a lot of questions :) I will try to complete the previous answers.

Is there a better way to simulate the DB failure?

Testing all cases is complicated. One way to test the main cases would be to create a JCA connector (a DB driver is a JCA connector). You can obtain connections from the connector that will be enlisted in the transaction (as a third participant). The connection can then raise certain errors.

There are three parts that work together: (1) the application, (2) the app server's transaction manager, and (3) the JCA connector (a so-called resource adapter).

[Figure: Communications between the three parts]

The connection hooks itself into the transaction via ManagedConnection.getXAResource. With a custom JCA connector you can then raise exceptions to the application (Connection in the picture) or to the application server's transaction manager (the XAResource obtained via the ManagedConnection in the picture). Notably, you can throw exceptions during XAResource.prepare and XAResource.commit, which correspond to errors during the 2-phase commit.
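A minimal sketch of such a failing participant, assuming the custom connector's ManagedConnection.getXAResource returns it. `FailingXAResource` and its `FailAt` enum are hypothetical names; the rest of the resource adapter (ManagedConnectionFactory, ManagedConnection, and so on) is not shown.

```java
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/**
 * Hypothetical dummy participant for tests: it does no real work, but throws
 * an XAException at one chosen step of the two-phase commit.
 */
final class FailingXAResource implements XAResource {
    enum FailAt { NONE, PREPARE, COMMIT, ROLLBACK }

    private final FailAt failAt;
    private final int errorCode; // e.g. XAException.XAER_RMFAIL or XAException.XA_RBROLLBACK

    FailingXAResource(FailAt failAt, int errorCode) {
        this.failAt = failAt;
        this.errorCode = errorCode;
    }

    private void maybeFail(FailAt step) throws XAException {
        if (failAt == step) {
            throw new XAException(errorCode);
        }
    }

    // Phase 1: returning XA_OK votes "prepared"; throwing votes against the commit.
    @Override public int prepare(Xid xid) throws XAException {
        maybeFail(FailAt.PREPARE);
        return XA_OK;
    }

    // Phase 2
    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        maybeFail(FailAt.COMMIT);
    }

    @Override public void rollback(Xid xid) throws XAException {
        maybeFail(FailAt.ROLLBACK);
    }

    // Remaining XAResource methods: no-ops for a participant without real state.
    @Override public void start(Xid xid, int flags) { }
    @Override public void end(Xid xid, int flags) { }
    @Override public void forget(Xid xid) { }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}
```

Which error code to use depends on the scenario: for example, XAException.XA_RBROLLBACK thrown from prepare reports that the branch was rolled back, while XAException.XAER_RMFAIL reports that the resource manager became unavailable.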

Note that it is hard to control the order in which the participants are enlisted (see this question). So it is easy to test that one of the prepares fails (namely yours), but it's hard to control the order in which they are called. Reproducing all possible invalid states of 2-phase commit is complicated, especially when optimizations come into play.

(I once wrote a JCA connector (http://code.google.com/p/txfs), and there are others around if you want sample code.)

What happens to the connection object when DB connection goes bad? 
Does it retain its value or does it become null?

The ManagedConnection can notify the application server, which registers a ConnectionEventListener on it. One of the notifications is ConnectionEvent.CONNECTION_ERROR_OCCURRED, which reports that an error occurred while using this particular connection.

As noted in another answer, there is normally one managed connection associated with each transaction. The managed connection abstracts the physical connection, and you don't want to use too many. The application obtains only "handles" (Connection in the picture). The handles obtained within a given transaction all point to the same managed connection. This is an optimization that most app servers support.

If the managed connection becomes invalid, the handles that use it become invalid as well. But, AFAIK, the handles cannot be "refreshed": the transaction must roll back, and the managed connection is destroyed. When another transaction starts, it will be associated with another valid managed connection from the pool.
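To make the handle/managed-connection relationship concrete, here is a toy model (not any application server's real code) in which every handle requested within one transaction shares one managed connection, and a broken managed connection makes all of its handles unusable. The names `HandleModel`, `Managed`, and `Handle` are hypothetical, and transactions are identified by plain strings for simplicity.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.function.Supplier;

/** Toy model: connection handles sharing one managed connection per transaction. */
final class HandleModel {
    /** Stands in for a ManagedConnection wrapping one physical connection. */
    static final class Managed {
        private boolean broken;
        void markBroken() { broken = true; }
        boolean isBroken() { return broken; }
    }

    /** Stands in for the Connection handle the application sees. */
    static final class Handle {
        private final Managed managed;
        Handle(Managed managed) { this.managed = managed; }
        Managed managed() { return managed; }

        void execute(String sql) {
            if (managed.isBroken()) {
                // The handle cannot be "refreshed": the transaction has to roll back.
                throw new IllegalStateException("connection is no longer usable: " + sql);
            }
            // A real handle would forward sql to the physical connection here.
        }
    }

    private final Map<String, Managed> byTransaction = new HashMap<>();
    private final Supplier<Managed> pool;

    HandleModel(Supplier<Managed> pool) { this.pool = pool; }

    /** Every getConnection() inside one transaction yields a handle on the same Managed. */
    Handle getConnection(String txId) {
        return new Handle(byTransaction.computeIfAbsent(txId, id -> pool.get()));
    }

    /** Ends the association; this toy never returns Managed objects to the pool. */
    void endTransaction(String txId) {
        byTransaction.remove(txId);
    }
}
```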

What actually happens when application tries to reconnect to DB?
What value does connection object get?
Does it use an existing value from the connection pool?

The app server manages a pool of managed connections. As said in the previous paragraph, one might go bad while it is in use. But one can also go bad without being used. For instance, an unused managed connection in the pool might become invalid because the underlying physical connection timed out. App servers usually have a feature to test whether a managed connection is valid before using it. If it is not, the server will try another managed connection from the pool, or create a new one.
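The "test before use" feature can be sketched as a pool that validates an idle entry when it is borrowed and discards it if the check fails. This is a minimal illustration, not how any particular app server implements it; `ValidatingPool` is a hypothetical name, and the only real API used is JDBC's `Connection.isValid(int timeoutSeconds)` in the `jdbcIsValid` helper.

```java
import java.sql.Connection;
import java.sql.SQLException;
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.function.Predicate;
import java.util.function.Supplier;

/** Minimal "test on borrow" pool: invalid idle entries are dropped, not handed out. */
final class ValidatingPool<C> {
    private final Deque<C> idle = new ArrayDeque<>();
    private final Predicate<C> isValid;
    private final Supplier<C> factory;

    ValidatingPool(Predicate<C> isValid, Supplier<C> factory) {
        this.isValid = isValid;
        this.factory = factory;
    }

    synchronized void release(C c) {
        idle.push(c);
    }

    synchronized C borrow() {
        while (!idle.isEmpty()) {
            C candidate = idle.pop();
            if (isValid.test(candidate)) {
                return candidate;
            }
            // e.g. the physical connection timed out while idle: drop it
            // (a real pool would also close it)
        }
        return factory.get(); // no valid idle entry left: open a new one
    }

    synchronized int idleCount() {
        return idle.size();
    }

    /** JDBC validation via Connection.isValid (JDBC 4.0+); the timeout is in seconds. */
    static boolean jdbcIsValid(Connection c) {
        try {
            return c.isValid(2);
        } catch (SQLException e) {
            return false;
        }
    }
}
```

For JDBC you would pass `ValidatingPool::jdbcIsValid` as the validator, together with your own factory that opens a new physical connection; keeping the class generic lets the borrow logic be tested without a database.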

Thanks for such a detailed explanation. I haven't had the time to test this yet. I will vote up once I get to test it, but still, thanks for such a detailed explanation — Andy, May 07 '12 at 19:39
Hope it helps. What you want to do is not easy. (And as @nsfyn55 wrote, it is "extremely high cost/low reward".) — ewernli, May 08 '12 at 06:21

score 1 · Answer 2 · answered Apr 11 '12 at 15:19


You could probably add your own resource that participates in the commit and pauses the transaction after the first phase. In the meantime, you can "pull the plug".

Andrej

I can't add any more resources than I already have available. Also, I don't have control over running them; I can't stop and start them, as they are dev DBs. – Andy Apr 11 '12 at 16:14
I think what Andrej meant is to enlist another (dummy) XAResource that would trigger some kind of failure between prepare and commit. The proper way would be to create a resource adapter. You could also try to enlist the XAResource directly from within your application, but I think that WebSphere doesn't allow that through the standard JTA APIs (note that normally that is not allowed by J2EE anyway). You would need to use a WebSphere specific API (because WebSphere requires you to generate a so-called "recovery token" when enlisting an XAResource). – Andreas Veithen Apr 11 '12 at 18:47
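A sketch of the dummy participant described in this comment: an XAResource that votes yes in prepare and then blocks when the transaction manager calls commit, which gives a test a window in which to break another participant. `PausingXAResource` and its `awaitCommitPhase`/`resume` methods are hypothetical, and how the resource gets enlisted (a resource adapter, or the WebSphere-specific API mentioned in the comment) is not shown.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

/** Hypothetical dummy participant that blocks in phase 2 until the test resumes it. */
final class PausingXAResource implements XAResource {
    private final CountDownLatch reachedCommit = new CountDownLatch(1);
    private final CountDownLatch resume = new CountDownLatch(1);

    /** Test side: wait until the transaction manager is inside phase 2. */
    boolean awaitCommitPhase(long timeout, TimeUnit unit) throws InterruptedException {
        return reachedCommit.await(timeout, unit);
    }

    /** Test side: let the transaction manager continue after the failure is injected. */
    void resume() {
        resume.countDown();
    }

    @Override public void commit(Xid xid, boolean onePhase) throws XAException {
        reachedCommit.countDown(); // all participants have been prepared at this point
        try {
            resume.await();
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            XAException xa = new XAException(XAException.XAER_RMFAIL);
            xa.initCause(e);
            throw xa;
        }
    }

    @Override public int prepare(Xid xid) { return XA_OK; }
    @Override public void rollback(Xid xid) { }
    @Override public void start(Xid xid, int flags) { }
    @Override public void end(Xid xid, int flags) { }
    @Override public void forget(Xid xid) { }
    @Override public Xid[] recover(int flag) { return new Xid[0]; }
    @Override public boolean isSameRM(XAResource other) { return other == this; }
    @Override public int getTransactionTimeout() { return 0; }
    @Override public boolean setTransactionTimeout(int seconds) { return false; }
}
```

Because the order in which the transaction manager calls the participants is hard to control, this window falls before the other participants' commits only if this resource happens to be committed first; otherwise they may already be committed when it pauses. Transaction timeouts configured in the app server may also cut the pause short.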

score 1 · Answer 3 · answered Apr 11 '12 at 19:10

Andrej answered one part of the question, so let me answer the second part.

The Connection object you get in your application is only a wrapper around the physical connection. That wrapper plays a role in connection pooling and transaction management. If anything goes wrong with the DB, the connection wrapper becomes unusable and you can only roll back. That makes sense because you access the connection only before the 2PC starts, and anything done before the start of the 2PC cannot be recovered.

Note that attempting to release the connection and acquire a new one doesn't change anything because once a connection from a given data source has been used in a transaction, you will always get the same connection from that data source as long as you are in the same transaction. This means that your application can't "reconnect" without restarting the entire transaction.

On the other hand, if something goes wrong after all resources have been prepared but before all resources have been committed, then it is the responsibility of the transaction manager to recover the transaction. But this happens behind the scenes, and your application has no control over it. Also, at this point your application is expected to have released all connections used in that transaction.
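As a rough illustration of what the transaction manager does behind the scenes, the sketch below shows a simplified "presumed abort" recovery pass: it asks a resource for branches that are prepared but not yet completed via XAResource.recover, commits those for which a commit decision was logged, and rolls back the rest. `RecoverySketch`, `SimpleXid`, and the set of committed global transaction IDs (standing in for the transaction manager's log) are hypothetical; a real transaction manager also handles heuristic outcomes and matches branches by format ID and branch qualifier.

```java
import java.util.Arrays;
import java.util.Set;
import javax.transaction.xa.XAException;
import javax.transaction.xa.XAResource;
import javax.transaction.xa.Xid;

final class RecoverySketch {
    /** Minimal Xid value class, for illustration only. */
    static final class SimpleXid implements Xid {
        private final int formatId;
        private final byte[] gtrid;
        private final byte[] bqual;

        SimpleXid(int formatId, byte[] gtrid, byte[] bqual) {
            this.formatId = formatId;
            this.gtrid = gtrid.clone();
            this.bqual = bqual.clone();
        }

        @Override public int getFormatId() { return formatId; }
        @Override public byte[] getGlobalTransactionId() { return gtrid.clone(); }
        @Override public byte[] getBranchQualifier() { return bqual.clone(); }
    }

    /**
     * Resolves the resource's in-doubt branches. committedGtrids stands in for the
     * TM log: global transaction IDs (as Arrays.toString of the bytes) for which a
     * commit decision was recorded. Returns the number of branches committed.
     */
    static int recover(XAResource resource, Set<String> committedGtrids) throws XAException {
        // One complete scan: branches the resource manager has prepared but not completed
        Xid[] inDoubt = resource.recover(XAResource.TMSTARTRSCAN | XAResource.TMENDRSCAN);
        if (inDoubt == null) {
            return 0;
        }
        int committed = 0;
        for (Xid xid : inDoubt) {
            String gtrid = Arrays.toString(xid.getGlobalTransactionId());
            if (committedGtrids.contains(gtrid)) {
                resource.commit(xid, false); // decision was logged: finish phase 2
                committed++;
            } else {
                resource.rollback(xid);      // no logged decision: presume abort
            }
        }
        return committed;
    }
}
```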

Thanks for explaining in detail. What is happening in my case is: I execute a query on both DBs as part of the transaction, and just when it is about to commit, I pull the plug. The application throws a "javax.transaction.HeuristicMixedException" and then attempts to roll back. Now when the TM (WebSphere in my case) tries to perform the rollback, it gets the following exception: "XAException occurred. Error code is: XAER_NOTA (-4) ERRORCODE=-4228, SQLSTATE=null". So my question is: if commit was never called, prepare was never sent. Should the TM still call rollback? — Andy, Apr 11 '12 at 22:43
Also, is my approach of just throwing a "StaleConnectionException" sufficient to simulate a DB failure, or should I also release the connections? — Andy, Apr 11 '12 at 22:46

Answer 4 · answered Apr 11 '12 at 19:45 · nsfyn55

Your best bet is probably to use in memory databases. Invoke the failure and check the state of the data sources before and after to ensure the rollback/commit executed properly.
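The core of such a test is comparing the data before and after an injected failure. The sketch below swaps the in-memory databases for two HashMap-backed stores and a toy coordinator so that it runs without any driver; it is not a real transaction manager, and `InMemoryTwoPhase`, `Store`, and `failOnPrepare` are hypothetical names. With real in-memory databases, the assertions would keep the same shape but query the tables instead.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

/** Toy stand-in for "two in-memory databases + a coordinator". */
final class InMemoryTwoPhase {
    static final class Store {
        final Map<String, String> data = new HashMap<>();           // committed state
        private final Map<String, String> staged = new HashMap<>(); // uncommitted writes
        boolean failOnPrepare; // test switch: simulate this participant failing in phase 1

        void write(String key, String value) { staged.put(key, value); }
        boolean prepare() { return !failOnPrepare; }
        void commit() { data.putAll(staged); staged.clear(); }
        void rollback() { staged.clear(); }
    }

    /** Phase 1 asks every store to prepare; phase 2 commits only if all voted yes. */
    static boolean run(List<Store> stores) {
        boolean allPrepared = true;
        for (Store s : stores) {
            if (!s.prepare()) {
                allPrepared = false;
                break;
            }
        }
        for (Store s : stores) {
            if (allPrepared) {
                s.commit();
            } else {
                s.rollback();
            }
        }
        return allPrepared;
    }
}
```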

As for your other concerns, these seem like extremely high cost/low reward tests. Read your vendor's documentation and ensure that your transaction environment is configured appropriately. Once this is done, you should probably automate it so it's hands-off.

Unless you wrote your own 2PC-protocol-specific transaction manager + DB implementation, I would leave the testing of those features to your vendor.
