25

After spending several days investigating this issue, I decided to submit this question, because what is happening apparently makes no sense.

The Case

My computer is configured with a local Oracle Express database. I have a Java project with several JUnit tests that extend a parent class (I know that this is not a "best practice") which opens an OJDBC connection (from a static Hikari connection pool of 10 connections) in the @Before method and rolls it back in the @After.

public class BaseLocalRollbackableConnectorTest {
    private static Logger logger = LoggerFactory.getLogger(BaseLocalRollbackableConnectorTest.class);
    protected Connection connection;

    @Before
    public void setup() throws SQLException {
        logger.debug("Getting connection and setting autocommit to FALSE");
        connection = StaticConnectionPool.getPooledConnection();
    }

    @After
    public void teardown() throws SQLException {
        logger.debug("Rollback connection");
        connection.rollback();
        logger.debug("Close connection");
        connection.close();
    }
}

StaticConnectionPool

public class StaticConnectionPool {

    private static HikariDataSource ds;

    private static final Logger log = LoggerFactory.getLogger(StaticConnectionPool.class);

    public static Connection getPooledConnection() throws SQLException {
        // Lazy initialization: acceptable here because the tests never run in parallel
        if (ds == null) {
            log.debug("Initializing ConnectionPool");
            HikariConfig config = new HikariConfig();
            config.setMaximumPoolSize(10);
            config.setDataSourceClassName("oracle.jdbc.pool.OracleDataSource");
            config.addDataSourceProperty("url", "jdbc:oracle:thin:@localhost:1521:XE");
            config.addDataSourceProperty("user", "MyUser");
            config.addDataSourceProperty("password", "MyPsw");
            config.setAutoCommit(false);
            ds = new HikariDataSource(config);
        }
        return ds.getConnection();
    }
}

This project has hundreds of tests (not run in parallel) that use this connection (on localhost) to execute queries (insert/update and select) using Sql2o, but transactions and the closing of the connection are managed only externally (by the test base class above). The database is completely empty, so that the tests stay isolated.

So the expected flow is: insert something into the DB, make the assertions, and then roll back. This way the second test will not find any data added by the previous test, which keeps the tests isolated.
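For illustration, a minimal sketch of such a test in plain JDBC on the inherited connection (the real tests go through Sql2o and the DAO layer; TABLE1 and its columns are hypothetical names):

import static org.junit.Assert.assertEquals;

import java.sql.PreparedStatement;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

import org.junit.Test;

public class InsertIsolationTest extends BaseLocalRollbackableConnectorTest {

    @Test
    public void insertedRowIsVisibleOnlyInsideTheTransaction() throws SQLException {
        // Insert on the inherited connection (autocommit is off, see setup())
        try (PreparedStatement ps = connection.prepareStatement(
                "INSERT INTO TABLE1 (COL1, COL2) VALUES (?, ?)")) {
            ps.setString(1, "VAL1");
            ps.setString(2, "VAL2");
            ps.executeUpdate();
        }

        // The uncommitted row is visible from within the same transaction
        try (Statement st = connection.createStatement();
             ResultSet rs = st.executeQuery("SELECT COUNT(*) FROM TABLE1")) {
            rs.next();
            assertEquals(1, rs.getInt(1));
        }
        // teardown() rolls back, so the next test should find TABLE1 empty again
    }
}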

The Problem

Running all tests together (sequentially), 90% of the time they work properly. The other 10% of the time, one or two tests fail randomly, because there is dirty data in the database (a duplicated unique key, for example) left by previous tests. Looking at the logs, the rollbacks of the previous tests were done properly (in fact, if I check the database afterwards, it is empty). If I execute the tests on a server with higher performance but the same JDK and the same Oracle XE, the failure ratio increases to 50%.

This is very strange and I have no idea why, because the connections are different between tests and the rollback is called each time. The JDBC isolation level is READ COMMITTED, so even if we reused the same connection this should not create any problem. So my questions are: Why does it happen? Do you have any idea? Is the JDBC rollback synchronous, as I believe, or are there cases where it can return before it has fully completed?

These are my main DB parameters: processes = 100, sessions = 172, transactions = 189.

Tonino
  • 4
    Just a tangential comment: rolling back at the end of the test is usually a bad practice, because if a test fails, you won't see what went wrong in the DB (as it's in your case). A better approach is to prepare the DB with a known state before each test. The book [xUnit Test Patterns](https://www.amazon.com/xUnit-Test-Patterns-Refactoring-Code/dp/0131495054) has a few chapters on this. – Augusto Jul 07 '16 at 08:00
  • Are there any autonomous triggers created on the tables? If yes, then these triggers can leave some committed data in the tables, because they are totally independent of the main transaction. – krokodilko Jul 08 '16 at 06:22
  • Do you perform DDL operations in between? Something like `create table as select from ...` or alike? One feature of oracle is to perform a commit as soon as DDL is performed. In that case the rollback at the end will not save you. – cheffe Jul 08 '16 at 11:52
  • @cheffe thanks. No DDL operations. In fact, if I check the database after the tests (whether they fail or succeed), it is clean. – Tonino Jul 08 '16 at 12:20
  • Not all things are rolled back in the database. Sequences, for example, remain increased. Maybe it has to do with the way you create ids/keys in your application logic. It is hard to tell without the actual code. Post some code related to the violated constraints. – Dragan Bozanovic Jul 08 '16 at 19:56
  • Is there any chance you can use the Oracle Enterprise version? It provides the Flashback feature, which automatically rolls back every change made to the database to your restore point in time. – Hoàng Long Jul 11 '16 at 02:34
  • Aside from the problem, I would suggest that you follow the advice of @Augusto. We followed the same pattern as you do for some time and switched. There is little to no pain in changing the pattern, and it just runs smoother. – cheffe Jul 11 '16 at 06:11
  • 1
    @HoàngLong the enterprise version is used in the test/production systems. For local tests they use a local database, not the test or production ones, so 20 computers and different build agents cannot be configured with a different (and expensive :)) version, even though there are limitations. – Tonino Jul 11 '16 at 09:03
  • @Tonino: ok. I thought this was for one integration test server. Running these kinds of "stress tests" on a development machine... well, good luck. – Hoàng Long Jul 11 '16 at 09:06
  • Have you confirmed the exception is thrown from the DB and not from an application layer cache? – Jeremy Jul 13 '16 at 05:04
  • @Jeremy yes, the exception comes from the DB layer. There is no cache layer in this application – Tonino Jul 13 '16 at 06:18
  • Which version of the oracle driver are you using and which of Hikari? – cheffe Jul 14 '16 at 05:58
  • @cheffe OJDBC 11.2.0.4 and Hikari 2.3.13. However, I found the reason (see my answer below) – Tonino Jul 14 '16 at 06:26
  • 1
    Since nobody has provided (nor *could* have provided) the answer, I'll let SO handle the bounty and won't assign it actively. If anyone has anything to say to the contrary, please let me know. – bigstones Jul 14 '16 at 13:11

7 Answers

4

I ran into the same problem 2-3 years ago (and spent a lot of time getting it straight). The problem is that @Before and @After are not always really sequential. You can check this by starting the process in debug mode and placing some breakpoints in the annotated methods.

Edit: I was not clear enough, as Tonino pointed out. The order of @Before and @After is guaranteed in the sense that they run before and after the test. The problem in my case was that sometimes the @Before and @After of different tests got mixed up.

Expected:

@Before -> test1() -> @After -> @Before -> @test2() -> @After

But sometimes I experienced the following order:

@Before -> test1() -> @Before -> @After -> @test2() -> @After

I am not sure whether it is a bug or not. At the time I dug into the depths of it and it seemed like some kind of (processor?) scheduling-related magic. In our case the solution was to run the tests on a single thread and call the init and cleanup processes manually. Something like this:

public class BaseLocalRollbackableConnectorTest {
    private static Logger logger = LoggerFactory.getLogger(BaseLocalRollbackableConnectorTest.class);
    protected Connection connection;

    public void setup() throws SQLException{
        logger.debug("Getting connection and setting autocommit to FALSE");
        connection = StaticConnectionPool.getPooledConnection();
    }

    public void teardown() throws SQLException{ 
        logger.debug("Rollback connection");
        connection.rollback();
        logger.debug("Close connection");
        connection.close();
    }

    @Test
    public void test() throws Exception{
        try{
            setup();
            //test
        }catch(Exception e){ //making sure that the teardown will run even if the test is failing 
            teardown();
            throw e;
        }
        teardown();
    }
}

I have not tested it, but a much more elegant solution could be to synchronize the @Before and @After methods on the same object; a rough sketch follows. Please update me if you have the chance to give it a try. :)
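An untested sketch of that idea, assuming a single shared lock object (whether it helps depends on what is actually interleaving the callbacks):

import java.sql.Connection;
import java.sql.SQLException;

import org.junit.After;
import org.junit.Before;

public class BaseLocalRollbackableConnectorTest {
    // One lock shared by all test instances: setup and teardown can never overlap
    private static final Object LIFECYCLE_LOCK = new Object();

    protected Connection connection;

    @Before
    public void setup() throws SQLException {
        synchronized (LIFECYCLE_LOCK) {
            connection = StaticConnectionPool.getPooledConnection();
        }
    }

    @After
    public void teardown() throws SQLException {
        synchronized (LIFECYCLE_LOCK) {
            connection.rollback();
            connection.close();
        }
    }
}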

I hope it will solve your problem too.

Hash
  • Hi Mark, this was the first check that I did, for sure. I wrote some logs in Before and After with the instance id in order to understand if there was any different order. However, this order is always guaranteed (http://stackoverflow.com/a/6076684/2088039), and so far I have never had this kind of problem with the order. – Tonino Jul 13 '16 at 06:22
  • It is about a slightly different thing. Give it a try; if I am correct, the test will not fail (of course you should make sure that teardown() will run even if something goes wrong). @After and @Before are guaranteed to run before/after each test, but it is not guaranteed that @Before will run after the previous @After! I'll correct the reply if I described it foggily. – Hash Jul 13 '16 at 07:43
  • OK, got it, even though I never observed this behavior either. Just a note on your comment: the teardown has to be called in a finally and not only in the catch: @Test public void test() throws Exception{ try{ setup(); //test }catch(Exception e){ throw e; } finally {teardown();} } – Tonino Jul 13 '16 at 10:07
  • Could you please give it a try? I am really curious. :) The finally block is not a good solution in this case; there is no fixed time at which it will run the teardown(). But you are right, the example was missing a point; I updated it. – Hash Jul 13 '16 at 12:22
  • Hi Mark. Tried, but without success. I probably found the root cause of all the problems... I will keep you posted – Tonino Jul 13 '16 at 16:09
1

If your problem just needs to be "solved" (i.e. not "best practice"), regardless of performance, so that the tests simply complete in order, try setting:

config.setMaximumPoolSize(1);

You might need to set a higher connection timeout, since tests waiting in the queue for their turn might otherwise time out. I usually don't suggest solutions like this, but your setup is suboptimal and will lead to race conditions and data loss. Anyway, good luck with the tests.
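For example (setConnectionTimeout takes milliseconds; the 60-second value is only an illustrative guess, tune it to your slowest test):

HikariConfig config = new HikariConfig();
config.setMaximumPoolSize(1);        // serialize all tests on a single connection
config.setConnectionTimeout(60000); // wait up to 60 s for the connection to become free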

Gillsoft AB
  • Also with this configuration, the issue occurred. We also tried a single, non-pooled connection, but without any success... Now it is not a matter of best practice for tests, but just of understanding the "magic" :) Thank you – Tonino Jul 08 '16 at 08:24
1

Try configuring auditing of all statements in Oracle, then look for sessions that live simultaneously. I think the problem is in the tests. The JDBC rollback is synchronous. A commit can be configured as COMMIT NOWAIT, but I don't think you do anything special like that in your tests.

Also pay attention to parallel DML. On one table in the same transaction you cannot mix parallel DML with any other DML without a commit in between, or you get ORA-12838.

Do you have autonomous transactions? Business logic in the tests can roll them back manually, and during the tests an autonomous transaction is like another session: it doesn't see any uncommitted changes from the parent session.
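A minimal sketch of how the auditing could be enabled and inspected from Java, assuming the instance runs with AUDIT_TRAIL=DB and the connecting user has the AUDIT SYSTEM privilege and access to DBA_AUDIT_TRAIL (which audit options to enable is a judgment call):

import java.sql.Connection;
import java.sql.ResultSet;
import java.sql.SQLException;
import java.sql.Statement;

public class AuditHelper {

    // Run once, as a privileged user, before the test run
    public static void enableStatementAudit(Connection admin) throws SQLException {
        try (Statement st = admin.createStatement()) {
            st.execute("AUDIT INSERT TABLE, UPDATE TABLE, DELETE TABLE BY ACCESS");
        }
    }

    // Afterwards, look for sessions whose statements overlap in time
    public static void dumpTrail(Connection admin) throws SQLException {
        try (Statement st = admin.createStatement();
             ResultSet rs = st.executeQuery(
                     "SELECT sessionid, username, action_name, timestamp " +
                     "FROM dba_audit_trail ORDER BY timestamp")) {
            while (rs.next()) {
                System.out.printf("%s %s %s %s%n", rs.getString(1),
                        rs.getString(2), rs.getString(3), rs.getTimestamp(4));
            }
        }
    }
}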

Evgeniy K.
  • 1
    Read the question more carefully. He says that there is ALWAYS a single connection and transaction. – the_dark_destructor Jul 11 '16 at 09:46
  • There may be jobs, and an autonomous transaction works in another session. In all these situations there is more than one session. – Evgeniy K. Jul 11 '16 at 09:51
  • 1
    @the_dark_destructor Also, a test can manually run something else. Don't think that I am mocking you, but many people say there is no parallel session, and then the audit gives a different result. – Evgeniy K. Jul 11 '16 at 09:54
1

Not sure if this will fix it, but you could try:

public class BaseLocalRollbackableConnectorTest {
  private static Logger logger = LoggerFactory.getLogger(BaseLocalRollbackableConnectorTest.class);
  protected Connection connection;
  private Savepoint savepoint;

  @Before
  public void setup() throws SQLException {
    logger.debug("Getting connection and setting autocommit to FALSE");
    connection = StaticConnectionPool.getPooledConnection();
    savepoint = connection.setSavepoint();
  }

  @After
  public void teardown() throws SQLException {
    logger.debug("Rollback connection");
    connection.rollback(savepoint);
    logger.debug("Close connection");
    connection.close();
    // Wait until the connection is really closed before it goes back to the pool
    while (!connection.isClosed()) {
      try { Thread.sleep(500); } catch (InterruptedException ie) {}
    }
  }
}

Really there are two 'fixes' there: a loop after the close, to be sure the connection really IS closed before it goes back to the pool; and a savepoint created before the test and restored afterwards.

JoeG
  • thanks, but no success... About the wait: with a Thread.sleep of 8-12 seconds the issue is reduced and sometimes seems not to happen... but it cannot be a solution, because with this kind of sleep all the tests would take more than 30 minutes to execute :) – Tonino Jul 12 '16 at 06:45
  • I think this is telling us something. Increasing the sleep value there should not really affect the execution time of the program - IF it was really waiting for the connection to be closed! That loop should wait in 1/2 sec increments up to the 8-12 seconds if needed. Your increasing it is affecting timing elsewhere and making it work.... – JoeG Jul 12 '16 at 11:20
  • sorry, I was not clear. The connection is always closed and the loop is never entered. After the .close(), the connection is always closed (I tried throwing an exception in case the connection was not closed afterwards). The sleep that I mentioned was put after the .close() method and is always executed, even though the connection is closed properly. – Tonino Jul 13 '16 at 06:24
1

Like all the other answers have pointed out, it's hard to say what goes wrong with the provided information. Furthermore, even if you manage to find the current issue by auditing, it doesn't mean that your tests are free from data errors.

But here's an alternative: because you already have a blank database schema, you can export it to a SQL file. Then before each test:

  1. Drop the schema
  2. Re-create the schema again
  3. Feed the sample data (if needed)

It would save lots of time debugging and make sure the database is in a pristine state every time you run the tests. All of this can be done in a script.
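A minimal sketch of such a reset in plain JDBC (the file name schema.sql is an assumption, and the naive split on ';' only works for simple DDL/DML scripts without PL/SQL blocks):

import java.nio.file.Files;
import java.nio.file.Paths;
import java.sql.Connection;
import java.sql.Statement;

public class SchemaReset {

    // Re-create the schema from an exported SQL file before each test
    public static void reset(Connection connection) throws Exception {
        String script = new String(Files.readAllBytes(Paths.get("schema.sql")));
        try (Statement st = connection.createStatement()) {
            for (String sql : script.split(";")) {
                if (!sql.trim().isEmpty()) {
                    st.execute(sql.trim());
                }
            }
        }
    }
}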

Note: Oracle Enterprise has the Flashback feature to support your kind of operation. Also, if you can manage to use Hibernate and the like, there are in-memory databases (like HSQLDB) that you can utilize to both increase testing speed and maintain coherence in your data set.

EDIT: It seems implausible, but just in case: connection.rollback() only takes effect if you don't call commit() before it.

Hoàng Long
  • 1
    Thanks! I think that posting the code of 200 tests was not relevant, because they perform hundreds of inserts and selects, and I already checked for any kind of commit between them. Additionally, there is no particular test that fails; it is very random... However, Flashback is not available for Oracle XE. The proposal to use a delete procedure to clean the database could be a better one, even though the database is versioned and its structure cannot be modified, nor its master data deleted. – Tonino Jul 11 '16 at 05:44
  • So it is random? Interesting... if you cannot change the schema, at least we can truncate all the tables, remove indexes, sequences, packages, triggers and stored procedures, then re-init again? – Hoàng Long Jul 11 '16 at 06:00
  • Yes, it is possible, but the reason for this post and for my headache is to understand: why does it happen? This is very strange and there is apparently no reason. My little knowledge of databases and transactions is falling apart with this... – Tonino Jul 11 '16 at 06:41
  • To trace the root issue, your best bet would be using the Audit function (check the link in @Evgeniy's answer and my answer). Turn audit on and check the log of all queries performed against the database. For possible reasons, Dragan already said it: not everything is rolled back for a failed transaction (e.g. sequences...). It can also be a really silly thing, like someone already committing data somewhere in the code. – Hoàng Long Jul 11 '16 at 08:47
  • thanks for the edit, but I am pretty sure that commit is never called in the transaction. Again: it is random. Most of the time it works perfectly if I run the tests sequentially. Running all tests 100 times (never in parallel), 10 times some of them failed due to duplicated keys or old data – Tonino Jul 11 '16 at 09:07
  • Oh, so it's a duplicate-key issue. How do you generate the key, then? By a trigger increasing a sequence? Or does the application compose the key itself? – Hoàng Long Jul 11 '16 at 09:13
  • I agree with @Hoàng Long in his assumption about the sequence. User Dragan Bozanovic said 2 days ago that the main problem may be in the sequence generation. Can you post the code which generates the key (procedure or sequence)? – Evgeniy K. Jul 11 '16 at 09:24
  • The pseudocode of what I have (the same for all the hundreds of tables) is: table1 has ID, COL1, COL2; INSERT INTO TABLE1 (COL1,COL2) VALUES (VAL1,VAL2). On this table a trigger is configured as follows: BEGIN if :new.ID IS NULL THEN SELECT my_sq.nextval INTO :new.ID FROM dual; end if; END my_trigger; – Tonino Jul 11 '16 at 10:59
  • @Tonino: if the trigger uses sequences, then the ID duplication should not occur. Oracle sequences ensure uniqueness. "running all tests 100 times (never in parallel), 10 times, some of them failed due to duplicated keys or old data" ==> are these completely random? Do they concentrate on some functions/tables? If things are exactly as you describe, it's quite strange. – Hoàng Long Jul 12 '16 at 01:41
  • @HoàngLong for this reason I didn't post the code: these are all flat queries (no functions, procedures, DDL, or strange triggers) and the IDs are created by a sequence. – Tonino Jul 12 '16 at 06:39
  • The current issue now is that it's difficult to reproduce your case. If you can narrow the scope a bit (e.g. make a smaller test program which reaches a similar situation), it would be easier to investigate. Right now it's like shooting in the dark. – Hoàng Long Jul 13 '16 at 06:16
  • Might you get the same error if you only insert/rollback on 1 table multiple times? That is, just run ONE test case multiple times to see if the error occurs. We can narrow down the issue that way. – Hoàng Long Jul 13 '16 at 06:18
1

After all the confirmation from your answers that I was not going mad with rollback and transaction behavior in unit tests, I deeply checked all the queries and all possible causes, and fortunately (yes, fortunately... even if I'm a bit ashamed of it, it sets my mind free) everything works as expected (transactions, Before, After, etc.).

There are some queries that read the result of certain complex views (wired deep into the DAO layer) to identify single rows of information. One of these views is based on the MAX of a TIMESTAMP, in order to identify the latest of a particular kind of event (in real life these events come months apart).

While preparing the database for the unit tests, these events are added sequentially by each test. In some cases, when these insert queries inside the same transaction are particularly fast, two or more events related to the same object are added within the same millisecond (the TIMESTAMP is set manually using a JODA DateTime), and the MAX of the date then returns two or more rows. This explains why the failures happened more frequently on the more performant computers/servers than on the slower ones. The view is used in several tests, and depending on the test the error is different and random (NULL value added as primary key, duplicated primary key, etc.).

For example, the bug is evident in the following INSERT ... SELECT query:

INSERT INTO TABLE1 (ID,COL1,COL2,COL3) 
  SELECT :myId, T.VAL1, T.VAL2, T.VAL3 
  FROM MyView v 
  JOIN Table2 t on t.ID = v.ID
  WHERE ........

The parameter myId is bound afterwards as an Sql2o parameter.

MyView is

SELECT ID, MAX(MDATE) FROM TABLEV WHERE.... GROUP BY ...

When the view returns at least 2 rows due to the same max date, the insert fails, because the ID is fixed (generated by a sequence at the beginning, but bound via the parameter in a second step). This produces the PK constraint violation.

This is only one case, but it drove me (and my colleagues) crazy due to this random behavior...

Adding a sleep of 1 millisecond between the inserts of those events fixes it. Now we are working on a different solution, even though this case (a user interacting twice within the same millisecond) cannot happen in the production system. But the important thing is that, as usual, no magic is involved!
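For illustration, a sketch of that workaround in the test data setup (insertEvent is a hypothetical helper standing in for the real Sql2o/DAO insert):

// Force distinct JODA timestamps so that MAX(MDATE) in the view matches exactly one row
private void insertEventsWithDistinctTimestamps(List<Event> events)
        throws SQLException, InterruptedException {
    for (Event event : events) {
        insertEvent(connection, event); // hypothetical: INSERT with DateTime.now() as MDATE
        Thread.sleep(1);                // guarantee at least 1 ms between consecutive events
    }
}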

Now you can insult me :)

Tonino
  • 1
    I knew it had to be something silly :-) Joking aside, congrats on your findings. I had a similar case with a duplicate timestamp before; it is crazily hard to find. Sometimes I want to blame our computers because they are so fast – Hoàng Long Jul 14 '16 at 01:23
  • just a question: couldn't you have found this out if you had turned Audit on, like Evgeniy recommended? – Hoàng Long Jul 14 '16 at 01:31
  • 1
    As I understand it, this is one of those rare cases where parallel sessions are not the issue. I assume that if you don't use JODA DateTime and use an Oracle timestamp instead, then you don't encounter this problem. – Evgeniy K. Jul 14 '16 at 02:50
  • 1
    @Hoàng Long In the audit log all data is linked to a timestamp (the aud$.ntimestamp column), and the Oracle views on the audit data cast the timestamp to a date, trimming the sub-second part! You can find statements that have the same execution date (because Oracle trims the sub-second part), but looking at the same query on aud$, those statements have different timestamps. To your question: the audit may be helpful only if you use the Oracle views, not the internal ones. I know some cases where DB administrators use their own views with timestamp columns, and in such cases you can't simply find those statements. – Evgeniy K. Jul 14 '16 at 02:58
  • @FedericoPiazza true :) Anyway, a good exchange of experience between users – Tonino Jul 14 '16 at 06:23
  • @EvgeniyK. Yes, it is the best way to do that, even though the entire view mechanism probably has to be reviewed completely to be more robust – Tonino Jul 14 '16 at 06:23
-1

You can do one thing: increase the number of connections in the maximum pool size, and roll back the operation in the same place where you performed it, instead of doing it in the @After method. Hope it will work.
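A minimal sketch of that suggestion, rolling back right where the work is done instead of in @After (whether this changes anything is doubtful, given the discussion above):

@Test
public void test() throws SQLException {
    Connection connection = StaticConnectionPool.getPooledConnection();
    try {
        // ... perform the inserts/updates and the assertions here ...
    } finally {
        connection.rollback(); // roll back in place, not in @After
        connection.close();
    }
}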