
I have a Gradle test task that runs a list of tests read from a given file. Sometimes a particular test simply gets stuck, and the task never moves on to the next test(s) in the list.

To handle this, I am trying to add a Java agent that detects a timeout in each test execution and calls System.exit() when one occurs. (I know calling System.exit() is a rash decision, but throwing an exception does not seem to stop the test execution.) The agent uses Byte Buddy advice to do this:

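```java
import java.lang.reflect.Method;
import java.time.Instant;
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.ThreadFactory;

import net.bytebuddy.asm.Advice;

public class TimerAdvice {
    // Shared with TestCallable; assumes tests run one at a time in this JVM.
    // Public because advice code is inlined into the instrumented test class.
    public static CountDownLatch latch;

    @Advice.OnMethodEnter
    static long enter(@Advice.This Object thisObject,
                      @Advice.Origin String origin,
                      @Advice.Origin("#t #m") String detailedOrigin) {
        System.out.println("Timer Advice Enter thread: " + Thread.currentThread().getName() + " time: " + Instant.now());

        latch = new CountDownLatch(1);

        // MyThreadFactory and MyExceptionHandler are my own classes (not shown).
        ThreadFactory factory = new MyThreadFactory(new MyExceptionHandler());
        ExecutorService threadPool = Executors.newFixedThreadPool(1, factory);
        threadPool.execute(new TestCallable()); // waits on the latch in the background
        threadPool.shutdown(); // the task still runs; the pool thread exits afterwards instead of leaking

        return System.nanoTime();
    }

    @Advice.OnMethodExit(onThrowable = Throwable.class)
    static void onExit(@Advice.Origin Method method) {
        System.out.println("Timer Advice Exit thread: " + Thread.currentThread().getName() + " time: " + Instant.now());
        System.out.println("Counting down");
        latch.countDown(); // releases TestCallable before its timeout expires
    }
}
```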

The enter advice spawns a background thread that waits until the latch is counted down:

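```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.TimeUnit;

// Despite its name this is a Runnable: it waits for the exit advice to count down the latch.
public class TestCallable implements Runnable {
    @Override
    public void run() {
        // Read the shared latch once, so a later test replacing it cannot confuse this check.
        CountDownLatch latch = TimerAdvice.latch;
        boolean finished;
        try {
            // Returns true as soon as TimerAdvice.onExit() counts the latch down.
            finished = latch.await(10, TimeUnit.MINUTES);
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException(e);
        }
        if (!finished) {
            System.err.println("Callable thread " + Thread.currentThread().getName() + ": TIMEOUT OCCURRED!");
            System.exit(1);
        }
    }
}
```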

The exit advice calls the latch's countDown() method when the test method returns or throws. Until then, the background thread waits for up to the specified timeout (10 minutes).
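The same wait-for-exit-or-timeout idea can also be expressed without creating a new thread pool and a fresh static latch for every test. A minimal sketch, assuming tests run one at a time in the JVM: a single daemon `ScheduledExecutorService` schedules the timeout action when a test starts and cancels it when the test ends. `TestWatchdog`, `arm`, `disarm` and the thread name are hypothetical, and `onTimeout` stands in for whatever the agent does on timeout (`System.exit(1)` in the question).

```java
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

// Hypothetical per-test watchdog; assumes at most one test runs at a time in this JVM.
final class TestWatchdog {
    // One daemon thread for all tests, so the watchdog never keeps the JVM alive by itself.
    private static final ScheduledExecutorService SCHEDULER =
            Executors.newSingleThreadScheduledExecutor(r -> {
                Thread t = new Thread(r, "test-timeout-watchdog");
                t.setDaemon(true);
                return t;
            });

    // The timeout currently pending for the running test, if any.
    private static final AtomicReference<ScheduledFuture<?>> PENDING = new AtomicReference<>();

    private TestWatchdog() {}

    /** Called when a test starts: runs onTimeout unless disarm() is called first. */
    static void arm(long timeout, TimeUnit unit, Runnable onTimeout) {
        ScheduledFuture<?> previous = PENDING.getAndSet(SCHEDULER.schedule(onTimeout, timeout, unit));
        if (previous != null) {
            previous.cancel(false); // a stale timeout from an earlier test must not fire
        }
    }

    /** Called when a test ends: returns true if a pending timeout was cancelled. */
    static boolean disarm() {
        ScheduledFuture<?> pending = PENDING.getAndSet(null);
        return pending != null && pending.cancel(false);
    }
}
```

In the agent, the enter advice would call `TestWatchdog.arm(10, TimeUnit.MINUTES, ...)` and the exit advice `TestWatchdog.disarm()`. A `System.exit()` issued from the watchdog thread is still subject to any installed security manager.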

My question is: why does the System.exit() call not affect the test execution or the JVM? When this thread calls System.exit(), the test thread keeps executing as if nothing had happened. I would like to stop the execution of the test at this point.

Any suggestions on how I should stop the entire test execution process when a timeout is detected?

Abhijith Gururaj
  • Wouldn't it be simpler to run the offending tests in a separate process and use Process.destroy to kill it if it takes too long? Wouldn't it be better to fix the problem that causes the tests to get stuck? – Stephen C May 12 '21 at 12:12
  • It is not that simple. We schedule a list of 50-100 tests for each task. A timeout in one test should not affect the execution of the remaining tests in the list. A Java agent attached to each test execution is the only way we could think of to make an exception or exit affect only that particular test. The remaining tests would then continue to execute. – Abhijith Gururaj May 12 '21 at 12:28
  • As for fixing the problem that causes these tests to get stuck: there are more than 17k tests. It would be better to have a hard-stop mechanism such as this one, so that no test can hang indefinitely. – Abhijith Gururaj May 12 '21 at 12:42
  • Sorry, but doesn't each test framework have its own timeout mechanism, usually applied via annotation, but also via global configuration or programmatically? It feels like you are reinventing the wheel here. You did not explain which more canonical options you explored. And as for `System.exit()`, it does not seem to make any sense, because usually you execute all tests in one module, or at least big groups of them, in a single VM. With 17k tests you are not going to start 17k JVMs, are you? – kriegaex May 12 '21 at 15:16
  • @kriegaex Hahaha no. The 17k tests are divided into batches of 50-100 tests. These batches are queued to a set of nodes, and each node is responsible for executing its assigned batch. This is where I'd like to introduce a timeout for each test. A test may be JUnit 3 or JUnit 4. The end goal is that a timeout in one test of a batch should not stop the execution of the remaining tests (in the same batch) on the node. – Abhijith Gururaj May 12 '21 at 16:47
  • @kriegaex There is no efficient way to set a global timeout for all JUnit (both 3 and 4) tests. There is a Gradle timeout plugin by Tableau, but it only works for JUnit 4. – Abhijith Gururaj May 12 '21 at 17:00
  • JUnit 3 - ugh! I normally write my tests using Spock, but also know some JUnit 4/5. Normally I use Maven, never Gradle. But I can look into your issue if you publish an [MCVE](https://stackoverflow.com/help/mcve) reproducing the problem on GitHub. Just create a tiny, but complete sample project with 2-3 dummy JUnit 3 + 4 tests and with your Byte Buddy setup. BTW, would AspectJ instead of BB be an alternative for you? I know AspectJ way better than BB and would recommend to use it for this purpose. But a BB solution should certainly be possible too. – kriegaex May 13 '21 at 02:21
  • BTW, if you know how to use regex search & replace or even know how to use advanced features like IntelliJ IDEA's structural search & replace, it should be quite simple to migrate your JUnit 3 tests and suites to JUnit 4. See [this answer](https://stackoverflow.com/a/677356/1082681) if you want to get an idea about how simple it actually is. – kriegaex May 13 '21 at 02:23
  • More questions: Where does `latch` come from in `TestCallable`? Is it a static import of `TimerAdvice.latch`? Are you sure that your tests are running in a single thread in a single VM? Otherwise, you had better be careful with the static field, because concurrent threads could overwrite it. – kriegaex May 13 '21 at 02:33
  • As for `System.exit()`, it does not sound credible that nothing happens when you call it, unless you have a security manager in place and its `checkExit` method does not allow to exit. In that case you should see a `SecurityException`, though. Another edge case is shutdown hooks: The thread invoking `System.exit` blocks until the JVM terminates. If a shutdown hook submits a task to this thread, it leads to a deadlock. But seeing is believing. Please provide the requested MCVE. I hate to speculate. – kriegaex May 13 '21 at 02:34
  • @kriegaex You are right! The Security Manager was suppressing this call. This security manager was added to prevent the tests from invoking `System.exit()` with non-zero status. (Check out this [issue](https://github.com/gradle/gradle/issues/11195) if you're curious why this was needed) – Abhijith Gururaj May 13 '21 at 07:33
  • I have modified the custom SecurityManager's `checkExit` method to delegate to `super.checkExit()` when the System.exit() call is invoked from this particular timeout thread. I set a custom name for this timeout thread via the executor, which makes it easy for the SecurityManager to filter threads by name and delegate the call to `super.checkExit`. – Abhijith Gururaj May 13 '21 at 07:39
  • As for the usage of `latch`, I know it is a bad idea. Yes, the agent runs in a single thread in a single VM. I wanted to reproduce this as quickly as possible and will improve it further. – Abhijith Gururaj May 13 '21 at 07:41
  • Thank you for your valuable speculations! Please add an answer so that I could mark it as resolved. – Abhijith Gururaj May 13 '21 at 07:45
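The thread-name filter described in the comments above could look roughly like this. It is a minimal sketch, not the asker's actual code: `ExitGuard`, `isExitAllowed` and `TIMEOUT_THREAD_NAME` are hypothetical names, and instead of delegating to `super.checkExit()` (which consults the installed policy) this version simply returns to allow the exit and permits every other permission.

```java
import java.security.Permission;

// Hypothetical exit guard: vetoes non-zero System.exit() calls except from the timeout thread.
@SuppressWarnings("removal") // SecurityManager is deprecated for removal since Java 17
class ExitGuard extends SecurityManager {
    // Assumed name; it must match the name the thread factory gives the timeout thread.
    static final String TIMEOUT_THREAD_NAME = "test-timeout-watchdog";

    /** Exit with status 0 is always allowed; non-zero only from the timeout thread. */
    static boolean isExitAllowed(int status, String threadName) {
        return status == 0 || TIMEOUT_THREAD_NAME.equals(threadName);
    }

    @Override
    public void checkExit(int status) {
        if (!isExitAllowed(status, Thread.currentThread().getName())) {
            // Runtime.exit() propagates this exception to the caller of System.exit().
            throw new SecurityException("System.exit(" + status + ") blocked during test execution");
        }
        // Returning normally lets the JVM shut down.
    }

    @Override
    public void checkPermission(Permission perm) {
        // Allow everything else; this manager only polices exit calls.
    }
}
```

It would be installed with `System.setSecurityManager(new ExitGuard())`. On Java 18 and later this requires starting the JVM with `-Djava.security.manager=allow`, and JDK 24 no longer allows installing a security manager at all.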

1 Answer


The OP said that my comment concerning the security manager helped him find the root cause, so I am converting it into an answer:

As documented, System.exit() does not shut down the JVM if a security manager's checkExit method forbids it. In that case the call throws a SecurityException, though, so you should see one somewhere.
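One possible reason such a SecurityException goes unnoticed in the question's setup: the timeout task is started with `threadPool.execute(...)`, so an exception thrown by `System.exit()` inside `TestCallable.run()` is not captured in a `Future`. It terminates the pool thread and is passed to that thread's uncaught-exception handler, presumably `MyExceptionHandler` (assuming `MyThreadFactory` installs it that way). The sketch below uses a hypothetical helper, `runAndCapture`, to show where such an exception ends up; the task throws `SecurityException` directly, so no security manager needs to be installed.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicReference;

class UncaughtDemo {
    /** Runs task via execute() and returns whatever reached the thread's uncaught-exception handler. */
    static Throwable runAndCapture(Runnable task) throws InterruptedException {
        AtomicReference<Throwable> seen = new AtomicReference<>();
        CountDownLatch handled = new CountDownLatch(1);
        ExecutorService pool = Executors.newFixedThreadPool(1, r -> {
            Thread t = new Thread(r, "timeout-thread");
            // Plays the role of MyExceptionHandler in the question.
            t.setUncaughtExceptionHandler((thread, e) -> {
                seen.set(e);
                handled.countDown();
            });
            return t;
        });
        pool.execute(task); // execute(), not submit(): the exception is not stored in a Future
        pool.shutdown();
        handled.await(5, TimeUnit.SECONDS);
        return seen.get(); // null if the task completed normally
    }
}
```

If the handler only logs somewhere nobody reads, or swallows the exception, a vetoed exit looks exactly like "nothing happened".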

A discussion in Gradle issue #11195 mentions a problem where Kafka sporadically exits unexpectedly, and suggests a security manager policy for Spring-Kafka that stops it from doing so. This was committed to Spring-Kafka but, if I understand correctly, not to Gradle.


Another edge case is shutdown hooks: The thread invoking System.exit() blocks until the JVM terminates. If a shutdown hook submits a task to this thread, it leads to a deadlock.

kriegaex
  • Abhijith, may I ask why you hit this problem? AFAIK, it is not a Gradle behaviour, but related to Spring-Kafka. Is the latter part of your project, or did you just refer to that issue because your situation was similar? – kriegaex May 13 '21 at 11:24
  • Okay. So Gradle fails to generate a test result XML when a non-zero exit happens during test execution (there are multiple open Gradle issues describing similar problems). I wanted to prevent non-zero exits from being invoked during test execution, so I added the SecurityManager to ensure a test result is generated. The security manager simply throws an exception when System.exit() is invoked with a non-zero status in the JVM. – Abhijith Gururaj May 13 '21 at 12:10
  • You might ask why, then, this Java agent thread invokes System.exit(). For that case, I have added some external test listeners that specifically generate a test result when this timeout occurs. I couldn't find any other way to guarantee that a Gradle test task generates a test result XML. – Abhijith Gururaj May 13 '21 at 12:15
  • For example, you can glance over [this](https://stackoverflow.com/questions/49428792/process-gradle-test-executor-finished-with-non-zero-exit-value-29) and [this](https://github.com/gradle/gradle/issues/7802) to understand the need for a security manager and additional test listeners that ensure a test result is generated for every test, no matter what. When a test task fails, a result XML/HTML should be generated for the test, whether the cause is a bad test, a timeout or some binary corruption. Ant does this, but Gradle doesn't :( – Abhijith Gururaj May 13 '21 at 13:10
  • Thanks for the explanation. It will help other readers understand what you did in order to solve or mitigate the problem, and why you did it. – kriegaex May 13 '21 at 15:20