I have a main method that uses a ScheduledExecutorService with a core pool size of 75 to schedule 15 actions for each of 50,000 users:

Lists.partition(users, 100)
    .forEach(batchOfUsers -> batchOfUsers
        .parallelStream()
        .forEach(scheduleKickingOffActions));

scheduleKickingOffActions has each user call scheduledExecutor.schedule(Runnable, delay, TimeUnit) 15 times, each time with a different runnable action and an arbitrary delay.

This is a fairly heavy program, with a lot going on. As the main thread progresses with the batches, the schedules start kicking in and performing the actions. When each scheduled action kicks off, it reschedules itself with another arbitrary delay. So, each user runs each of his actions repeatedly.
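
The self-rescheduling pattern described above can be sketched roughly as follows. This is a minimal illustration, not the real program: the class name `ReschedulingDemo`, the method `runAction`, the pool size of 2, and the `maxRuns` cut-off (added only so the demo terminates) are all assumptions.

```java
import java.util.concurrent.CountDownLatch;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ThreadLocalRandom;
import java.util.concurrent.TimeUnit;
import java.util.concurrent.atomic.AtomicInteger;

// Hypothetical sketch; none of these names come from the real program.
class ReschedulingDemo {

    // Runs one self-rescheduling action until it has executed maxRuns times,
    // then returns the number of executions that were counted.
    static int runAction(int maxRuns) throws InterruptedException {
        ScheduledExecutorService executor = Executors.newScheduledThreadPool(2);
        AtomicInteger runs = new AtomicInteger();
        CountDownLatch done = new CountDownLatch(1);

        Runnable action = new Runnable() {
            @Override
            public void run() {
                if (runs.incrementAndGet() >= maxRuns) {
                    done.countDown(); // demo only: stop after maxRuns executions
                    return;
                }
                // Reschedule this same action with another arbitrary delay.
                long delayMs = ThreadLocalRandom.current().nextLong(1, 10);
                executor.schedule(this, delayMs, TimeUnit.MILLISECONDS);
            }
        };

        executor.schedule(action, 5, TimeUnit.MILLISECONDS); // initial kick-off
        done.await(10, TimeUnit.SECONDS);
        executor.shutdownNow();
        return runs.get();
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("runs = " + runAction(5));
    }
}
```

In the real program there is no `maxRuns` limit, so each action keeps rescheduling itself until the executor is shut down.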

Looking at the logging statements inside of scheduleKickingOffActions, I notice that the main thread responsible for running the above code snippet stops at some point while scheduling these users' activities. The point ranges from the 4000th user to the 40000th. It seemingly 'forgets' to finish its task.

I don't have a good explanation for why this would happen. I can only imagine that as the scheduled tasks in the ScheduledExecutorService kick off, they manage to preempt the main thread.
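
One way to tell a blocked main thread from a dead one is to print its state and stack from another thread. The following is a rough sketch that relies only on `Thread.getAllStackTraces()` from the JDK; the class name `ThreadDumpSketch` and the method `describe` are hypothetical.

```java
import java.util.Map;

// Hypothetical helper; not part of the real program.
class ThreadDumpSketch {

    // Returns the state and stack of the first live thread with the given name,
    // or null if no live thread has that name.
    static String describe(String threadName) {
        for (Map.Entry<Thread, StackTraceElement[]> entry
                : Thread.getAllStackTraces().entrySet()) {
            Thread thread = entry.getKey();
            if (thread.getName().equals(threadName)) {
                StringBuilder sb = new StringBuilder();
                sb.append(thread.getName())
                  .append(" state=").append(thread.getState()).append('\n');
                for (StackTraceElement frame : entry.getValue()) {
                    sb.append("    at ").append(frame).append('\n');
                }
                return sb.toString();
            }
        }
        return null;
    }

    public static void main(String[] args) {
        System.out.println(describe("main"));
    }
}
```

Calling `describe("main")` periodically from one of the scheduled tasks would show whether the main thread is still alive and, if so, which call it is waiting in.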

I have further evidence that my main thread stalls completely. After the batching code, I have a sleep statement:

Thread.sleep(TimeUnit.HOURS.toMillis(6)); // Thread.sleep takes milliseconds; 360 would be only 0.36 s

following which I do a:

scheduledExecutor.shutdownNow();
scheduledExecutor.awaitTermination(1, TimeUnit.HOURS);
System.out.println("\nFinished.\n");
stopwatch.stop();
System.out.println("Total time elapsed - " + stopwatch.elapsed(TimeUnit.SECONDS) + " seconds");
System.exit(0);

to force the program to run for 6 hours. None of these statements print, and my program keeps running beyond the 6-hour mark until I Ctrl+C out of it. Apparently, the main thread gets thrown off at some point during the batching job and never progresses further.

I don't see any exceptions in the output or the logs, and the scheduled tasks all run and get rescheduled as expected.

Would someone with more Java expertise care to share their thoughts on this? Is it a case of just too many threads working at the same time? I would expect the main thread to return to its task at some point.

Siddhartha
  • If you reduce the number of threads or the number of users, does the issue still occur? Try also sending a SIGQUIT to the process after the main method stops making progress; this will output stack traces for all running threads and possibly help you identify a deadlock. If neither of those helps, try inspecting the process with a debugger. – dimo414 Feb 24 '16 at 05:52
  • I have no clue what the Thread.sleep is supposed to do. Are you sure the main thread is dying, and not blocking on something? Maybe you should check with a debugger and see if main has actually stopped. Also, the scheduled executor could throw an exception. Are you doing something to catch the exception? You will probably need to include more code. – matt Feb 24 '16 at 06:14
  • @dimo414 With fewer users it does not occur. I will try reducing the number of threads. Will try the SIGQUIT trick as well. Thanks :) – Siddhartha Feb 24 '16 at 06:47
  • A common mistake is to submit tasks to an ExecutorService but not catch `Throwable`. If you don't catch it, it will be placed in the `Future` that is returned, and unless you are monitoring that `Future` it is silently discarded. The result is that any exception or error you don't catch and log yourself will silently cause the task to stop running. – Peter Lawrey Feb 24 '16 at 09:05
  • @matt No, I'm not sure if it's just dying or blocking. About the sleep, I want the main thread to sleep for a certain time during which the scheduled executor carries out all its tasks. I'm not catching any throwables, no; my actions can fail and I didn't expect those failed scheduled actions to affect my main thread. – Siddhartha Feb 24 '16 at 16:51
  • @PeterLawrey Thanks, I will catch Throwable and see what happens. I found a useful [reference for it.](http://stackoverflow.com/q/2248131/792238) The actions I'm scheduling can fail and I don't wish that to affect my main thread. – Siddhartha Feb 24 '16 at 17:02
  • @Siddhartha If your task gets an uncaught exception or error, it is stored in the `Future` returned and the task is not run again, even if it is a scheduled or repeating task. The problem is that if you don't choose to examine the `Future` object, you will have no idea this has happened. – Peter Lawrey Feb 25 '16 at 08:13
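
The behaviour described in the comments above can be illustrated with a small sketch. It assumes a task that always fails; the class name `SilentFailureDemo` and the helpers `logging` and `causeOfFailure` are hypothetical and not taken from the question.

```java
import java.util.concurrent.ExecutionException;
import java.util.concurrent.Executors;
import java.util.concurrent.ScheduledExecutorService;
import java.util.concurrent.ScheduledFuture;
import java.util.concurrent.TimeUnit;

// Hypothetical sketch; not the asker's code.
class SilentFailureDemo {

    // Wraps a task so that any Throwable is logged instead of
    // disappearing into an unexamined Future.
    static Runnable logging(Runnable task) {
        return () -> {
            try {
                task.run();
            } catch (Throwable t) {
                System.err.println("Scheduled task failed: " + t);
            }
        };
    }

    // Schedules a task that always throws and returns the Throwable that the
    // executor stored in the Future (null if the task somehow succeeded).
    static Throwable causeOfFailure() throws InterruptedException {
        ScheduledExecutorService executor =
                Executors.newSingleThreadScheduledExecutor();
        try {
            Runnable failing = () -> {
                throw new IllegalStateException("boom");
            };
            ScheduledFuture<?> future =
                    executor.schedule(failing, 1, TimeUnit.MILLISECONDS);
            try {
                future.get(); // nothing is printed anywhere unless we ask
                return null;
            } catch (ExecutionException e) {
                return e.getCause();
            }
        } finally {
            executor.shutdownNow();
        }
    }

    public static void main(String[] args) throws InterruptedException {
        System.out.println("Stored in Future: " + causeOfFailure());
    }
}
```

Passing `logging(action)` to `schedule` instead of the raw action is one way to make such failures visible without having to keep and inspect every `Future`.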
