9

I ran into some problems with a multi-threaded Java-program and distilled it down to a very simple example - still my puzzlement isn't less!

The example program it shown below. So what does this do (or was intended to do)?. Well, the main() function starts off a simple thread, based on a static, inner class Runnable. This Runnable contains two nested loops, that does a simple calculation on a local variable "z" for a total of 10^12 iterations (10^6 * 10^6), after which it will print out the result and exit. After spawning this worker thread, the main thread goes into a loop of its own where it prints the string "Z" to the console after which it sleeps (using Thread.sleep()) for 1 second, then repeats this over and over.

So running this program, I would expect it to print out "Z" every 1 second while the calculating thread was doing its job.

However, in fact what happens is that the calculating thread is started, and the main thread displays the first "Z" but then nothing happens. It appears to hang in the Thread.sleep call, either infinitely or at least much, much longer than the requested 1000 ms.

Note that this is on a fast, quad-core machine with multi-threading, so there should be no trouble in running the threads simultaneously. The other cores appear as idle in Windows task manager. Also, even on a single-core system I would expect the operating system to periodically preempt the calculating thread, in order to allow the main-thread to print the strings. Further, there's no shared variables or lock between the threads, so they shouldn't be able to block each other.

Even more strangely, it appears to be critical for the behavior, that there are two nested loops in the Runnable. With just one loop, with the same number of iterations, everything works as expected.

I've tested this on Windows 10 64-bit with Java 1.8.0_73.

Can someone explain this behavior?

public class Calculate {
    static class Calc implements Runnable
    {
        @Override
        public void run() {
            int z = 1;
            for(int i = 0; i < 1000000; i++) {      
                for(int j = 0; j < 1000000; j++) {
                    z = 3*z + 1;
                }               
            }
            System.out.println("Result: " + z);
        }
    }

    public static void main(String[] args)  throws Exception
    {
        Thread t = new Thread(new Calc());
        t.start();
        while(true) {
            System.out.println("Z");
            Thread.sleep(1000);
        }
    }
}

Update 1: It has been suggested that this might be a duplicate of Busy loop in other thread delays EDT processing. Many symptoms in my case are the same, but it is hard to tell if they truly are the same cause, because the other problem doesn't appear to have been fully diagnosed. But it is a likely possibility.

Update 2: I have filed a bug report with Oracle. I will post link if it is accepted and I find it.

Update 3: Accepted as bug in Java 8 and 9: https://bugs.openjdk.java.net/browse/JDK-8152267

Community
  • 1
  • 1
Morty
  • 1,706
  • 1
  • 12
  • 25
  • 1
    It appears to work fine here, you might want to add a `Thread.yield` to the inner-most loop in your `Calc`. Also, Your `z` will eventually overflow and you're looping for a *very* large number of iterations. – Elliott Frisch Mar 20 '16 at 15:23
  • @Elliott: Interesting it works as expected for you. Why would yield() be necessary, I mean there are many programs that does "deep" calculations in a background thread and it is not normally advised they have to yield, and I've actually never seen that done in Java. It's true that "z" overflows but this calculation is just for the test example to do some dummy calculation. Overflowing is perfectly legal and well-defined for int's. And the number of iterations is intentionally high to exhibit the problem. But note it is common to have CPU-bound calculations taking a long time. – Morty Mar 20 '16 at 15:26
  • I guess what I'm looking for is some sort of guideline to "what's wrong with the way this program calculates in the background". I.e. if this is wrong, what other cases with background calculations might be wrong. What is the rule for when you have to yield, etc. – Morty Mar 20 '16 at 15:27
  • How many cores are on the machine you're testing with? And how many other processes are you running? The `Thread` works for me. – Elliott Frisch Mar 20 '16 at 15:29
  • @Elliott: It's a 6700K - so 4 true cores (at 4 GHz) and 8 logical cores due to hyper-threading. No other processes of significance running (except standard Windows and Eclipse). CPU usage is at 1% when I launch the program and no cores appear to be busy with anything. No other system problems either. Which o/s and Java version are you on? – Morty Mar 20 '16 at 15:30
  • Linux, Java 8, 4 cores. Lots of other processes. I get a `Z` about once a second. – Elliott Frisch Mar 20 '16 at 15:31
  • @Elliott: I'm starting to think this may be a bug in this Java-version, Windows or both (some interaction maybe). Either that, or a hardware error. I clearly seems like an error, that the main-thread isn't progressing. I think a lot of programs would be written by the assumption that it should be. – Morty Mar 20 '16 at 15:33
  • @EliottFrisch Windows 8 Java 8 i5 quad core. I get Z once and stops. – Yassin Hajaj Mar 20 '16 at 15:34
  • Possible duplicate of [Busy loop in other thread delays EDT processing](http://stackoverflow.com/questions/35154352/busy-loop-in-other-thread-delays-edt-processing) – davmac Mar 20 '16 at 15:52
  • See my question about a similar situation and a busy loop with a very similar structure (I've marked this question as a possible duplicate, because I really think it might be the same problem, though I can't reproduce it with your code). Short answer is, JVM JIT bug prevents appropriate insertion of safepoints in the busy thread, meaning that some things (including eg full garbage collection cycle) may block until the busy loop finishes. – davmac Mar 20 '16 at 15:55
  • @davmac: I looked at it, it could be the same but I'm not completely sure. Firstly, this case is 100% re-producible at least on my machine (although it doesn't appear to occur on all os/Java combinations as you can see here) whereas yours is not. Also, yours is somewhat more complex, involving Swing, and I can't tell if that is critical to the mechanism behind your case. I.e. if contention on shared resources plays a role in why the "safepoint" is necessary. Because in my case I don't see why a safepoint would be necessary. – Morty Mar 20 '16 at 16:00
  • @davmac: (Cont'd) The JVM can simply start a new o/s thread and that can run as long as it takes while the main thread continues. Of course if the JVM doesn't launch an o/s thread but "simulates" multi-threading by inserting safepoints in the code, then it could cause trouble, if it doesn't insert enough safepoints. But in that case it can't be inserting safepoints in either the inner or outer loop since each inner loop is processed quite quickly (as can be seen by inserting a print statement in the outer loop). If the JVM 1) simulates threads, 2) is so careless in doing so, that is... – Morty Mar 20 '16 at 16:02
  • @davmac: (Cont'd) ... to me a very serious bug (unless it is somehow specified in the language specification that you shouldn't have long-running loops and what the conditions are etc.), and it is actually surprising that anything is working at all :-) – Morty Mar 20 '16 at 16:03
  • @Morty safepoint is necessary for stop-the-world garbage collection cycle, at least, though I'm not saying that is certainly what is happening in your case. (It's not about user-space threads, or anything along those lines; the Oracle JVM maps Java threads directly to native threads). I've experimented a little with running your code however and the behavior it exhibits is very similar to the case I discovered earlier. Try converting your nested loop to a single loop with a `long` count, does the problem go away? (ok, I see you've noted in your question that it does. Sounds _very_ similar.) – davmac Mar 20 '16 at 16:05
  • @davmac: Yes with one loop with same number of total iterations, it does go away (FYI, I actually wrote that in the description). Which is interesting, considering the inner loop completes quite fast (several times a second) so it isn't because there's only a safepoint in the outer-loop or something like that. – Morty Mar 20 '16 at 16:09
  • @davmac: Yes I agree they might be similar, but why would garbage collection be necessary at this point? Also, why is it the Thread.sleep call especially that causes it to stuck (note the first "Z" is printed). But I also have the behavior that I can't cause the program in the debugger - it just causes Eclipse to act weird as well. Also when running hte program on the command line it is extremely stuck, even Ctrl-C doesn't work. – Morty Mar 20 '16 at 16:10
  • @Morty re garbage collection, as I said, I'm not saying that it is a garbage collection cycle that is the problem (but I don't know exactly why reaching a safepoint is necessary for your code; I don't know the internals of the JVM well enough). But your nested loop is almost exactly the same as mine, and as such, it almost definitely provokes the JIT bug, and the effect you see is also the same - delayed execution of another thread. – davmac Mar 20 '16 at 16:22
  • Have you tried to use a lock instead of using Thread.sleep? Also there is no way of knowing, if your main thread has thrown exception: add try with finally block just to make sure you get notification. Also, you are sharing System.out between threads without using synchronization. Try adding static synchronized method for println and add flush to it aswell (maybe os is buffering your output) – Sami Korhonen Mar 20 '16 at 17:13
  • @Sami: Sorry to say, this seems like just random suggestions and bad at that. Lets take them in order: 1) Locks - how would a lock be useful for sleeping for a specific time period? 2) No it hasnt thrown an exception, if it did this on the main thread it would have been printed out and besides finally would have no role in showing an exception even if you did add a try/finally block. 3) It is permitted to use System.out from multiple threads. 4) No the o/s is not buffering (but this was the best of the suggestions). – Morty Mar 20 '16 at 18:29
  • I suppose you can try the same JVM-based workaround as for my problem: `-XX:+UseCountedLoopSafepoints` on the Java command line. Does that make your problem go away? If so, I'm almost sure that you're seeing the same problem as me. – davmac Mar 31 '16 at 15:33
  • davmac: My JVM doesn't support UseCountedLoopSafepoints, it prints: "Unrecognized VM option 'UseCountedLoopSafepoints' Did you mean '(+/-)UseLoopSafepoints'?". I've then tried the suggested "UseLoopSafepoints" but here I get a message saying it is only supported in a debug version of the JVM. But I agree it seems very much like a new incarnation of the problem you had/have. – Morty Apr 09 '16 at 08:29

4 Answers4

0

Seems like your program overflow my machine I also only get Z once (Os x, i7, 16G).

Edit : What you could do to be sure to give a chance to the main Thread to print "z" is yield every time you loop over i :

@Override
public void run() {
    int z = 1;
    for(int i = 0; i < 1000000; i++) {    
        Thread.yield();
        for(int j = 0; j < 1000000; j++) {
            z = 3*z + 1;
        }               
    }
    System.out.println("Result: " + z);
}
Benoit Vanalderweireldt
  • 2,925
  • 2
  • 21
  • 31
  • Benoit: Not sure what I mean by "program overflow the normal and execution order of events". What "contract" is this program violating? Wouldn't it be a reasonable expectation that the main thread should progress? There's no resource contention either Java-resources or compute resources. It is running on a system that is by no means constrained. Windows task manager shows the system as 15% busy with only one core occupied. – Morty Mar 20 '16 at 15:36
  • I guess it's linked to how Java implement loops, when I reduce the time between 2 print of "z" it works fine.A theory would be that Java wouldn't woke up other threads if he is doing a too intensive operation ? – Benoit Vanalderweireldt Mar 20 '16 at 15:39
0

As I have put in in "update 3" in the description, this turned out to be a bug in Java 8 and 9. So this is the correct answer to this question: There is nothing wrong with the program, the behavior is caused by bugs in certain versions of Java, causing it to not behave as it should. While suggestions of inserting pauses, yields etc. in the program might be relevant for someone looking for a practical workaround, the "lack of these workarounds" is not the root cause of the problem and do not represent an error in the original program.

Morty
  • 1,706
  • 1
  • 12
  • 25
-1

Short answer

It is a problem of CPU allocation. To solve it, add the line Thread.currentThread().yield(); after the for (int j=0 ...) loop.

Long answer

I used the following modified program to find the problem

public class Calculate {

    public static final int[] LOOP_SIZES = {
        1000, 5000, 10000, 50000, 65000, 66000, 100000, 500000, 1000000 };

    static class Calc implements Runnable
    {
        private int loopSize;
        public Calc(int loopsize) {
            loopSize = loopsize;
        }
        @Override
        public void run() {
            int z = 1;
            final long startMillis = System.currentTimeMillis();
            for(int i = 0; i < loopSize; i++) {
                for(int j = 0; j < loopSize; j++) {
                    z = 3*z + 1;
                }
                Thread.currentThread().yield();  // comment this line to reproduce problem                                         
            }
            final long endMillis = System.currentTimeMillis();
            final double duration = ((double)( endMillis - startMillis )) / 1000.0;
            System.out.println("Result for " + loopSize +
                               ": " + z +
                               " @ " + duration + " sec");
        }
    }

    public static void main(String[] args)  throws Exception
    {
        Thread tt[] = new Thread[LOOP_SIZES.length];
        for (int i = 0; i < LOOP_SIZES.length; i++) {
            final int ls = LOOP_SIZES[i];
            tt[i] = new Thread(new Calc(ls));
            tt[i].start();
        }
        int nbNonTerminated = 1;
        while(nbNonTerminated > 0) {
            nbNonTerminated = 0;
            for (int j=0; j < tt.length; j++)  {
                printThreadState( tt[j], "Thread " + j );
                if (!tt[j].getState().equals(java.lang.Thread.State.TERMINATED))
                    nbNonTerminated ++;
            }
            printThreadState( Thread.currentThread(), "Main thread");
            System.out.println("Z @ " + System.currentTimeMillis());
            Thread.sleep(1000);
        }
    }

    public static void printThreadState( Thread t, String title ){
        System.out.println( title + ": " + t.getState() );
    }
}

If you run this program, the main thread works as intended (it prints a message every second). If you comment the line with the yield, recompile and run it, the main loop prints a few messages, then blocks. It seems like the JVM allocates almost all CPU to the other threads, rendering the main thread unresponsive.

The number of threads did not seem to change the problem. The problem started to appear when the "loop size" became larger than about 65000. Also, note I did not join threads when they finish.

note: tested on Mac OSX, 2.7.5, 2.5 GHz Core i5.
I also tested it on a Raspberry Pi 2, Raspbian linux 4.1.3 (a much less powerful machine): the problem does not show.

Sci Prog
  • 2,651
  • 1
  • 10
  • 18
  • Isn't Java using native os threads now days? If so, scheduling should be done by os. In windows main thread used to be special (handled window events by default), if this has not changed, it could be part of the problem: e.g window rendering/refreshing events are getting queued and thread never exits the sleep – Sami Korhonen Mar 20 '16 at 17:51
  • The OP stated he had the problem in Windows 10, and I was able to reproduce it in OSX. I don't know a lot about JVM internals. However, it seems strange that the problem appears when loop_size is about 2^16 (so both loops about 2^32--coincidence?). Seems like when a thread succeeds in grabbing a lot of CPU cycles, at some point the JVM allocates almost all resources to it. "The more you have, the more you want" or "The more you spend, the more they give to you". That would explain why the problem does not show on a less powerful machine. – Sci Prog Mar 20 '16 at 17:59
  • I would try moving the wait loop outside the main thread to make sure that main thread is not doing something unexpected (osx is windowed system and something needs to handle events afaik) – Sami Korhonen Mar 20 '16 at 18:11
  • 1
    Hi thanks, I agree that this solves the problem (in fact just one yield call in the outer loop is enough). But I dont think this explains the behavior, this isnt how threading is supposed to work. How many programs that launch a background thread that does calculations (which may take minutes) makes sure to call yield periodically? This is clearly a JVM bug if this is so, since there is not a specification of when this would be necessary, essentially leaving things to chance. @Sami: It isnt Windows related, the Java main thread != Windows program main thread. – Morty Mar 20 '16 at 18:35
-5

Inside the run method you have no loop, just a simple calculation of z. Thus when you call t.start(); the run method is called once, calculates the z and then you just put the thread to sleep forever.

apboutos
  • 109
  • 2
  • 16
  • Not sure I follow. There's a loop in run(), are you implying it is somehow optimized away? That doesn't appear to be the case because it shows one core as fully busy when running. Also, even if that was the case, the program should still continue displaying "Z"'s forever. – Morty Mar 20 '16 at 15:29