
I am developing a game on Android, 'Space RPG', and am currently only seeing this error on most Galaxy S4s and HTC Ones. This is all Java.

The game stalls. When I try to debug the process and suspend the thread in question, it won't suspend, and a spin-on-suspend error occurs. The thread dump shows that it was inside a while loop that takes a desired 'end position' and iterates backwards, with an ever-increasing distance step, to find a 'start position'.

This is where things get annoying. I can verify that the loop cannot run indefinitely: even though the condition is while(true), it cannot run more than maybe 200 iterations before my break is reached (an assertion backed up by the code working on every other phone I have tried).

To help ease my mind on this matter, I added a simple incremented variable inside the loop, and if it ever goes above 1000 it will log something out so I can see that it DID run too many times, just in case some variable was set badly or something. When this counter code is present, NO crash/hang occurs. Nor do I see any logs indicating it ran over 1000 times.

If I remove this counter, the hang occurs every time after 5-10 seconds of playing [in which the while loop will have run maybe 10 times, though that varies].

My question is therefore: what the hell is going on? :P Why would these newer phones (but seemingly none of the older ones) have a problem with a loop that does valid work and finishes quickly, but only when there is no incremented variable in it? How could the thread possibly stall in that loop, and how does an extra counter variable fix the issue? This work is done on the OpenGL render thread, in case that matters.

I have reports of this happening on most S4s, but there is at least one S4 out there where it didn't happen. On the one I am using today, it IS happening. This makes me wonder whether it has to do with the specific Android, Java, or Dalvik version, or something else on the phone, but unfortunately I don't have any details from the S4 where it worked.

Any help, guidance, thoughts or further reading on stuff like this would be appreciated, thanks a lot.

// rot, dx, dy and distance are set earlier in the method (not shown).
// Final velocity is 1. We work backwards to find the required starting velocity.
float vel = 1.0f;
int i = 0;
// Components of the velocity in the x and y directions, based on rotation in degrees
double xmath = Math.sin(rot * (Math.PI / 180.0f));
double ymath = Math.cos(rot * (Math.PI / 180.0f));
while (true) {
    /* with this section uncommented, the stall never happens...
    ++i;
    if (i > 1000) {
        // Something went rather wrong
        vel = 91.0f; // LOG WAS HERE, now has a fallback value just in case
        break;
    }
    */
    vel *= 1.2f; // each step is 20% longer than the previous one
    dx -= vel * xmath;
    dy += vel * ymath;
    if (distance < (float) Math.sqrt(dx * dx + dy * dy)) {
        break;
    }
}
// Give the ship a velocity that is appropriate for the distance remaining
_AI.Ship.Velocity = vel;
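As a sanity check on the claim that the loop cannot run for long, here is a minimal standalone sketch that counts how many passes the same arithmetic needs for a few distances. It assumes dx and dy start at zero, which the snippet above does not show; the class name `LoopBound`, the method `stepsToCover`, and the 10,000-pass safety cap are all made up for illustration.

```java
public class LoopBound {
    // Hypothetical safety cap so the sketch itself can never spin forever.
    static final int SAFETY_CAP = 10_000;

    /**
     * Counts the passes needed before the accumulated offset exceeds
     * `distance`, using the same update rule as the loop above.
     * Assumption: dx and dy start at zero. Returns -1 if the cap is hit.
     */
    static int stepsToCover(float distance, double rotDegrees) {
        float vel = 1.0f;
        double dx = 0.0;
        double dy = 0.0;
        double xmath = Math.sin(Math.toRadians(rotDegrees));
        double ymath = Math.cos(Math.toRadians(rotDegrees));
        for (int steps = 1; steps <= SAFETY_CAP; steps++) {
            vel *= 1.2f;
            dx -= vel * xmath;
            dy += vel * ymath;
            if (distance < (float) Math.sqrt(dx * dx + dy * dy)) {
                return steps;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        for (float d : new float[] {10f, 100f, 10_000f, 1_000_000f}) {
            System.out.println(d + " -> " + stepsToCover(d, 30.0) + " passes");
        }
    }
}
```

Because the step grows by 20% on each pass, the offset after n passes is about 6(1.2^n - 1) under that zero-start assumption, so the pass count grows only logarithmically with the distance.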
Esaptonor
  • Can you show some code? – Fildor Sep 19 '13 at 09:51
  • And you can definitely exclude a threading issue, like a deadlock or something like that, I guess. But then again, if it were a deadlock, it should happen on more than just the brand new phones ... hard one. – Fildor Sep 19 '13 at 10:48
  • Dalvik threads suspend themselves when they reach a safe point. Spin-on-suspend means the thread is not reaching a safe point, so (a) your loop isn't exiting, and (b) something in the VM is screwed up. Do you see something like "JIT unchain all" in the logcat? My initial guess is that the JIT stripped suspend checks from the tight loop, which is allowed -- the other thread can "unchain" segments to reintroduce the checks -- but something isn't working right. – fadden Sep 21 '13 at 05:55
  • Do you see the problem on GS4/HTC1 running stock Android, rather than the retail Samsung/HTC edition? – fadden Sep 21 '13 at 14:37

1 Answer


This is probably http://b.android.com/58726.

The bug has full details; in short: some vendors appear to use a modified version of the Dalvik VM. Changes made to the JIT compiler prevent thread suspension from occurring in certain situations.

The litmus test for this issue is to compare the standard retail device against the "pure Android" Google Play Edition of the GS4 and HTC1. If the former shows the broken behavior, but the latter works correctly, you are likely seeing a vendor-specific problem.

The workaround is to do what you've done: make the code less efficient so it doesn't fall into the "optimized" case. Ideally the app would runtime-select a different code path for devices without the issue, but I don't know of a good way to detect the problem at run time.
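For reference, the workaround could be packaged roughly as follows. This is only a sketch: the class and method names, the parameter list, and the constant names are hypothetical, and the cap of 1000 and the fallback of 91.0f are taken from the commented-out section in the question. Whether a particular vendor JIT treats this loop differently from the original is not something I can confirm; the only evidence is the asker's observation that the counter version did not hang.

```java
public class StartVelocity {
    // Values borrowed from the question's commented-out counter section.
    static final int MAX_STEPS = 1000;
    static final float FALLBACK_VELOCITY = 91.0f;

    /**
     * Works backwards from a final velocity of 1 to a starting velocity,
     * giving up after MAX_STEPS passes instead of looping on while(true).
     * dx and dy are the initial offsets, as in the question's code.
     */
    static float solve(float distance, double rotDegrees, double dx, double dy) {
        double xmath = Math.sin(Math.toRadians(rotDegrees));
        double ymath = Math.cos(Math.toRadians(rotDegrees));
        float vel = 1.0f;
        for (int i = 0; i < MAX_STEPS; i++) {
            vel *= 1.2f;
            dx -= vel * xmath;
            dy += vel * ymath;
            if (distance < (float) Math.sqrt(dx * dx + dy * dy)) {
                return vel;
            }
        }
        // The exit condition was never met (for example, distance was NaN).
        return FALLBACK_VELOCITY;
    }
}
```

A call such as `StartVelocity.solve(distance, rot, dx, dy)` would then replace the loop. Independently of the JIT issue, the bound also keeps the render thread from spinning if `distance` ever arrives as NaN, since no comparison against NaN is ever true.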

fadden
  • Could this also affect Android native code, regardless of which piece of software is causing it? It is the only explanation I can think of for why `ffmpeg` (a custom-compiled static binary, built from the n2.0.1 tag as cloned) fails for me on some devices and not on others, yet never fails on any device when run under `valgrind`, which I now suppose frequently traps the inner program. – 1737973 Oct 15 '13 at 07:00
  • Native code is compiled with gcc, which is entirely separate from the JIT compiler inside Dalvik. – fadden Oct 15 '13 at 16:27