
I'm continuing to explore JIT assembly output, and I found a pair of strange load/store instructions:

mov    0x30(%rsp),%rdx ; <---- this load 

test   %edi,%edi
jne    0x00007fd3d27c5032

cmp    %r11d,%r10d
jae    0x00007fd3d27c4fbc
mov    0x10(%rbx,%r10,4),%edi
test   %edi,%edi
je     0x00007fd3d27c5062

mov    0xc(%rbp),%esi
test   %esi,%esi
je     0x00007fd3d27c4fea
mov    %r8d,0x1c(%rsp)
mov    %rdx,0x30(%rsp) ; <---- this store 

mov    %rax,0x28(%rsp)
mov    %ecx,0x10(%rsp)
mov    %rbp,0x20(%rsp)
mov    %rbx,0x8(%rsp)
mov    %r13d,%ebp
mov    %r10d,0x14(%rsp)
mov    %r11d,0x18(%rsp)
mov    %r14d,0x40(%rsp)
mov    %r9,(%rsp)
lea    (%r12,%rdi,8),%rdx

shl    $0x3,%rsi
callq  0x00007fd3caceaf00



mov    0x20(%rsp),%r11
mov    0x10(%r11),%r10d


mov    0x8(%r12,%r10,8),%r8d
cmp    $0xf2c10,%r8d
jne    0x00007fd3d27c4ffa

lea    (%r12,%r10,8),%r8

mov    0x10(%r8),%r10
movabs $0x7fffffffffffffff,%r9
cmp    %r9,%r10
je     0x00007fd3d27c5092

mov    %r10,%rdx
add    $0x1,%rdx

test   %rdx,%rdx
jle    0x00007fd3d27c50ce
mov    %r10,%rax
lock cmpxchg %rdx,0x10(%r8)
sete   %r11b
movzbl %r11b,%r11d

test   %r11d,%r11d
je     0x00007fd3d27c5116
test   %r10,%r10
jle    0x00007fd3d27c4f48



mov    0x108(%r15),%r11
mov    0x14(%rsp),%r10d
inc    %r10d

mov    0x1c(%rsp),%r8d
inc    %r8d
test   %eax,(%r11)


mov    (%rsp),%r9
mov    0x40(%rsp),%r14d
mov    0x18(%rsp),%r11d
mov    %ebp,%r13d
mov    0x8(%rsp),%rbx
mov    0x20(%rsp),%rbp
mov    0x10(%rsp),%ecx
mov    0x28(%rsp),%rax

movzbl 0x18(%r9),%edi
movslq %r8d,%rsi

cmp    0x30(%rsp),%rsi
jge    0x00007fd3d27c4f17

cmp    %r11d,%r10d
jl     0x00007fd3d27c4dea    ; this is the end of the loop
                             ; jump to the first instruction in this listing 

Why are these instructions needed? Nothing touches %rdx between the load and the store. Yes, this is a loop, but I don't see why the store would be useful on the next iterations either.

Is it a bug, or is it the same sort of JVM trick as in my previous question?

I've found the same problem in this article but there is no explanation there.

The full PrintAssembly output is here and the original code is here

Thanks!

Peter Cordes
QIvan
  • Are you sure there are no branch targets anywhere in those blocks? If not, then yeah it looks like a missed optimization, but those are not rare in JITed code. This one looks easy for a compiler to spot, though, again *if* there's no way anything else can branch into any of the basic blocks after the load, and *if* the compiler can easily prove that. – Peter Cordes Jan 22 '19 at 21:27
  • It's not the full body of nmethod. Can you post the complete PrintAssembly output, including stubs? – apangin Jan 22 '19 at 21:39
  • I think not, at least perfasm didn't show me any... Here it draws arrows for jumps https://github.com/QIvan/reactive-hardcore/blob/master/result.txt#L83 – QIvan Jan 22 '19 at 21:40
  • perfasm shows only *hottest* regions. This does not yet mean there is no inbound branch from somewhere else (e.g. from a stub). – apangin Jan 22 '19 at 21:44
  • Sorry, which stubs do you mean? This is the full output of running JMH with the "perfasm" key https://github.com/QIvan/reactive-hardcore/blob/master/result.txt – QIvan Jan 22 '19 at 21:44
  • OK, I'll try to print it out and add it to the repo – QIvan Jan 22 '19 at 21:45

1 Answer


I've reproduced the full assembly code for ArraySubscription.slowPath. Though the register mapping is slightly different compared to your snippet, the code structure is exactly the same.

The incomplete fragment led you to a wrong conclusion. Actually %rdx can change between load and store, because there is a branch target in the middle: L219 -> L55

This becomes quite understandable when looking at the corresponding Java source code:

        while (true) {
            for (; sent < n && idx < length; sent++, idx++) {
                if (canceled) {
                    return;
                }

                T element = array[idx];

                if (element == null) {
                    subscriber.onError(new NullPointerException());
                    return;
                }

                subscriber.onNext(element);
            }

Perfasm showed you the compiled code for the hot inner for loop. The value at 0x30(%rsp), which is also cached in %rdx, holds the local variable n. But then, after the loop, the value of n changes:

            n = requested;

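and the outer while continues. The corresponding compiled code updates n only in a register, not in 0x30(%rsp). So when control re-enters the loop body through that branch, %rdx holds the new n while the stack slot is stale, and the store before the call writes the current value back.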
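To make the control flow easier to follow, here is a minimal, self-contained sketch of the same loop shape. It is not the original ArraySubscription: the class name DrainSketch, the request method, the int[] payload and the plain (non-atomic) requested field are all simplifying assumptions for illustration. The point is only that n is read in the hot inner loop and reassigned on the cold path before the outer loop jumps back in.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.IntConsumer;

// Hypothetical stand-in for ArraySubscription; all names here are illustrative.
final class DrainSketch {
    private final int[] array;
    private long requested; // outstanding demand; may grow while we are emitting
    private int index;

    DrainSketch(int[] array, long requested) {
        this.array = array;
        this.requested = requested;
    }

    // Simplified: the real code would update demand atomically.
    void request(long extra) {
        requested += extra;
    }

    void slowPath(long n, IntConsumer onNext) {
        long sent = 0;
        int idx = index;
        int length = array.length;
        while (true) {
            // Hot inner loop: n is only read here, so a JIT is free to keep it
            // both in a register and in a stack slot across the onNext call.
            for (; sent < n && idx < length; sent++, idx++) {
                onNext.accept(array[idx]);
            }
            if (idx == length) {
                index = idx;
                return;
            }
            // Cold path: n gets a new value, then control goes back into the
            // inner loop. This is the extra inbound edge the hot-region
            // listing does not show.
            n = requested;
            if (n == sent) {
                index = idx;
                return;
            }
        }
    }

    public static void main(String[] args) {
        DrainSketch s = new DrainSketch(new int[] {1, 2, 3, 4, 5}, 2);
        List<Integer> out = new ArrayList<>();
        // Demand grows from 2 to 4 while the second element is being delivered.
        s.slowPath(2, v -> {
            out.add(v);
            if (v == 2) {
                s.request(2);
            }
        });
        System.out.println(out);
    }
}
```

In this sketch the second pass through the inner loop runs with a value of n that was never the method argument, which mirrors why the compiled code cannot assume the register and the stack slot agree at the top of the loop.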

apangin
  • Wow! That was really awesome! I hope that some day I'll be able to read assembly code as you do ) Sorry, yesterday I spent some time fighting with PrintAssembly output in JMH (additional thanks for the answer https://stackoverflow.com/a/36293812/2406992), and by the time I got the full assembly code https://github.com/QIvan/reactive-hardcore/blob/master/result_printAssembly.txt it was too late and I wanted to explore it myself first. Aaand... today I got the same results as you did and came here to write that I was an idiot... Anyway, many thanks for your answer! – QIvan Jan 23 '19 at 21:57
  • BTW, how did you print the assembly code in Intel syntax instead of AT&T? – QIvan Jan 23 '19 at 21:58
  • @QIvan [Here it is](https://stackoverflow.com/questions/9337670/hotspot7-hsdis-printassembly-intel-syntax) – apangin Jan 23 '19 at 23:22