
I have been reading Effective Java, 3/E.

While reading the part about hashCode (page 51), I noticed the book saying:

A nice property of 31 is that the multiplication can be replaced by a shift and a subtraction for better performance on some architectures: 31 * i == (i << 5) - i. Modern VMs do this sort of optimization automatically.
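The identity itself is easy to sanity-check (a quick check of my own, in Java since that is the book's language; the class name is made up for the example):

```java
public class ShiftIdentity {
    public static void main(String[] args) {
        // 31 * i == (i << 5) - i holds for every int, including negatives
        // and values where 31 * i overflows, because int arithmetic wraps
        // modulo 2^32 identically on both sides of the identity.
        int[] samples = {0, 1, -1, 7, 31, Integer.MAX_VALUE, Integer.MIN_VALUE};
        for (int i : samples) {
            if (31 * i != (i << 5) - i) {
                throw new AssertionError("identity fails for " + i);
            }
        }
        System.out.println("31 * i == (i << 5) - i held for all samples");
    }
}
```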

I thought this made sense, and I wondered how much faster the code would become when this kind of optimization took place. So I wrote some short code to see the impact of such an optimization.

But there seemed to be no noticeable difference, so I wrote much simpler code to check whether such an optimization even took place.

Below is my sample code.

import kotlin.random.Random

fun main() {
    val num = Random.nextInt()
    val a = num * 30
    val b = num * 31
    val c = num * 32

    println("$a, $b, $c")
}

And this is the compiled JVM bytecode, obtained from IntelliJ's "Show Kotlin Bytecode" feature.

   L1
    LINENUMBER 5 L1
    ILOAD 0
    BIPUSH 30
    IMUL
    ISTORE 1
   L2
    LINENUMBER 6 L2
    ILOAD 0
    BIPUSH 31
    IMUL
    ISTORE 2
   L3
    LINENUMBER 7 L3
    ILOAD 0
    BIPUSH 32
    IMUL
    ISTORE 3

Apparently, there is no difference: we push each constant and just call IMUL. I thought maybe the optimization takes place when the Java bytecode is compiled into actual machine code, but I have never looked at that side, so I do not know how to confirm my theory. I searched, and it seems the keyword I am looking for is the JIT compiler, which converts .class files into CPU-specific machine code.

I thought maybe I could run this code through the JIT compiler and look at the CPU-specific machine code it produces, but then I would only have checked the theory on one specific CPU, not all CPUs. I want to see whether it is "generally true", but that would take too much time.

So, is there any way to confirm that the code above is actually (and generally) being optimized by the compiler? And if I have similar questions in the future, where should I look? When I'm curious about Java behavior, I go to Oracle and check the JVM specification or the Java SE documentation. But what about compiler behavior? Where should I start?
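One way to look at the JIT side directly, at least on HotSpot (these are real diagnostic flags, though -XX:+PrintAssembly additionally needs the hsdis disassembler plugin installed, and its output is specific to the machine it runs on; MainKt assumes the Kotlin file is named Main.kt):

```shell
# List the methods HotSpot's JIT actually compiles at runtime:
java -XX:+PrintCompilation MainKt

# Dump the generated machine code itself (requires the hsdis
# disassembler library on the JVM's library path):
java -XX:+UnlockDiagnosticVMOptions -XX:+PrintAssembly MainKt
```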

That was a long question. Thank you for spending your precious time reading it.

(just an additional note)

I checked C and Python on https://godbolt.org/ and confirmed that for C, it actually is optimized.

#include <stdlib.h>

int test(int num) {
    int n = rand();
    int a = n * 30;
    int b = n * 31;
    int c = n * 32;
    return a * b * c;
}

test:
        push    rbp
        mov     rbp, rsp
        sub     rsp, 32
        mov     DWORD PTR [rbp-20], edi
        call    rand
        mov     DWORD PTR [rbp-4], eax
        mov     eax, DWORD PTR [rbp-4]
        imul    eax, eax, 30
        mov     DWORD PTR [rbp-8], eax
        mov     edx, DWORD PTR [rbp-4]
        mov     eax, edx
        sal     eax, 5
        sub     eax, edx
        mov     DWORD PTR [rbp-12], eax
        mov     eax, DWORD PTR [rbp-4]
        sal     eax, 5
        mov     DWORD PTR [rbp-16], eax
        mov     eax, DWORD PTR [rbp-8]
        imul    eax, DWORD PTR [rbp-12]
        imul    eax, DWORD PTR [rbp-16]
        leave
        ret

However, Python was not.

from random import randint

num = randint(0, 100)

a = num * 30
b = num * 31
c = num * 32
  5          18 LOAD_NAME                2 (num)
             20 LOAD_CONST               2 (30)
             22 BINARY_MULTIPLY
             24 STORE_NAME               3 (a)

  6          26 LOAD_NAME                2 (num)
             28 LOAD_CONST               3 (31)
             30 BINARY_MULTIPLY
             32 STORE_NAME               4 (b)

  7          34 LOAD_NAME                2 (num)
             36 LOAD_CONST               4 (32)
             38 BINARY_MULTIPLY
             40 STORE_NAME               5 (c)
             42 LOAD_CONST               5 (None)
             44 RETURN_VALUE
Cheolho Jeon
    bytecode-oriented languages like Java, Kotlin and Python usually do the optimization at the VM level. That means the source → bytecode compiler is usually dumb in that regard. The VM can optimize much more heavily and also only when it really matters for performance. When people say "the compiler optimizes this and that" they usually mean the bytecode → machine code (aka JIT) compiler, e.g. Hotspot in the stock JVM. – Clashsoft Jul 05 '20 at 12:08
  • 5
    Inspecting the bytecode for performance is wasted time. Make a JMH benchmark and inspect the generated assembly. – maaartinus Jul 05 '20 at 12:12
  • thank both of you for your comments. If any of you can post it as an answer instead of a comment, I will pick that as an answer. Thank you so much for helping a starter. – Cheolho Jeon Jul 05 '20 at 12:23
  • 3
    See [java - How to see JIT-compiled code in JVM? - Stack Overflow](https://stackoverflow.com/questions/1503479/how-to-see-jit-compiled-code-in-jvm) –  Jul 05 '20 at 12:43

1 Answer


Languages like C are compiled ahead of time (AOT), which means all optimizations are done at compile time: the source is translated to machine code that the local CPU executes directly.

Kotlin, Scala, Java, and other JVM languages run on the Java Virtual Machine, and implementations of the JVM perform optimizations at runtime. This is called Just-In-Time (JIT) compilation. An example of a JIT compiler is HotSpot: as its name suggests, it finds "hot spots" in the bytecode and compiles and optimizes them to machine code. An alternative JVM to HotSpot is OpenJ9.
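As an illustrative sketch (my own, not from the question): HotSpot only compiles a method after it has run enough times to count as hot, so a tight loop is the easiest way to watch that happen via the -XX:+PrintCompilation flag. The class and method names here are made up for the example:

```java
public class HotLoop {
    // The same multiply-by-31 from the question; after enough calls
    // HotSpot will JIT-compile this method (and may strength-reduce it).
    static int times31(int i) {
        return 31 * i;
    }

    public static void main(String[] args) {
        long sum = 0;
        // Run with: java -XX:+PrintCompilation HotLoop
        // to see the compilation event for times31 appear mid-run.
        for (int i = 0; i < 1_000_000; i++) {
            sum += times31(i);
        }
        // Printing the result keeps the loop from being dead-code eliminated.
        System.out.println(sum);
    }
}
```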

Languages like Python, I believe, are interpreted at runtime, which means no optimizations of this kind are involved. AOT compilers for Python might actually do some optimizations, but I don't really know the details of those implementations.
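The Python disassembly in the question can be reproduced with the standard dis module; a minimal sketch (the exact opcode names vary by CPython version — BINARY_MULTIPLY up to 3.10, BINARY_OP from 3.11 — but in no version does the bytecode compiler rewrite the multiply into a shift):

```python
import dis

def scaled(num):
    a = num * 30
    b = num * 31
    c = num * 32
    return a, b, c

# Collect the opcode names CPython emits for this function; the
# multiplications survive untouched, with no shift-and-subtract rewrite.
ops = [ins.opname for ins in dis.get_instructions(scaled)]
print(ops)
```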

Deadbeef