Consider the following Java code fragment:

```java
String buffer = "...";
for (int i = 0; i < buffer.length(); i++)
{
    System.out.println(buffer.charAt(i));
}
```

Since `String` is immutable and `buffer` is not reassigned within the loop, is the Java compiler smart enough to optimize away the `buffer.length()` call in the loop condition? For example, would it emit bytecode equivalent to the following, where `buffer.length()` is stored in a local variable and the loop condition uses that variable instead? I have read that some languages, such as C#, perform this kind of optimization.

```java
String buffer = "...";
int length = buffer.length();
for (int i = 0; i < length; i++)
{
    System.out.println(buffer.charAt(i));
}
```
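The premise in the question matters: hoisting the bound is only equivalent when the bound cannot change during the loop. Here is a minimal sketch (the class and method names are hypothetical) that runs both loop shapes over a mutable `StringBuilder` and appends inside the loop, so the two versions diverge. With an immutable `String` that is never reassigned, both shapes would run the same number of iterations.

```java
class HoistingSafety {
    public static void main(String[] args) {
        System.out.println(loopReadingBoundEachTime(new StringBuilder("ab"))); // 5
        System.out.println(loopWithHoistedBound(new StringBuilder("ab")));     // 2
    }

    // Re-reads sb.length() on every iteration, so it sees the appended text.
    static int loopReadingBoundEachTime(StringBuilder sb) {
        int iterations = 0;
        for (int i = 0; i < sb.length(); i++) {
            iterations++;
            if (i == 0) {
                sb.append("xyz"); // grows the builder from 2 to 5 chars
            }
        }
        return iterations;
    }

    // Reads the length once up front, so later growth is ignored.
    static int loopWithHoistedBound(StringBuilder sb) {
        int iterations = 0;
        int length = sb.length();
        for (int i = 0; i < length; i++) {
            iterations++;
            if (i == 0) {
                sb.append("xyz");
            }
        }
        return iterations;
    }
}
```

A compiler may only perform this rewrite when it can prove the bound is loop-invariant, which is exactly what immutability plus "not reassigned" gives you for `String`.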
stackoverflowuser2010

3 Answers

In Java (and in .NET), strings are length-counted (they know their number of UTF-16 code units, i.e. `char`s), so finding the length is a simple O(1) operation.

The compiler (javac) may or may not perform this hoisting, but the JVM's JIT compiler will almost certainly inline the call to `length()`, making `buffer.length()` nothing more than a memory access.
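To illustrate what "length-counted" means, here is a simplified, hypothetical `CountedString` class (not the real `java.lang.String` source) that records its length once at construction. For contrast it includes a C-style length function that has to scan for a `'\0'` terminator, which is O(n) in the length of the text.

```java
import java.util.Arrays;

// Hypothetical illustration only; java.lang.String is implemented differently.
final class CountedString {
    public static void main(String[] args) {
        CountedString s = new CountedString("hello".toCharArray());
        System.out.println(s.length());                             // 5
        System.out.println(cStyleLength("hello\0".toCharArray())); // 5
    }

    private final char[] data;  // private copy, never mutated
    private final int length;   // recorded once, at construction

    CountedString(char[] source) {
        this.data = Arrays.copyOf(source, source.length);
        this.length = source.length;
    }

    // O(1): returns the stored field no matter how long the text is.
    int length() {
        return length;
    }

    char charAt(int index) {
        return data[index];
    }

    // For contrast: a C-style strlen must walk the buffer until it finds
    // the '\0' terminator, so its cost grows with the string's length.
    static int cStyleLength(char[] nulTerminated) {
        int n = 0;
        while (nulTerminated[n] != '\0') {
            n++;
        }
        return n;
    }
}
```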

Mitch
  • What about really long Strings, let's say a few 1000K? – Drejc Oct 17 '14 at 17:52
  • It's `O(1)` cost, so it doesn't matter. The string is stored as `{ length = 1000, character data = { 0x65, ... 0x65 } }`. – Mitch Oct 17 '14 at 17:53
  • I think he is talking about the JIT (Just in time compiler) with jitter – Marco Acierno Oct 17 '14 at 17:53
  • @Drejc: No matter how long it is, the `String` knows its own length without counting. (It has to, because there's no way to count. Unlike in (say) C, where strings are terminated by `\0`, in Java there is no way to determine the length of a string by examining the character contents.) – ruakh Oct 17 '14 at 17:53
  • Incidentally, Mitch, `length` is the number of UTF-16 code units (`char`s), so, not exactly "byte counted". (A string of length `3` has three `char`s, which is six bytes.) But, +1 anyway, since this is a minor point. – ruakh Oct 17 '14 at 17:55
  • Java is a platform that specifies a virtual machine for its execution. The compiler takes the source code and generates Java bytecode, which the just-in-time compiler (JIT) then translates to machine code (x86, ARM Thumb, etc.). – Mitch Oct 17 '14 at 17:57
  • @Mitch: so the bottom line is that it's ok to leave in `buffer.length()` in the loop condition because it's a memory read (due to the function being inlined)? – stackoverflowuser2010 Oct 17 '14 at 18:03
  • The bottom line is that unless profiling has identified something as being a problem, I wouldn't sacrifice readability for potential speed. "programs must be written for people to read, and only incidentally for machines to execute" ~H. Abelson – Mitch Oct 17 '14 at 18:05
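ruakh's point that `length()` counts UTF-16 code units rather than characters can be checked directly with `String.length()` and `String.codePointCount`. A small sketch (the class name is hypothetical) using U+1F600, a character outside the Basic Multilingual Plane that UTF-16 stores as a surrogate pair:

```java
class CodeUnitsDemo {
    public static void main(String[] args) {
        String ascii = "abc";
        // U+1F600 is outside the BMP, so UTF-16 encodes it as a
        // surrogate pair: two char code units for one code point.
        String emoji = "\uD83D\uDE00";

        System.out.println(ascii.length());                          // 3
        System.out.println(emoji.length());                          // 2
        System.out.println(emoji.codePointCount(0, emoji.length())); // 1
    }
}
```

Either way, `length()` returns a stored or directly derived count; it never walks the characters.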
The Java compiler (javac) performs no such optimization. The JIT compiler will likely inline the length() method, which at the very least would avoid the overhead of a method call.

Depending on which JDK you're running, the `length()` method itself likely returns either a final length field, which is a cheap memory access, or the length of the string's internal `char[]` array. In the latter case, the array's length is constant and the array reference is presumably final, so the JIT may be sophisticated enough to record the length once in a temporary, as you suggest. However, that sort of thing is an implementation detail. Unless you control every machine that your code will run on, you shouldn't make too many assumptions about which JVM it will run on, or which optimizations it will perform.
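The two strategies mentioned here (and debated in the comments below) can be sketched as follows. Both string classes are simplified, hypothetical illustrations, not JDK source. One shares its backing array with substrings and therefore needs a separate count. The other owns an exactly sized private array whose length is the string's length. In both, `length()` is a single field or array-length read, which is why it is a cheap candidate for inlining.

```java
import java.util.Arrays;

class StringLayouts {
    public static void main(String[] args) {
        char[] text = "hello world".toCharArray();
        SharedArrayString shared = new SharedArrayString(text, 0, text.length).substring(6, 11);
        PrivateArrayString owned = new PrivateArrayString(text).substring(6, 11);
        System.out.println(shared.length() + " " + shared.charAt(0)); // 5 w
        System.out.println(owned.length() + " " + owned.charAt(0));   // 5 w
    }
}

// Older style (hypothetical): substrings share one array, so each string
// remembers its slice and length() returns the stored count.
final class SharedArrayString {
    private final char[] value;
    private final int offset;
    private final int count;

    SharedArrayString(char[] value, int offset, int count) {
        this.value = value; // deliberately shared, not copied
        this.offset = offset;
        this.count = count;
    }

    int length() { return count; }

    char charAt(int index) { return value[offset + index]; }

    SharedArrayString substring(int begin, int end) {
        return new SharedArrayString(value, offset + begin, end - begin);
    }
}

// Newer style (hypothetical): each string owns an exactly sized copy,
// so the array's own length doubles as the string's length.
final class PrivateArrayString {
    private final char[] value;

    PrivateArrayString(char[] source) {
        this.value = Arrays.copyOf(source, source.length);
    }

    int length() { return value.length; }

    char charAt(int index) { return value[index]; }

    PrivateArrayString substring(int begin, int end) {
        return new PrivateArrayString(Arrays.copyOfRange(value, begin, end));
    }
}
```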

As to how you should write your code, calling `length()` directly in the loop condition is a common idiom and is easy to read. I'd keep things simple and let the JIT optimizer do its job, unless you're in a critical code path that has demonstrated performance issues, and you have likewise demonstrated that such a micro-optimization is worthwhile.

Mike Strobel
  • "JIT compiler will likely inline the length() method". Any documentation on this? I love to read about this sort of stuff. – stackoverflowuser2010 Oct 17 '14 at 17:56
  • It actually doesn't return the length of the `char[]`, since the `String` might only use part of it. `String`s have their own `final int` field to remember the length. – resueman Oct 17 '14 at 18:00
  • By default, I believe the Oracle JVM will inline up to 35 **bytes** of bytecode for a method that has been called at least once. I believe there is a larger threshold for frequently-called methods. You might check [this StackOverflow question](http://stackoverflow.com/questions/18737774/hotspot-jit-inlining-strategy-top-down-or-down-top). @resueman, the version of the source I'm looking at doesn't have a separate field. Yet another example of assumptions we should not make about the client machine's JVM/JDK :). – Mike Strobel Oct 17 '14 at 18:02
  • @resueman String array sharing has been gone at least since Java 7. It was indeed implemented this way, but that's a thing of the ancient past. Java 7/8 use a private array with exactly the length in chars. – Durandal Oct 17 '14 at 18:07
  • @MikeStrobel Interesting. I'd thought every implementation was handling it that way. The openjdk-7 docs do. Definitely goes to show that you shouldn't trust in details like that unless you have complete control over the environment :) – resueman Oct 17 '14 at 18:07
  • @MikeStrobel, you seem to have a good means of accessing source. Is there a website you use or do you have it checked out locally? – Mitch Oct 17 '14 at 18:08
  • @Mitch The sources for the public/portable JDK classes ship with the JDK. I imagine most IDEs are capable of navigating to those sources just as you would jump to a type declared within your project. IntelliJ handles it quite seamlessly: just Ctrl+Click on a `String` type identifier, or hit Ctrl+N and type `String`. – Mike Strobel Oct 17 '14 at 18:10
  • @MikeStrobel, hmm. Learn something new about my IDE every day..., thanks. – Mitch Oct 17 '14 at 18:13
You can do several things to examine the two variations of your implementation.

  1. (difficulty: easy) Write a test and measure the speed of each version of the code under similar conditions. Make sure your loop is large enough to show a difference; it is possible that there is none.

  2. (difficulty: medium) Examine the bytecode with `javap -c` to see how the compiler translated each version. The result might differ between javac implementations, or it might not (if the specification pins down the behavior and leaves no room for interpretation by the implementor).

  3. (difficulty: hard) Examine the JIT output of both versions with JITWatch; you will need a very good understanding of bytecode and assembly.
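For option 1, a rough timing harness might look like the sketch below (the class and method names are hypothetical). It sums the characters instead of printing them, since `System.out.println` would dominate the measurement. Hand-rolled `System.nanoTime` loops are easily skewed by JIT warm-up and dead-code elimination, so treat the numbers as indicative at best; a dedicated harness such as JMH is more reliable. For option 2, the compiled class can be disassembled with `javap -c LoopTiming`, and on HotSpot, running it with `-XX:+UnlockDiagnosticVMOptions -XX:+PrintInlining` prints the JIT's inlining decisions, a lighter-weight first look before reaching for JITWatch.

```java
class LoopTiming {
    // Variant 1: call length() in the loop condition.
    static long sumInline(String buffer) {
        long sum = 0;
        for (int i = 0; i < buffer.length(); i++) {
            sum += buffer.charAt(i);
        }
        return sum;
    }

    // Variant 2: hoist length() into a local before the loop.
    static long sumHoisted(String buffer) {
        long sum = 0;
        int length = buffer.length();
        for (int i = 0; i < length; i++) {
            sum += buffer.charAt(i);
        }
        return sum;
    }

    public static void main(String[] args) {
        String buffer = "x".repeat(1_000_000); // String.repeat needs Java 11+
        long sink = 0;

        // Warm-up so the JIT has a chance to compile both methods.
        for (int r = 0; r < 50; r++) {
            sink += sumInline(buffer) + sumHoisted(buffer);
        }

        long t0 = System.nanoTime();
        for (int r = 0; r < 200; r++) sink += sumInline(buffer);
        long t1 = System.nanoTime();
        for (int r = 0; r < 200; r++) sink += sumHoisted(buffer);
        long t2 = System.nanoTime();

        System.out.printf("inline:  %d ms%n", (t1 - t0) / 1_000_000);
        System.out.printf("hoisted: %d ms%n", (t2 - t1) / 1_000_000);
        System.out.println(sink); // use the result so the work is not discarded
    }
}
```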

Juru