1

In Java, the String class has a method: public String[] split(String regex)

This method accepts a regular expression and splits the string it's called on by this regular expression, returning an array of Strings, the result of the split.

It has some optimizations for regexes of length 1; for longer regular expressions, it just calls this code:

Pattern.compile(regex).split(this, 0)

So it compiles the regular expression every time the split(String regex) method is called. Pattern compilation is a relatively costly operation, yet a compiled Pattern can be reused to process (split, in this case) multiple strings at different times; it is also immutable and therefore thread-safe.

My question is: why doesn't the Java compiler or the JVM optimize this somehow, either at compile time or at run time, by (maybe lazily) precompiling the regular expressions of split("regex") calls? I cannot think of a benefit of compiling a string literal to a pattern over and over. I'm thinking of something similar to String interning, or just keeping an array of precompiled Patterns somewhere.
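To make the idea concrete, here is a minimal sketch of the kind of cache I have in mind. The class `PatternCache` and its methods are hypothetical (nothing like this exists in the JDK), and the map is unbounded, which a real implementation would have to deal with:

```java
import java.util.concurrent.ConcurrentHashMap;
import java.util.concurrent.ConcurrentMap;
import java.util.regex.Pattern;

// Hypothetical helper, not part of the JDK: caches compiled patterns by regex source.
final class PatternCache {
    private static final ConcurrentMap<String, Pattern> CACHE = new ConcurrentHashMap<>();

    private PatternCache() {}

    // Compiles the regex on first use and returns the same Pattern afterwards.
    static Pattern get(String regex) {
        return CACHE.computeIfAbsent(regex, Pattern::compile);
    }

    // Stand-in for input.split(regex) that goes through the cache.
    static String[] split(String input, String regex) {
        return get(regex).split(input);
    }
}
```

With this, `PatternCache.split(line, "\\s*,\\s*")` would compile the expression only on the first call.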

I wrote the following code to observe the difference in performance. The regular expression and the string are taken from this answer; the sample string emails is a list of email addresses with SOMETHING between each of them:

public static final String REGEX = "(?:(?:\\r\\n)?[ \\t])*(?:(?:(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*|(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[ \\t])*(?:@(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ 
\\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*)|(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)*:(?:(?:\\r\\n)?[ \\t])*(?:(?:(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*|(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[ \\t])*(?:@(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ 
\\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*)(?:,\\s*(?:(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ 
\\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*|(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)*\\<(?:(?:\\r\\n)?[ \\t])*(?:@(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*(?:,@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*)*:(?:(?:\\r\\n)?[ \\t])*)?(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\"(?:[^\\\"\\r\\\\]|\\\\.|(?:(?:\\r\\n)?[ \\t]))*\"(?:(?:\\r\\n)?[ \\t])*))*@(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*)(?:\\.(?:(?:\\r\\n)?[ \\t])*(?:[^()<>@,;:\\\\\".\\[\\] \\000-\\031]+(?:(?:(?:\\r\\n)?[ \\t])+|\\Z|(?=[\\[\"()<>@,;:\\\\\".\\[\\]]))|\\[([^\\[\\]\\r\\\\]|\\\\.)*\\](?:(?:\\r\\n)?[ \\t])*))*\\>(?:(?:\\r\\n)?[ \\t])*))*)?;\\s*)";

public static void main(String[] args) {
    String emails = "\"Fred Bloggs\"@example.comSOMETHINGuser@.invalid.comSOMETHINGChuck Norris <gmail@chucknorris.com>SOMETHINGwebmaster@müller.deSOMETHINGmatteo@78.47.122.114";

    long t1 = System.currentTimeMillis();
    test1(emails);
    System.out.println("Test 1 took " + (System.currentTimeMillis() - t1) + " milliseconds...");

    long t2 = System.currentTimeMillis();
    test2(emails);
    System.out.println("Test 2 took " + (System.currentTimeMillis() - t2) + " milliseconds...");
}

public static void test1(String input) {
    for (int i = 0; i < 100000; i++) {
        input.split(REGEX);
    }
}

public static void test2(String input) {
    Pattern pattern = Pattern.compile(REGEX);
    for (int i = 0; i < 100000; i++) {
        pattern.split(input);
    }
}

And the output is:

Test 1 took 34831 milliseconds...
Test 2 took 11799 milliseconds...

After all, that's a pretty big difference, isn't it?

Utku Özdemir
  • These types of questions can typically be answered with two points. 1) Your performance tests are unreliable for a multitude of reasons. 2) A compiler can't implement every possible optimization. – Jeroen Vannevel Mar 03 '16 at 20:25
  • And in Java's case, the compiler implements essentially _no_ optimisation beyond constant folding. – Chris Kitching Mar 03 '16 at 20:28
  • @JeroenVannevel 1) My performance tests might not be reliable, but my question stands. 2) Why? This is not a very complex optimization. What's the reason behind not optimizing it? – Utku Özdemir Mar 03 '16 at 20:28
  • @VGR I've seen a lot of code like `someString.split("\\s*,\\s*")`, called many times from many places. And I'm not questioning whether the code is written well or badly. It might be bad code, it might be rare, but still, my question is just "why not optimize?" – Utku Özdemir Mar 03 '16 at 20:33
  • @UtkuÖzdemir - Turn the question around - why *should* the compiler authors optimise? (and add code/testing/maintenance overhead in the process) – Oliver Charlesworth Mar 03 '16 at 20:36
  • @OliverCharlesworth Because it at least does "some" optimizations, so why not this? I cannot find a single case in which this kind of optimization would decrease performance. And that's why I'm asking. By the way, I'm not asking only about the compiler. Some optimizations can also be done at run time, in the JVM. I'm asking in general, at any level. – Utku Özdemir Mar 03 '16 at 20:41

2 Answers

4

It is not the compiler's responsibility to improve your use of the API.

The compiler couples to a minimal subset of the huge number of standard Java classes. Some of these are listed in the Java Language Specification, section 1.4: Relationship to Predefined Classes and Interfaces:

As noted above, this specification often refers to classes of the Java SE platform API. In particular, some classes have a special relationship with the Java programming language. Examples include classes such as Object, Class, ClassLoader, String, Thread, and the classes and interfaces in package java.lang.reflect, among others. This specification constrains the behavior of such classes and interfaces, but does not provide a complete specification for them. The reader is referred to the Java SE platform API documentation.

Satisfying your request with a special case for String.split() would:

  • Require coupling the compiler to the Pattern class.
  • Require coupling the compiler to a particular implementation of String.split(), which might differ from the library used at run-time.
  • Raise similar questions about potentially many other API methods.

Satisfying your request with a general approach would be more difficult. Java is not a functional language. Method calls are not necessarily referentially transparent: multiple calls to the same method with the same arguments may return different results and may have side effects. In your case, Pattern.compile() returns a distinct Pattern object on every call. Automatic memoization is easier in functional languages.
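A small sketch of that last point. The class and method names here are made up for illustration; the only JDK calls used are `Pattern.compile()` and `Object.equals()`, which `Pattern` does not override:

```java
import java.util.regex.Pattern;

final class CompileIdentityDemo {
    // True only if two compile() calls for the same regex yield the very same object.
    static boolean compilesToSameInstance(String regex) {
        return Pattern.compile(regex) == Pattern.compile(regex);
    }

    // Pattern inherits equals() from Object, so this is an identity comparison as well.
    static boolean compilesToEqualObjects(String regex) {
        return Pattern.compile(regex).equals(Pattern.compile(regex));
    }

    public static void main(String[] args) {
        System.out.println(compilesToSameInstance("\\s*,\\s*")); // prints false
        System.out.println(compilesToEqualObjects("\\s*,\\s*")); // prints false
    }
}
```

A compiler or VM that silently reused one Pattern would therefore change behaviour that a program can observe, even if few programs depend on it.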

As you note, you have a way to use the API efficiently - compile once, split many, rather than using the convenience method String.split(). Increasing the complexity and coupling of the compiler for this special case (and others) carries a cost that outweighs the benefits.
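For completeness, a minimal sketch of "compile once, split many". The class name `CsvLineSplitter` and the comma regex are illustrative assumptions, not something from your code:

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Pattern;

final class CsvLineSplitter {
    // Compiled once, when the class is initialised. Pattern is immutable,
    // so the same instance can be shared freely, including across threads.
    private static final Pattern COMMA = Pattern.compile("\\s*,\\s*");

    private CsvLineSplitter() {}

    // Splits every line with the shared Pattern; nothing is recompiled per call.
    static List<String[]> splitAll(List<String> lines) {
        List<String[]> rows = new ArrayList<>(lines.size());
        for (String line : lines) {
            rows.add(COMMA.split(line));
        }
        return rows;
    }
}
```

The cost of this is one extra field; the convenience method remains available wherever performance does not matter.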

Andy Thomas
  • Thank you for the answer. But I'm not only asking why the compiler doesn't do it; I'm asking more generally why it does not get optimized at some point, either at compile time or at run time, by the JVM for instance. – Utku Özdemir Mar 03 '16 at 21:10
  • Java is not a functional language. A method call with the same arguments does not always return an identical or equal value, and may have side effects. At the least, this particular optimization would require coupling the compiler or VM to another class, to accomplish something that can already be done. – Andy Thomas Mar 03 '16 at 21:15
  • Can't the VM just check whether the `Pattern` class is loaded? It can (and should) do this kind of optimization only if some conditions are met. And yes, coupling is bad, but why desperately try to avoid coupling when the `String` class implementation itself (in that specific implementation of the language spec) depends on (and is therefore coupled with) the `Pattern` class? – Utku Özdemir Mar 03 '16 at 21:32
  • Minimizing coupling to the API minimizes costs and risks. Yes, the VM could couple to the String implementation and the Pattern class and expectations about their behavior. It could replace convenience calls with efficient usage in this special case and others. **But the VM has enough to do.** The API supports a convenience method -- but you don't have to use it. If you're concerned about performance, it's easy enough to use the Pattern API directly. – Andy Thomas Mar 03 '16 at 23:29
0

The Java compiler just doesn't do optimisation. It does constant folding, and it will delete branches whose conditions reduce to false under constant folding, but it doesn't do anything more substantial. This is often surprising to people who've come from C/C++: Java is a language where source-level micro-optimisations aren't always completely useless (though the JIT usually does make them so).
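As a small illustration of those two things javac does do, consider the sketch below. The class and member names are invented for the example; the resulting bytecode can be inspected with `javap -c`:

```java
final class ConstantFoldingDemo {
    // A compile-time constant: javac knows its value while compiling.
    static final boolean DEBUG = false;

    private ConstantFoldingDemo() {}

    static String foldedGreeting() {
        // A constant expression: javac folds it into the single literal "Hello, world".
        return "Hello, " + "world";
    }

    static int doubled(int n) {
        if (DEBUG) {
            // The condition is a constant false, so javac leaves this branch out of the bytecode.
            System.out.println("n = " + n);
        }
        return n * 2;
    }
}
```

Because the folded string is a constant expression, it is interned just like the literal `"Hello, world"` written out directly.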

In some situations, it does comically terrible things. For instance, this code fragment:

Integer x = 3;
x++;

Ends up in bytecode as:

0: iconst_3            // Push 3
1: invokestatic #2     // Integer.valueOf(3)
4: astore_1            // Store the Integer in x
5: aload_1             // Push x...
6: astore_2            // ...and store it in a temporary, y (local 2)
7: aload_1             // Push x again
8: invokevirtual #3    // x.intValue()
11: iconst_1           // Push 1
12: iadd               // x.intValue() + 1
13: invokestatic #2    // Integer.valueOf(x.intValue() + 1)
16: dup                // Duplicate the new Integer
17: astore_1           // Store it in x...
18: astore_3           // ...and also in a second temporary, z (local 3)
19: aload_2            // Push y (the old value of x)...
20: pop                // ...and discard it

... Which is clearly ridiculous.

Similarly, each string concatenation statement (for example `s += x` inside a loop) compiles down to its own StringBuilder rather than reusing one.
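A hedged sketch of what that means in practice, assuming a javac of that era (Java 8 or earlier; newer compilers emit a different concatenation strategy). The class and method names are made up:

```java
final class ConcatDemo {
    private ConcatDemo() {}

    // With javac 8 and earlier, each += here expands to roughly
    // new StringBuilder().append(result).append(part).toString().
    static String joinWithPlus(String[] parts) {
        String result = "";
        for (String part : parts) {
            result += part;
        }
        return result;
    }

    // The hand-written version: one builder reused for the whole loop.
    static String joinWithBuilder(String[] parts) {
        StringBuilder builder = new StringBuilder();
        for (String part : parts) {
            builder.append(part);
        }
        return builder.toString();
    }
}
```

Both methods return the same string; the difference is only in how many intermediate objects the loop allocates before the JIT gets involved.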

The claim is that the Just-in-time compiler can do all the optimisation. That doesn't always work brilliantly: it's no use for binary size, some platforms have poor JIT, and the JIT can't optimise very aggressively anyway because it's got a very short amount of time to work with.

To make matters worse, excessive bytecode bloat can cause the JIT to be more reluctant to process your function, and harm the quality of the eventual output. Eeep.

There do exist optimisers for Java bytecode, such as Proguard, though I don't think any handles your specific case. There is probably room for a useful source-level compile time optimiser for Java, but none currently exists. Well, unless you count the one I wrote at uni (it's on my GitHub if you're super keen. You can use it as a platform for implementing optimisations like the one you want. PRs welcome :P).

Essentially, if you want ahead-of-time optimisation, Java is not the language to use. In practice, on platforms where the JIT is good, things work out alright, though AoT optimisation could seemingly make them somewhat better. On platforms like old Android, where the JIT is suckful or nonexistent, this flaw in Java can be quite harmful (that's basically why ProGuard exists).

Chris Kitching