Faster way (performance) to check if a letter is uppercase?

Question

If you needed to check if a letter is uppercase, you could use Character.isUpperCase(char c) in Java, which is very straightforward and simple.

You could also check whether the Unicode value is in the range [65, 90].

I'm sure it's almost insignificant, but just like how switch statements will run quicker than if statements, would using the Unicode check be quicker than the method call?

What do you think `Character.isUpperCase(char)` does internally? — Elliott Frisch, Oct 24 '16 at 00:54
"Switch statements will run quicker than if statements"?! Can you post that benchmark? — Kerrek SB, Oct 24 '16 at 00:54
“You could also check the the unicode value if its in the range [65, 90].” … assuming your input text will not contain `é`. Apologies for the cliché example. — VGR, Oct 24 '16 at 00:57
If you know it is ASCII and you know it is letters, then you can just check a single bit. Those ASCII guys were pretty clever ;-) — John3136, Oct 24 '16 at 01:00
write your code clearly and as idiomatically as you can, and only worry about performance when you prove to yourself there is a problem — MeBigFatGuy, Oct 24 '16 at 01:25

score 2 · Accepted Answer · edited Jun 20 '20 at 09:12

First of all, you are not comparing like with like.

The Character.isUpperCase(char) method tests to see if a character is any Unicode uppercase character. An if test like:

   if (ch >= 65 && ch <= 90)

just tests if the character is a (7-bit) ASCII uppercase character. The latter is probably faster because it is a simpler test. But it could also be the WRONG test.
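To make the difference concrete, here is a minimal sketch that runs both tests on a few characters. The class name `UpperCaseCheck` and the helper `isAsciiUpper` are made up for illustration; only `Character.isUpperCase(char)` is a real library method.

```java
class UpperCaseCheck {
    // Hypothetical helper: ASCII-only test, true only for 'A'..'Z' (65..90).
    static boolean isAsciiUpper(char ch) {
        return ch >= 'A' && ch <= 'Z';
    }

    public static void main(String[] args) {
        // Samples: A, z, E-acute (U+00C9), Greek capital omega (U+03A9).
        char[] samples = {'A', 'z', '\u00C9', '\u03A9'};
        for (char ch : samples) {
            // Print the code point and the verdict of each test side by side.
            System.out.printf("U+%04X ascii=%b unicode=%b%n",
                    (int) ch, isAsciiUpper(ch), Character.isUpperCase(ch));
        }
    }
}
```

The two tests agree on `A` and `z`, but only `Character.isUpperCase` reports the accented and Greek capitals as uppercase. Which answer is "right" depends on whether your input is guaranteed to be ASCII.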

Under the hood, the isUpperCase code is complicated because it has to work for all Unicode code planes and work efficiently for the common case (LATIN-1 characters). It is doing some rather clever things to achieve that, but in some cases it does use a switch.

"I'm sure it's almost insignificant, but just like how switch statements will run quicker than if statements, would using the Unicode check be quicker than the method call?"

It probably is insignificant ... in the context of a complete application. And the standard advice is to benchmark and profile your real application before you attempt to optimize your code at this level.

But to answer your question, a switch statement in Java will be compiled by the JIT compiler to either a branch table or a sequence of if ... else if ... tests. The decision of which to use is a trade-off of speed versus code space, and it will depend on the number and sparseness of the switch arms. I don't know this for certain, but I doubt that the JIT compiler does the optimization in the other direction; i.e. from the bytecodes for a sequence of if ... else if ... tests to a branch table.

UPDATE: In fact, the bytecode instruction set provides two ways to code a switch statement (for an integer target): tableswitch and lookupswitch; see JVM spec 3.10. See also the question "Difference between JVM's LookupSwitch and TableSwitch?". So some of the decision making may be happening in the bytecode compiler rather than the JIT compiler.
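As a rough illustration of the two bytecode forms, the sketch below has one switch with dense, contiguous case labels and one with sparse labels. The class and method names are made up. The comments describe what javac typically emits; the exact choice is up to the compiler, so treat it as an expectation to verify rather than a guarantee.

```java
class SwitchShapes {
    // Dense, contiguous case labels: javac typically emits tableswitch here.
    static String dense(int key) {
        switch (key) {
            case 1: return "one";
            case 2: return "two";
            case 3: return "three";
            case 4: return "four";
            default: return "other";
        }
    }

    // Sparse case labels: javac typically emits lookupswitch here.
    static String sparse(int key) {
        switch (key) {
            case 1: return "one";
            case 100: return "hundred";
            case 10000: return "ten thousand";
            default: return "other";
        }
    }

    public static void main(String[] args) {
        System.out.println(dense(3));
        System.out.println(sparse(100));
    }
}
```

After compiling, `javap -c SwitchShapes` disassembles the class so you can see which instruction was chosen for each method.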

UPDATE 2: However, I then found this in a mailing list post from John Rose:

The C2 JIT reorganizes lookupswitch and tableswitch instructions from scratch, using its own notions of what is efficient. You end up with a decision tree and/or some PC jump blocks, but you can end up with a mix of both from either instruction.

The C1 JIT reorganizes the instructions also, detecting key ranges (runs of common branch targets) and handling them with 1-2 comparisons per range. Oddly, it does not bother to put a decision tree on top of this, nor does it attempt jump tables.

Source: http://compiler-dev.openjdk.java.narkive.com/dg9XUG39/compiling-large-switch-statements

At any rate, to give the JIT compiler the best chance of achieving the fastest code, it is probably better to use a switch statement. And certainly, the intent of the code will be clearer if you use a switch.

But to reiterate, comparing a simple if test against something as complicated as isUpperCase is not a fair (or particularly meaningful) comparison. I would expect the if version to be faster because it is doing something much simpler.

There are other possible implementations. The Watcom C compiler used to generate a binary search for large sparse case sets. — user207421, Oct 24 '16 at 02:19
@EJP - I didn't know that. And thinking outside the box, there are other strategies too ... up to and including trying to find a "perfect hash function". But I doubt that the JIT compiler would do these things. For a start, memory is a lot cheaper these days. — Stephen C, Oct 24 '16 at 02:28
