HashMap vs Switch statement performance

Question

A HashMap essentially has O(1) performance while a switch state can have either O(1) or O(log(n)) depending on if the compiler uses a tableswitch or lookup switch.

Understandably, if a switch statement is written as such,

switch (int) {
    case 1:
    case 2:
    case 3:
    case 4:
    default:
}

then it would use a tableswitch and clearly have a performance advantage over a standard HashMap. But what if the switch statement is sparse? These would be two examples that I would be comparing:

HashMap<Integer, String> example = new HashMap<Integer, String>() {{
        put(1, "a");
        put(10, "b");
        put(100, "c");
        put(1000, "d");
}};

.

switch (int) {
    case 1:
        return "a";
    case 10:
        return "b";
    case 100:
        return "c";
    case 1000:
        return "d";
    default:
        return null;
}

What would provide more throughput, a lookupswitch or HashMap? Does the overhead of the HashMap give the lookupswitch an advantage early but eventually tapers off as the number of cases/entries increase?

Edit: I tried some benchmarks using JMH, here are my results and code used. https://gist.github.com/mooman219/bebbdc047889c7cfe612 As you guys mentioned, the lookupswitch statement outperformed the HashTable. I'm still wondering why though.

As with pretty much every performance question: you must measure it. http://stackoverflow.com/q/504103/139010 I look forward to your results `:)` — Matt Ball, Jan 16 '15 at 22:32
A good default answer is to tell you to measure the difference... I would expect the switch statement to be faster, because it should amount to less instructions. With C and GCC, a switch gets implemented as if/elseif-chains, jump tables, or what not depending on the context (e.g. how many cases in the switch, indexing etc) — Morten Jensen, Jan 16 '15 at 22:32
In addition to far lesser complexity, the lookupswitch mechanism has the important advantage of locality of reference. The hashmap must 1. dereference your Integer key; 2. dereference the bucket array; 3. dereference the Node instance in the array at index derived from 1; 4. dereference the Integer key of the Node to make sure it's equal to the requested key; 5. finally dereference the return value from the Node. — Marko Topolnik, Jan 16 '15 at 22:44
@MattBall I'll start a JMH and see how it goes for different sized switches/hashmaps. — Joe C, Jan 16 '15 at 23:37
It could be interesting to run the same with a String key instead of int. — MonoThreaded, Mar 20 '15 at 07:22
The benchmark would be more interesting if you randomised your access to the map keys. If the same key is accessed many times in a row, I would imagine the JIT might rewrite the switch statement... — Lukas Eder, Dec 13 '17 at 09:18
Another consideration: since the lookup key is a primitive integer, the HashMap technique will require boxing of the key. This is highly optimized for small integers but still has overhead. I would speculate based on my experience that the switch is faster until you get 100s of cases. — Wheezil, Apr 11 '18 at 14:02
I ran a simple test using about 100 switch/map values between 0 and 999. Under java8 oracle JVM on Windows running 100M iterations: switch=280ms, map=490ms. Takeaways: switch is mostly faster, but only rarified cases will notice. — Wheezil, Apr 11 '18 at 14:34
It is the incorrect comparison - autoboxing eats up all the improvement. I tried the same test for Strings the HashMap looks a bit better even with a not too big set of the cases, but std deviation is lower for the switch. mapTest: 11.656 ± 3.711 ns/op switchTest 15.689 ± 0.314 ns/op You can easily try/tweak this project yourself: https://github.com/dryganets/switch-test I think the string length might make a difference. — Sergey Dryganets, Jun 20 '18 at 22:22
@SergeyDryganets Strings in switches in Java are more of a special case, and boils down to a slightly better if/then/else chain where the comparison function is equals(). Using a hashmap vs a switch for strings is a much harder comparison since they're using fundamentally different methods for "lookups". Length does make a difference. The JVM will likely cache the autoboxed references in the primitive case. A primitive hashmap could be used instead but I'm doubtful it'll make a large difference. — Joe C, Jun 23 '18 at 21:06
@JoeC if it is that good how would you explain that worst case testing gives a better result with hashmap? I used java9 as my runtime. For Dalivk and ART VM's on Android, I could say for sure that equals/hashcode are used in switch statement. I didn't profile it on JVM so couldn't say it for sure. Oracle docs say that more efficient algorithm is used for strings so probably you are right. As for Autoboxing, it is flaky statement. JVM does cache some of the integers by default 0-128. You can configure it with -XX:AutoBoxCacheMax. The original test's constants are not cached. — Sergey Dryganets, Jun 24 '18 at 22:10

score 65 · Answer 1 · edited May 09 '19 at 11:22

The accepted answer is wrong here.

http://java-performance.info/string-switch-implementation/

Switches will always be as fast as if not faster than hash maps. Switch statements are transformed into direct lookup tables. In the case of Integer values (ints, enums, shorts, longs) it is a direct lookup/jmp to the statement. There is no additional hashing that needs to happen. In the case of a String, it precomputes the string hash for the case statements and uses the input String's hashcode to determine where to jump. In the case of collision, it does an if/else chain. Now you might think "This is the same as HashMap, right?" But that isn't true. The hash code for the lookup is computed at compile time and it isn't reduced based on the number of elements (lower chance of collision).

Switches have O(1) lookup, not O(n). (Ok, in truth for a small number of items, switches are turned into if/else statements. This provides better code locality and avoids additional memory lookups. However, for many items, switches are changed into the lookup table I mentioned above).

You can read more about it here How does Java's switch work under the hood?

Did I say which one is faster? I didn't say that. My answer is which one is convenient in which case and why we don't need to worry about performance for the case because we have ton of other use cases we need to optimize performance rather than this simple case. — LHA, Sep 10 '18 at 16:14
You said Switches has O(1) it is NOT correct for all languages/compilers. It depends on the compiler and the worst case it is O(n). — LHA, Sep 10 '18 at 16:17

LHA · Accepted Answer · 2022-07-26T14:34:03.983

43

It depends:

If there are a few items | fixed items. Using switch if you can ( worst case O(n))
If there are a lot of items OR you want to add future items without modifying much code ---> Using hash-map ( access time is considered as constant time)
You should NOT try to improve performance for the case, because the difference in execution time is nanoseconds. Just focus on readability/maintainability of your code. Is it worth optimizing a simple case to improve a few nanoseconds?

edited Jul 26 '22 at 14:34

answered Jan 16 '15 at 22:39

LHA

9,398
8
46
85

The length of the key is also pertinent. – OldCurmudgeon Jan 16 '15 at 22:46
1

Why a lot of items are bad for switch statement ? I have heard that compiler does optimization and binary tree in switch statements for us. – KRoy Aug 30 '17 at 15:06
2

This is incorrect. If you can switch, switch. It will always be as fast as if not faster than a hashmap. For integer values and enum values, it is transformed into an in memory lookup table, which doesn't have the cost of hashing. For a string, it creates the HashMap and uses that to switch. The only thing switch might be slower than is an if/else block due to code locality being low. – Cogman Sep 06 '18 at 20:36
We have ton of other use cases we need to optimize performance rather than this simple case. No need to optimize a simple case to get a few nanoseconds. – LHA Sep 10 '18 at 16:20
5

I really dislike it when people say things like "you should not worry about performance". Yes, in this case it might be perfectly valid, but as a general whole, our industry is producing *THOUSANDS* of software packages at the moment with horrible, unacceptable performance (given the hardware that we currently use). The problem is really not that people think _too much_ about performance, quite the opposite. Even _if_ some people would think too much about it, well, it's definitely much less off a problem than the "YAGNI" approach when applied to performance optimization. :/ – Per Lundberg Mar 22 '19 at 14:13
2

Can’t hear this „don’t worry about performance“ any more. This is the reason, why people claim Java to be slow. In today’s applications, it is not unrealistic to have billions (and more) of events per second. Especially for developing libraries it is thus extremely important to care about performance. So yes, also nanoseconds can be important. The question want about code style, so please just answer the question ... if you can. – PeMa Jun 20 '20 at 09:58
2

@PerLundberg: Did you see I said "For your case"? - It means in this case, the question owner should not worry about performance. I DID NOT SAY "you should not worry about performance" in all cases. We always need to care about performance. We need to optimize our API for performance too BUT NOT ALL SINGLE SOURCE CODE LINE OPTIMIZATION? Because Readability/Maintainability is also important too. Sometime we need to have some kind of trade off. – LHA Jun 21 '20 at 00:59
@Loc I don't think this is a valid answer. For me, it looks more like a general remark on code style. An acceptable answer would either explain or at least link to a description of differences between both choices. Or if (and in this case this might be true) there is no final answer, there should be some kind of benchmark or similar. Plus, I think, one very important question here is: Will the JIT eventually even produce the same processor instructions? – PeMa Jun 21 '20 at 04:29

Steve Cook · Answer 3 · 2021-04-25T18:15:26.073

TL/DR

Base it on code readability and maintainability. Both are cost O(1) and provide almost no difference (though switches generally will be slightly faster).

In this particular case a map would be faster, as a switch returns an address and then must go to that address to identify the return value. (A rare case example). If your switch is just calling functions anyways, a Map would also be faster.

To make things faster, I would ensure using numeric cases and avoid using strings via constants or enumerators (typescript).

(edited) I confirmed by expectation: How does Java's switch work under the hood? with switches.

More detailed answer

In the weeds:

A switch statement will usually be higher performance. It creates a lookup table and goto reference and starts at that point. However there are exceptions.

When you utilize a simple switch such as return map.get(x) vs. switch(1=>'a', 2=>'b', etc). That is because the map can directly return the value desired where the switch will stop map the addresses and continue until break or the end is met.

In any event, they should be extremely similar in execution cost.

Think about maintainability and readability of the code

Using a map decouples the data, which can gain the benefit of creating "switch" cases dynamically. More detailed below.

If there are several complex functions/processes you need to handle, it may be easier to read/write if you utilize a map instead. Especially if the switch statement starts to exceed 20 or 30 options.

Personally used case example for maps:

I have been utilize the following pattern for flux (Redux/useReducer) in React applications for some time.

I create a central map where I map the trigger as the key, and the value is a functional reference. I can then load cases where and when it makes sense.

Initially I used this to be able to break the use cases down to reduce file size and group cases of similar function together in a more organized fashion. Although I later evolved it to be loaded in domains and configured the events and data in a domain hook, like useUser, useFeedback, useProfile, etc...

Doing so allowed me to create the default state, initialization functions, events, and so forth into a logical file structure, it also allowed me to keep the footprint low until needed.

One note to keep in mind

Using a map does not allow for drop through, though most people consider this code smell anyways. At the same time it protects from accidental fall through.

score 0 · Answer 4 · edited May 18 '17 at 15:28

0

In your case, since you are using an Integer key for your HashMap and a plain 'int' for your switch statement, the best performing implementation will be the switch statement unless the number of passes through this section of code is very high (tens or hundreds of thousands).

edited May 18 '17 at 15:28

Evgeniy Mishustin

3,343
3
42
81

answered May 18 '17 at 12:51

sqlsword

52
1
1
10

John Tribe · Answer 5 · 2018-11-26T13:39:00.307

If I have that kind of example I use Guava ImmutableMaps (sure You can use java 9 builder as well).

private static final Map<String, String> EXAMPLE = ImmutableMaps.<String, String>>builder()
    .put("a", "100")
    .put("b", "200")
    .build();

That way they are immutable and initated only once.

Sometimes I use strategy pattern that way:

private static final Map<String, Command> EXAMPLE = ImmutableMaps.<String, String>>builder()
    .put("a", new SomethingCool())
    .put("b", new BCool())
    .build(); 

private static final Command DEFAULT= new DefCommand();

Use:

EXAMPLE.getOrDefault("a", DEFAULT).execute(); //java 8

About performance just pick readability. You will thank me later (1 year later) :D.

HashMap vs Switch statement performance

5 Answers5

TL/DR

More detailed answer

In the weeds:

Think about maintainability and readability of the code

Linked