Understanding JVM Behavior: How Does `switch` Handle String Hash Collisions?

Question

During an interview, I was posed an intriguing question about how the Java Virtual Machine (JVM) handles a switch statement involving a String type. Specifically, I was asked whether the JVM employs LookupSwitch or TableSwitch for this operation. My initial thought was that it would utilize LookupSwitch due to the sparsity of String hashcodes. However, the interviewer added a layer of complexity by asking: "How does the JVM handle hash collisions when two different cases have the same hashcode?"

class Solution {
    static int switchString(String i) {
        int res;
        switch (i) {
            case "A":  // hashcode 65
                res = 0;
                break;
            case "FB":  // hashcode 2236
                res = 1;
                break;
            case "Ea":  // hashcode 2236
                res = 2;
                break;
            default:
                res = 3;
        }
        return res;
    }

    public static void main(String[] args) {
        System.out.println(switchString("Ea"));
    }
}

Can you help clarify how the JVM navigates through switch cases involving String objects, especially when hash collisions are involved?

For the first part question, I found I'm wrong here. the compiler has to pick either, a lookupswitch or tableswitch instruction

@LouisWasserman a hedge fund firm, but due to a non-disclosure agreement, I'm unable to reveal the company's name. — maplemaple, Sep 01 '23 at 19:32
if you really need to know, use `javap` to disassemble the class file generated by that code (the compiler on my system uses `lookupswitch`, and then compares to the literals, assigning an index to a temporary; and then uses `tableswitch` to do the corresponding assignment to `res`) — user22471702, Sep 01 '23 at 22:19
(and a different compiler may do something else, or maybe this get optimized in future versions .... at the end it is some implementation detail and almost irrelevant for coding {unless when working directly with bytecode}) — user22471702, Sep 01 '23 at 22:37
Unless you are interviewing for a Java compiler or JDK library position this is an irrelevant question. You could answer along the lines of 'I don't expect that issue to ever come up in the position I am interviewing for.' But there's not much you can do about smart-alec interviewers except move on. — user207421, Sep 02 '23 at 01:43

score 1 · Answer 1 · answered Sep 01 '23 at 22:12

We can reach some conclusions without particularly looking:

tableswitch is used when the possible values of the expression being switched on are fairly clustered; lookupswitch otherwise. Most hash codes are going to be well spread; if tableswitch gets used at all it'll probably be rare.
It was never the case that we could use the hashcodes alone to accomplish the switching; we would always have to use .equals anyway to check that we've actually got the right string. So presumably that's how we'll resolve hash collisions, by comparing them. (Conceivably we could fall back to a binary search thing if there were a ton of hash collisions, but for that to be a realistic scenario, the code has to be written adversarially.

score 0 · Answer 2 · answered Sep 01 '23 at 18:24

0

I'm not going to claim being an expert but I concur with your assessment on the LookupSwitch - Regarding hash collision I would imagine that's resolved with the equals() method.

answered Sep 01 '23 at 18:24

Eric

154
6

Understanding JVM Behavior: How Does `switch` Handle String Hash Collisions?

2 Answers2