1

I have the following code snippet that I'm trying to see if it can crash/misbehave at some point. The HashMap is being called from multiple threads in which put is inside a synchronized block and get is not. Is there any issue with this code? If so, what modification I need to make to see that happens given that I only use put and get this way, and there is no putAll, clear or any operations involved.

import java.util.HashMap;
import java.util.Map;

public class Main {
    Map<Integer, String> instanceMap = new HashMap<>();

    public static void main(String[] args) {
        System.out.println("Hello");

        Main main = new Main();

        Thread thread1 = new Thread("Thread 1"){
            public void run(){
                System.out.println("Thread 1 running");
                for (int i = 0; i <= 100; i++) {
                    System.out.println("Thread 1 " + i + "-" + main.getVal(i));
                }
            }
        };
        thread1.start();


        Thread thread2 = new Thread("Thread 2"){
            public void run(){
                System.out.println("Thread 2 running");
                for (int i = 0; i <= 100; i++) {
                    System.out.println("Thread 2 " + i + "-" + main.getVal(i));
                }
            }
        };
        thread2.start();
    }

    private String getVal(int key) {
        check(key);
        return instanceMap.get(key);
    }

    private void check(int key) {
        if (!instanceMap.containsKey(key)) {
            synchronized (instanceMap) {
                if (!instanceMap.containsKey(key)) {
                    // System.out.println(Thread.currentThread().getName());
                    instanceMap.put(key, "" + key);
                }
            }
        }
    }
}

What I have checked out:

Lawliet
  • 3,438
  • 2
  • 17
  • 28
  • 4
    According to the [docs](https://docs.oracle.com/javase/8/docs/api/java/util/HashMap.html#:~:text=If%20multiple%20threads%20access%20a%20hash%20map%20concurrently%2C%20and%20at%20least%20one%20of%20the%20threads%20modifies%20the%20map%20structurally%2C%20it%20must%20be%20synchronized%20externally.), `get()` must synchronize if there are concurrent `put()`s. Also note that this is not a safe use of DCL; `containsKey()` can return a stale value without synchronization. – shmosel Jan 12 '22 at 22:55
  • 1
    Lawliet, forget about technology for a second and think about practical, world application. Suppose you are subscribed to a newspaper or magazine and you are interested in reading an article. The article (for some unknown reason) is being edited by multiple people. Under that scenario, how would you know you are not reading in the middle of an edit? If no synchronization existed, you might be reading "old news". This is kind of trivial, but know think about a alarm system in a nuclear reactor with multiple temp sensors. Do you think is important for the alarm system to get the latest data? – hfontanez Jan 12 '22 at 23:00
  • @hfontanez, true. I was thinking if I have this `synchronized (instanceMap)`, it would lock the `instanceMap` from being read/written from all other threads until it gets updated. My Java knowledge is ... – Lawliet Jan 13 '22 at 00:14
  • 2
    'synchronized' does notthing more than enforce "one at a time" execution betweeen threads that **synchronize on the same object**. It cannot magically stop threads that do not synchronze. – passer-by Jan 13 '22 at 01:19

3 Answers3

3

I somewhat modified your code:

  1. removed System.out.println() from the "hot" loop, it is internally synchronized
  2. increased the number of iterations
  3. changed printing to only print when there's an unexpected value

There's much more we can do and try, but this already fails, so I stopped there. The next step would we to rewrite the whole thing to jcsctress.

And voila, as expected, sometimes this happens on my Intel MacBook Pro with Temurin 17:

Exception in thread "Thread 2" java.lang.NullPointerException: Cannot invoke "java.lang.Integer.intValue()" because the return value of "java.util.Map.get(Object)" is null
    at com.gitlab.janecekpetr.playground.Playground.getVal(Playground.java:35)
    at com.gitlab.janecekpetr.playground.Playground.lambda$0(Playground.java:21)
    at java.base/java.lang.Thread.run(Thread.java:833)

Code:

private record Val(int index, int value) {}

private static final int MAX = 100_000;
private final Map<Integer, Integer> instanceMap = new HashMap<>();

public static void main(String... args) {
    Playground main = new Playground();

    Runnable runnable = () -> {
        System.out.println(Thread.currentThread().getName() + " running");
        Val[] vals = new Val[MAX];
        for (int i = 0; i < MAX; i++) {
            vals[i] = new Val(i, main.getVal(i));
        }
        System.out.println(Stream.of(vals).filter(val -> val.index() != val.value()).toList());
    };

    Thread thread1 = new Thread(runnable, "Thread 1");
    thread1.start();

    Thread thread2 = new Thread(runnable, "Thread 2");
    thread2.start();
}

private int getVal(int key) {
    check(key);
    return instanceMap.get(key);
}

private void check(int key) {
    if (!instanceMap.containsKey(key)) {
        synchronized (instanceMap) {
            if (!instanceMap.containsKey(key)) {
                instanceMap.put(key, key);
            }
        }
    }
}
Petr Janeček
  • 37,768
  • 12
  • 121
  • 145
  • 1
    For future generations, read rzwitserloot's post as well. A great companion piece for this answer for more, in-depth information. – hfontanez Jan 13 '22 at 02:03
3

To specifically explain the excellent sleuthing work in the answer by @PetrJaneček :

Every field in java has an evil coin attached to it. Anytime any thread reads the field, it flips this coin. It is not a fair coin - it is evil. It will flip heads 10,000 times in a row if that's going to ruin your day (for example, you may have code that depends on coinflips landing a certain way, or it'll fail to work. The coin is evil: You may run into the situation that just to ruin your day, during all your extensive testing, the coin flips heads, and during the first week in production it's all heads flips. And then the big new potential customer demos your app and the coin starts flipping some tails on you).

The coinflip decides which variant of the field is used - because every thread may or may not have a local cache of that field. When you write to a field from any thread? Coin is flipped, on tails, the local cache is updated and nothing more happens. Read from any thread? Coin is flipped. On tails, the local cache is used.

That's not really what happens of course (your JVM does not actually have evil coins nor is it out to get you), but the JMM (Java Memory Model), along with the realities of modern hardware, means that this abstraction works very well: It will reliably lead to the right answer when writing concurrent code, namely, that any field that is touched by more than one thread must have guards around it, or must never change at all during the entire duration of the multi-thread access 'session'.

You can force the JVM to flip the coin the way you want, by establishing so-called Happens Before relationships. This is explicit terminology used by the JMM. If 2 lines of code have a Happens-Before relationship (one is defined as 'happening before' the other, as per the JMM's list of HB relationship establishing actions), then it is not possible (short of a bug in the JVM itself) to observe any side effect of the HA line whilst not also observing all side effects of the HB line. (That is to say: the 'happens before' line happens before the 'happens after' line as far as your code could ever tell, though it's a bit of schrodiner's cat situation. If your code doesn't actually look at these files in a way that you'd ever be able to tell, then the JVM is free to not do that. And it won't, you can rely on the evil coin being evil: If the JMM takes a 'right', there will be some combination of CPU, OS, JVM release, version, and phase of the moon that combine to use it).

A small selection of common HB/HA establishing conditions:

  • The first line inside a synchronized(lock) block is HA relative to the hitting of that block in any other thread.
  • Exiting a synchronized(lock) block is HB relative to any other thread entering any synchronized(lock) block, assuming the two locks are the same reference.
  • thread.start() is HB relative to the first line that thread will run.
  • The 'natural' HB/HA: line X is HB relative to line Y if X and Y are run by the same thread and X is 'before it' in your code. You can't write x = 5; y = x; and have y be set by a version of x that did not witness the x = 5 happening (of course, if another thread is also modifying x, all bets are off unless you have HB/HA with whatever line is doing that).
  • writes and reads to volatile establish HB/HA but you usually can't get any guarantees about which direction.

This explains the way your code may fail: The get() call establishes absolutely no HB/HA relationship with the other thread that is calling put(), and therefore the get() call may or may not use locally cached variants of the various fields that HashMap uses internally, depending on the evil coin (which is of course hitting some fields; it'll be private fields in the HashMap implementation someplace, so you don't know which ones, but HashMap obviously has long-lived state, which implies fields are involved).

So why haven't you actually managed to 'see' your code asplode like the JMM says it will? Because the coin is EVIL. You cannot rely on this line of reasoning: "I wrote some code that should fail if the synchronizations I need aren't happening the way I want. I ran it a whole bunch of times, and it never failed, therefore, apparently this code is concurrency-safe and I can use this in my production code". That is simply not ever actually reliable. That's why you need to be thinking: Evil! That coin is out to get me! Because if you don't, you may be tempted to write test code like this.

You should be scared of writing code where more than one thread interacts with the same field. You should be bending over backwards to avoid it. Use message queues. Do your chat between threads by using databases, which have much nicer primitives for this stuff (transactions and isolation levels). Rewrite the code so that it takes a bunch of params up front and then runs without interacting with other threads via fields at all, until it is all done, and it then returns a result (and then use e.g. fork/join framework to make it all work). Make your webserver performant and using all the cores simply by relying on the fact that every incoming request will be its own thread, so the only thing that needs to happen for you to use all the cores is for that many folks to hit your server at the same time. If you don't have enough requests, great! Your server isn't busy so it doesn't matter you aren't using all the cores.

If truly you decide that interacting with the same field from multiple threads is the right answer, you need to think NASA programming mars rovers on the lines that interact with those fields, because tests simply cannot be relied upon. It's not as hard as it sounds - especially if you keep the actual interacting with the relevant fields down to a minimum and keep thinking: "Have I established HB/HA"?

In this case, I think Petr figured it out correctly: System.out.println is hella slow and does various synchronizing actions. JMM is a package deal, and commutative: Once HB/HA establishes, everything the HB line changed is observable to the code in the HA line, and add in the natural rule, which means all code that follows the HA line cannot possibly observe a universe where something any line before the HB line did is not yet visible. In other words, the System.out.println statements HB/HA with each other in some order, but you can't rely on that (System.out is not specced to synchronize. But, just about every implementation does. You should not rely on implementation details, and I can trivially write you some java code that is legal, compiles, runs, and breaks no contracts, because you can set System.out with System.setOut - that does not synchronize when interacting with System.out!). The evil coin in this case took the shape of 'accidental' synchronization via intentionally unspecced behaviour of System.out.

rzwitserloot
  • 85,357
  • 5
  • 51
  • 72
  • 1
    Good Lord!!!! This was a hell of an explanation. Super necessary. This post wasn't for me, but thanks! Some of those concepts I kind of knew, but this provided a lot of information even I didn't know. – hfontanez Jan 13 '22 at 02:02
  • 1
    A system.out doesn't 'solve' the problem. It just makes it less likely to appear. While a map.put is executing and the invariants of the map are temporary violated (e.g. you updates some parts but others not), the map.get could observe these broken invariants and run into problems. Apart from that, there is no happens before edge when the map.put and get happens at the same time. Even though a print might act like fences, if map.put/map.get are accessed concurrently, you still have a data-race. – pveentjer Jan 13 '22 at 06:32
  • I still can't understand....If one of the threads calls ```instanceMap.containsKey(key)``` and returns true directly, which means that the thread has seen the inserted key, then it can definitely get the data. If ```instanceMap.containsKey(key)``` returns false, then it will try to get into the synchronized block, then there is the happen-before principle: ```An unlock on a monitor happens-before every subsequent lock on that monitor```. , which means it can also see the inserted value when invoke ```map#get```. Can you explain it? Thanks!!!!! – woxihuanxiayua Jan 13 '22 at 09:28
2

The following explanation is more in line with the terminology used in the JMM. Could be useful if you want a more solid understanding of this topic.

2 Actions are conflicting when they access the same address and there is at least 1 write.

2 Actions are concurrent when they are not ordered by a happens-before relation (there is no happens-before edge between them).

2 Actions are in data race when they are conflicting and concurrent.

When there are data races in your program, weird problems can happen like unexpected reordering of instructions, visibility problems, or atomicity problems.

So what makes up the happens-before relation. If a volatile read observes a particular volatile write, then there is a happens-before edge between the write and the read. This means that read will not only see that write, but everything that happened before that write. There are other sources of happens-before edges like the release of a monitor and subsequent acquire of the same monitor. And there is a happens-before edge between A, B when A occurs before B in the program order. Note: the happens-before relation is transitive, so if A happens-before B and B happens-before C, then A happens-before C.

In your case, you have a get/put operations which are conflicting since they access the same address(es) and there is at least 1 write.

The put/get action are concurrent, since is no happens-before edge between writing and reading because even though the write releases the monitor, the get doesn't acquire it.

Since the put/get operations are concurrent and conflicting, they are in data race.

The simplest way to fix this problem, is to execute the map.get in a synchronized block (using the same monitor). This will introduce the desired happens-before edge and makes the actions sequential instead of concurrent and as consequence, the data-race disappears.

A better-performing solution would be to make use of a ConcurrentHashMap. Instead of a single central lock, there are many locks and they can be acquired concurrently to improve scalability and performance. I'm not going to dig into the optimizations of the ConcurrentHashMap because would create confusion.

[Edit] Apart from a data-race, your code also suffers from race conditions.

pveentjer
  • 10,545
  • 3
  • 23
  • 40
  • If one of the threads calls ```instanceMap.containsKey(key)``` and returns true directly, which means that the thread has seen the inserted key, then it can definitely get the data. If ```instanceMap.containsKey(key)``` returns false, then it will try to get into the synchronized block, then there is the happen-before principle: ```An unlock on a monitor happens-before every subsequent lock on that monitor```. , which means it can also see the inserted value when invoke ```map#get```. Can you explain it? Thanks!!!!! – woxihuanxiayua Jan 13 '22 at 09:29
  • 1
    Seeing a value, doesn't mean there is a happens-before edge. E.g. imagine an IntHolder with field 'int v' and a getter/setter. If one thread calls a set(42) and later a different thread sees 42; it sees the value of the write, but it doesn't synchronize with it. So the read isn't guaranteed to see whatever happened before the write. So if containsKey returns true, it doesn't mean you can see the returned value! Imagine you publish the IntHolder in the map and a get on a different thread sees that there is a hit on contain key, it could be that it sees a partially constructed IntHolder. – pveentjer Jan 13 '22 at 09:36
  • In the code of the topic starter there are 2 flavors of problems: data races and race conditions. Both need to be fixed to make this code reliable. – pveentjer Jan 13 '22 at 09:38
  • I know a few things about memory models and concurrency, and reasoning about programs that contain (benign) data races, is very difficult. It is a lot easier to reason about programs that are correctly synchronized and this is already hard enough. – pveentjer Jan 13 '22 at 09:43
  • For ```hashmap```, ```containsKey = getNode(key) != null``` , ```get=getNode(key).value```. So there may even be a Node inserted(```getNode(key) != null``` will be true), but Node.value may not be visible when this thread call ```map#get```? – woxihuanxiayua Jan 13 '22 at 09:48
  • I would suggest opening a new question so I can answer it. It is very difficult to give a proper example in the comments window. In short: if you have a data race, you can get weird problems. Do not try to reason about programs with data races (or race conditions). Apart from that, there are also race conditions that can lead to problems. – pveentjer Jan 13 '22 at 09:54
  • This is a good post. It explains why happening (so seeing some value) before doesn't imply happens-before. https://preshing.com/20130702/the-happens-before-relation/ – pveentjer Jan 13 '22 at 10:07
  • Thanks a lot, I'll take a look first! – woxihuanxiayua Jan 13 '22 at 10:21