5

I've got a gigantic Trove map and a method that I need to call very often from multiple threads. Most of the time this method shall return true. The threads are doing heavy number crunching and I noticed that there was some contention due to the following method (it's just an example, my actual code is bit different):

synchronized boolean containsSpecial() {
   return troveMap.contains(key);
}

Note that it's an "append only" map: once a key is added, is stays in there forever (which is important for what comes next I think).

I noticed that by changing the above to:

boolean containsSpecial() {
    if ( troveMap.contains(key) ) {
        // most of the time (>90%) we shall pass here, dodging lock-acquisition
        return true;
    }
    synchronized (this) {
        return troveMap.contains(key);
    }
}

I get a 20% speedup on my number crunching (verified on lots of runs, running during long times etc.).

Does this optimization look correct (knowing that once a key is there it shall stay there forever)?

What is the name for this technique?

EDIT

The code that updates the map is called way less often than the containsSpecial() method and looks like this (I've synchronized the entire method):

synchronized void addSpecialKeyValue( key, value ) {
    ....
}
Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
Cedric Martin
  • 5,945
  • 4
  • 34
  • 66
  • what does the code that updates the map look like? – Dmitry B. Dec 07 '11 at 17:57
  • @Dmitry Beransky: it's fully synchronized, I'll edit the question : ) – Cedric Martin Dec 07 '11 at 17:59
  • It looks really similar to the double-checked locking. In this case the checking is the operation you're after. – Kiril Dec 07 '11 at 18:02
  • 1
    If this code works correct (which depends on contains working correctly while an insert is in progress (which I doubt by the way)), your `synchronized` block is really pointless. This is because `synchronized` may ensure that the returned value is correct, but there is no gurantee it hasn't changed when the return value is used. So the only reason to synchronize this method would be if `troveMap.contains(key)` is not guranteed to work while an update is in progress, in which case you can't call it without locking. – Grizzly Dec 07 '11 at 18:06
  • This is not safe. According to Rob Eden (Trove developer) retrievals are not guaranteed to preserve the internal structure of the map. See my edited answer. – Ted Hopp Dec 07 '11 at 19:25

6 Answers6

7

This code is not correct.

Trove doesn't handle concurrent use itself; it's like java.util.HashMap in that regard. So, like HashMap, even seemingly innocent, read-only methods like containsKey() could throw a runtime exception or, worse, enter an infinite loop if another thread modifies the map concurrently. I don't know the internals of Trove, but with HashMap, rehashing when the load factor is exceeded, or removing entries can cause failures in other threads that are only reading.

If the operation takes a significant amount of time compared to lock management, using a read-write lock to eliminate the serialization bottleneck will improve performance greatly. In the class documentation for ReentrantReadWriteLock, there are "Sample usages"; you can use the second example, for RWDictionary, as a guide.

In this case, the map operations may be so fast that the locking overhead dominates. If that's the case, you'll need to profile on the target system to see whether a synchronized block or a read-write lock is faster.

Either way, the important point is that you can't safely remove all synchronization, or you'll have consistency and visibility problems.

erickson
  • 265,237
  • 58
  • 395
  • 493
  • Read-write lock would be more correct, but wouldn't it incur essentially the same locking penalty (with the exception that now multiple reads can happen at the same time?) – Mark Peters Dec 07 '11 at 18:34
  • @MarkPeters I think (from a quick look at the code) that ReentrantReadWriteLock is nonblocking for reads if the read lock is available. Should be pretty speedy. – yshavit Dec 07 '11 at 18:52
  • @erickson: +1... I wasn't aware of the potential infinite loop or runtime exception of seemingly innocent code! Oh well, time to read on the read-write locks : ) – Cedric Martin Dec 07 '11 at 21:58
  • @CedricMartin Please see my edit; it will be important to profile synchronization versus a read-write lock because Trove may be fast relative to locking. – erickson Dec 07 '11 at 22:55
  • Besides the inner workings of the particular map implementation which might be perceived in an inconsistent state, the lack of synchronization applies to the key instances as well, on which the map may invoke `equals`. This incorrect code does not only require the keys to be thread safe, but also immune to unsafe publication, which may be true for immutable classes, but often fails with classes using locks or synchronized, as they usually do not guard their constructor. – Holger Aug 15 '19 at 15:35
6

It's called wrong locking ;-) Actually, it is some variant of the double-checked locking approach. And the original version of that approach is just plain wrong in Java.

Java threads are allowed to keep private copies of variables in their local memory (think: core-local cache of a multi-core machine). Any Java implementation is allowed to never write changes back into the global memory unless some synchronization happens.

So, it is very well possible that one of your threads has a local memory in which troveMap.contains(key) evaluates to true. Therefore, it never synchronizes and it never gets the updated memory.

Additionally, what happens when contains() sees a inconsistent memory of the troveMap data structure?

Lookup the Java memory model for the details. Or have a look at this book: Java Concurrency in Practice.

Mark Peters
  • 80,126
  • 17
  • 159
  • 190
jmg
  • 7,308
  • 1
  • 18
  • 22
  • 1
    but my case is special... Once true, it shall always stay true. Are you sure the code is broken here? In addition to that the write/update are fully synchronized. Once a key is in there, it stays for good. My idea was that either I'd read true and know the key is in there for sure (once it's there, it cannot leave) **or** the key ain't there, so I acquire a full lock and see the updated value if there's been a write (write being fully synchronized). – Cedric Martin Dec 07 '11 at 18:04
  • 1
    First, the double-checked locking problem is a thing of the past (not that you don't have to worry about it; it is just easy to avoid now). Second, he's not reassigning `troveMap` and it's append-only. How would it fail, even if there were thread-local copies? – Mark Peters Dec 07 '11 at 18:05
  • 1
    @CedricMartin: Of what type is your troveMap? – jmg Dec 07 '11 at 18:06
  • @jmg: *TObjectInt* and *TIntInt*, I did this in two different spots. Why does it matter which type the Trove map is? (I'm very interested : ) – Cedric Martin Dec 07 '11 at 18:07
  • @MarkPeters: How, is it a thing of the past? In the past people believed it to be correct. Today, we know that it requires at least a `volatile` modifier, if the field in question is primitive. But there is no such thing as a `volatile` for a whole object structure, i.e. that what troveMap points to. – jmg Dec 07 '11 at 18:08
  • This is not DCL, so I don't think the scenario you are referring to can happen. That's why I asked for the other piece of the code. Additions into the map are done under a synchronized function, so the new state is being copied back into global memory – Dmitry B. Dec 07 '11 at 18:08
  • In your answer, you say *therefore, it never synchronizes and it never gets the updated memory*. Isn't that categorically false in @Cedric's example? If the first lookup says the key doesn't exist, it immediately reaches a synchronization point and checks again. How does that not make it safe? – Mark Peters Dec 07 '11 at 18:09
  • @jmg: You don't *need* volatile since it's not being reassigned. – Mark Peters Dec 07 '11 at 18:10
  • @DmitryBeransky: But is it copied back from the global memory into the local memory of the reader? – jmg Dec 07 '11 at 18:10
  • @MarkPeters: But surely reassigments happen when inserting objects. It is true that every insertion is going to be fine. But readers which never synchronize might never see updates or they might see inconsistent states. – jmg Dec 07 '11 at 18:12
  • @DmitryBeransky: Exactly that is my point. *When* they enter a synchronize block, but what if they never enter a synchronize block. The OP said the contains check should almost always result in true. – jmg Dec 07 '11 at 18:14
  • 5
    @jmg: You're **completely missing the point of Cedric's code**. If the reader doesn't see the update, it immediately goes into a synchronize block which then eliminates memory copy problems. Yes, it might not see the update the first time around, but Cedric has crafted the code so that that is harmless. – Mark Peters Dec 07 '11 at 18:15
  • @jmg: they never enter the synchronize block only when the the local memory has already been updated with `true`. – Dmitry B. Dec 07 '11 at 18:17
  • 3
    @MarkPeters: Ok, then I have to ask more. Where can I find that trove map? Is it impossible that contains() return a false positive because it was running on an inconsistent view of the state of the map? – jmg Dec 07 '11 at 18:20
  • @jmg: That's a relevant point. Certain things (e.g. assignment) are atomic in Java so unless TroveMap goes fully out of its way to screw with you, it should never have a false positive. It's not likely, but it *is* a concern. – Mark Peters Dec 07 '11 at 18:22
  • @jmg Assuming it's based on a regular map, a false positive is not possible. No matter how out-of-date the map is, it can never return true if the value doesn't exist. Just because something isn't synchronized doesn't mean everything goes to hell (unless the map hasn't been initialized yet) - it just means you get an _earlier_ version. – kba Dec 07 '11 at 18:25
  • @MarkPeters: One assignment of an int is atomic in Java, true. But how many assignments happen for one addition of an element to the map? They have to be consistent as a whole, or contains() has to be written such that it works correctly after every partial execution of a JIT-compiled version of the add method. – jmg Dec 07 '11 at 18:26
  • 1
    @MarkPeters - Assignments of doubles and longs are not necessarily atomic. If the map is using long keys, OP's code is definitely not safe. – Ted Hopp Dec 07 '11 at 18:28
  • @MarkPeters: You said: a false positive is not likely. So, it is very likely when you run your program long enough on enough machines. And then it might silently corrupt your result. If you are fine with that do it. – jmg Dec 07 '11 at 18:30
  • @jmg: What I was trying to say was that it was unlikely that the implementation would allow false positives, not that it was unlikely to occur in practice. I'm for one removing my downvote, I think there was some valid confusion but the main point of what you're trying to say lives on. (Err... well I will if/when you edit your answer; vote is locked in). – Mark Peters Dec 07 '11 at 18:36
  • 2
    An append might completely change the internal structure of the map. The contains code might be iterating through entries while another thread modifies the chain of entries. SO, IMHO, it's not thread-safe. – JB Nizet Dec 07 '11 at 18:38
  • @JBNizet has it right. The big problem here isn't the false negatives or false positives, it's that you may be getting some plain-ol-fashioned race conditions that will break the internals of the map. – yshavit Dec 07 '11 at 18:42
  • @KristianAntonsen You can get a version that "never existed," (as far as the thread that made the modification is concerned) which is pretty close to everything going to hell. Operations can be reordered such that threads see things the code would seem to imply are impossible. – yshavit Dec 07 '11 at 18:55
  • @yshavit: By never existed, you mean that `troveMap` hasn't been initialized, right? – kba Dec 07 '11 at 18:58
  • @KristianAntonsen no, I mean it's in a state that it was never in according to the modifying thread. Very trivial example would be if a method did something like `this.a = 5; this.b = 3` (assuming both fields start at 0). According to that method, there was never any time in which `this.a = 0 && this.b == 3`, but without correct synchronization/barriers, another thread could see such a state. – yshavit Dec 07 '11 at 19:06
  • Just to complete my thought, imagine it's a tad more complex: `this.conditionA = isConditionA(); if (conditionA) { this.conditionB = isConditionB(); }` (assume both fields start `false`). According to this code, if `conditionB` is true, then you can assume `conditionA` is also true, and some other method may indeed make that assumption. But improper synchronization can make a thread see `conditionB && (!conditionA)`, and now you're in big trouble. – yshavit Dec 07 '11 at 19:23
  • 1
    @KristianAntonsen: yshavit is right. Such surprising states can arise due to reordering of statements by the compiler (JIT or not) and because of the complexity of cache hierarchies. – jmg Dec 07 '11 at 19:32
2

This looks unsafe to me. Specifically, the unsynchronized calls will be able to see partial updates, either due to memory visibility (a previous put not getting fully published, since you haven't told the JMM it needs to be) or due to a plain old race. Imagine if TroveMap.contains has some internal variable that it assumes won't change during the course of contains. This code lets that invariant break.

Regarding the memory visibility, the problem with that isn't false negatives (you use the synchronized double-check for that), but that trove's invariants may be violated. For instance, if they have a counter, and they require that counter == someInternalArray.length at all times, the lack of synchronization may be violating that.

My first thought was to make troveMap's reference volatile, and to re-write the reference every time you add to the map:

synchronized (this) {
    troveMap.put(key, value);
    troveMap = troveMap;
}

That way, you're setting up a memory barrier such that anyone who reads the troveMap will be guaranteed to see everything that had happened to it before its most recent assignment -- that is, its latest state. This solves the memory issues, but it doesn't solve the race conditions.

Depending on how quickly your data changes, maybe a Bloom filter could help? Or some other structure that's more optimized for certain fast paths?

yshavit
  • 42,327
  • 7
  • 87
  • 124
  • Erm, never mind on the bloom filter. I was thinking the common case is that the key is not in the map, but re-reading your comment, the common case is that it is in the map. A Bloom filter wouldn't help here. – yshavit Dec 07 '11 at 18:39
1

I think you would be better off with a ConcurrentHashMap which doesn't need explicit locking and allows concurrent reads

boolean containsSpecial() {
    return troveMap.contains(key);
}

void addSpecialKeyValue( key, value ) {
    troveMap.putIfAbsent(key,value);
}

another option is using a ReadWriteLock which allows concurrent reads but no concurrent writes

ReadWriteLock rwlock = new ReentrantReadWriteLock();
boolean containsSpecial() {
    rwlock.readLock().lock();
    try{
        return troveMap.contains(key);
    }finally{
        rwlock.readLock().release();
    }
}

void addSpecialKeyValue( key, value ) {
    rwlock.writeLock().lock();
    try{
        //...
        troveMap.put(key,value);
    }finally{
        rwlock.writeLock().release();
    }
}
ratchet freak
  • 47,288
  • 5
  • 68
  • 106
  • I use Trove because the map is too gigantic and I need a map backed by primitives. The (un)boxing / object wrappers make it too costly both memory and speedwise. – Cedric Martin Dec 07 '11 at 18:17
  • @CedricMartin check the edit for another suggestion which will likely be faster than full locking (as you have now) – ratchet freak Dec 07 '11 at 18:30
  • Trove isn't concurrent; it doesn't have `putIfAbsent()` or equivalent. – erickson Dec 07 '11 at 20:40
1

Under the conditions you describe, it's easy to imagine a map implementation for which you can get false negatives by failing to synchronize. The only way I can imagine obtaining false positives is an implementation in which key insertions are non-atomic and a partial key insertion happens to look like another key you are testing for.

You don't say what kind of map you have implemented, but the stock map implementations store keys by assigning references. According to the Java Language Specification:

Writes to and reads of references are always atomic, regardless of whether they are implemented as 32 or 64 bit values.

If your map implementation uses object references as keys, then I don't see how you can get in trouble.

EDIT

The above was written in ignorance of Trove itself. After a little research, I found the following post by Rob Eden (one of the developers of Trove) on whether Trove maps are concurrent:

Trove does not modify the internal structure on retrievals. However, this is an implementation detail not a guarantee so I can't say that it won't change in future versions.

So it seems like this approach will work for now but may not be safe at all in a future version. It may be best to use one of Trove's synchronized map classes, despite the penalty.

Ted Hopp
  • 232,168
  • 48
  • 399
  • 521
  • Even if it doesn't modify its internal structure on *retrieval,* it surely does on *insertion,* and that could happen concurrent to a retrieval. It's just not safe to use from multiple threads without external locking. – erickson Dec 07 '11 at 20:42
0

Why you reinvent the wheel? Simply use ConcurrentHashMap.putIfAbsent

korifey
  • 3,379
  • 17
  • 17
  • because, as I wrote, the map is *gigantic* and I'm using Trove because both speedwise and performance-wise it runs circles around the default maps (which are backed by amazingly innefficient Object/wrappers constantly (un)boxing). – Cedric Martin Dec 07 '11 at 18:09
  • Sorry for my fast judgement. Does addSpecialKeyValue( key, value ) invokes containsSpecial() again? – korifey Dec 07 '11 at 18:13