
I'm trying to reduce the memory usage for the lock objects of segmented data. See my questions here and here. Or just assume you have a byte array where every 16 bytes can be (de)serialized into an object. Let us call this a "row" with a row length of 16 bytes. Now if you modify such a row from a writer thread and read it from multiple threads, you need locking. And if you have a byte array of size 1MB (1024*1024), this means 65536 rows and the same number of locks.
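To make the setup concrete, here is a minimal sketch of this naive layout (class and method names are made up for illustration only): one lock object per 16-byte row, i.e. 65536 lock objects for a 1MB array.

// hypothetical sketch of the naive "one lock per row" layout described above
public class RowStore {
    static final int ROW_LENGTH = 16;

    final byte[] bytes = new byte[1024 * 1024];                       // 65536 rows
    final Object[] rowLocks = new Object[bytes.length / ROW_LENGTH];  // 65536 lock objects

    public RowStore() {
        for (int i = 0; i < rowLocks.length; i++)
            rowLocks[i] = new Object();
    }

    void writeRow(int rowIndex, byte[] row) {
        synchronized (rowLocks[rowIndex]) {
            // byte offset of the row = rowIndex * ROW_LENGTH
            System.arraycopy(row, 0, bytes, rowIndex * ROW_LENGTH, ROW_LENGTH);
        }
    }

    byte[] readRow(int rowIndex) {
        synchronized (rowLocks[rowIndex]) {
            byte[] row = new byte[ROW_LENGTH];
            System.arraycopy(bytes, rowIndex * ROW_LENGTH, row, 0, ROW_LENGTH);
            return row;
        }
    }
}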

This is a bit too much, especially since I need much larger byte arrays, and I would like to reduce the number of locks to something roughly proportional to the number of threads. My idea was to create a

ConcurrentHashMap<Integer, LockHelper> concurrentMap;

where the Integer key is the row index, and before a thread 'enters' a row it puts a lock object into this map (I got this idea from this answer). But no matter what I think through, I cannot find an approach that is really thread-safe:

// somewhere else where we need to write or read the row
LockHelper lock1 = new LockHelper();
LockHelper lock = concurrentMap.putIfAbsent(rowIndex, lock1);
if (lock == null)
    lock = lock1; // putIfAbsent returns null if there was no previous mapping
lock.addWaitingThread(); // is too late
synchronized (lock) {
    try {
        // read or write the row at rowIndex, e.g. writing like
        bytes[rowIndex * 16] = 1;
        bytes[rowIndex * 16 + 1] = 2;
        // ...
    } finally {
        // another thread could still grab a fresh lock object for the same row
        // right after this remove, so this is not thread-safe
        if (lock.noThreadsWaiting())
            concurrentMap.remove(rowIndex);
    }
}

Do you see a possibility to make this thread-safe?

I have the feeling that this will look very similar to the concurrentMap.compute construct (e.g. see this answer), or could I even utilize this method?

map.compute(rowIndex, (key, value) -> {
    if(value == null)
       value = new Object();
    synchronized (value) {
        // access row
        return value;
    }
});
map.remove(rowIndex);

Are the value and the 'synchronized' necessary at all, as we already know the compute operation is performed atomically?

// null is forbidden so use the key also as the value to avoid creating additional objects
ConcurrentHashMap<Integer, Integer> map = ...;

// now the row access looks really simple:
map.compute(rowIndex, (key, value) -> {
    // access row
    return key;
});
map.remove(rowIndex);

BTW: Since when do we have this compute method in Java? Since 1.8? I cannot find it in the JavaDocs.

Update: I found a very similar question here with userIds instead of rowIndices; note that the question contains an example with several problems, like a missing final, calling lock inside the try-finally clause, and no shrinking of the map. Also there seems to be a library JKeyLockManager for this purpose, but I don't think it is thread-safe.

Update 2: The solution seems to be really simple, as Nicolas Filotto pointed out how to avoid the removal:

map.compute(rowIndex, (key, value) -> {
    // access row
    return null;
});

So this is really much less memory intensive, BUT the simple segment locking with synchronized is at least 50% faster in my scenario.
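For reference, the "simple segment locking" I compare against is roughly the following idea (a minimal sketch, not the exact benchmark code; segment count and names are just example values): a small fixed array of lock objects, with the row index mapped onto one of them by modulus.

// minimal sketch of segment ("striped") locking: a few locks, each guarding many rows
public class StripedRowStore {
    static final int ROW_LENGTH = 16;
    static final int SEGMENTS = 32; // e.g. roughly the number of threads

    final byte[] bytes = new byte[1024 * 1024];
    final Object[] segmentLocks = new Object[SEGMENTS];

    public StripedRowStore() {
        for (int i = 0; i < SEGMENTS; i++)
            segmentLocks[i] = new Object();
    }

    void writeRow(int rowIndex, byte[] row) {
        // map the row onto one of the few locks via modulus
        synchronized (segmentLocks[rowIndex % SEGMENTS]) {
            System.arraycopy(row, 0, bytes, rowIndex * ROW_LENGTH, ROW_LENGTH);
        }
    }
}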

Karussell
  • Maybe you could use an arrangement similar to n-way-associative caches? Two-way-associative would mean to have one lock for the even-address rows and one for the odd-address rows. Depending on the number of threads you could experiment with the number of locks. If there is a pattern in the accesses it may help to use a prime number of locks or even a primitive hash to map row addresses to the matching lock object. – Georg Bisseling Sep 24 '16 at 10:05
  • Striping assumes a good distribution, but suffers from contention and the locks thrash due to performing tiny amounts of work. Instead consider a message passing approach to have a single writer (or segment for one per core). You can then use a disruptor or flat-combining approach, penalizing a call by performing a batch of work instead of suffering from contention. – Ben Manes Sep 24 '16 at 21:19
  • @GeorgBisseling that was what we had already tested, see the update for the sources. – Karussell Sep 27 '16 at 10:10
  • @BenManes Do you have some links explaining your approach in more detail? In my case I can assume that I have just one writer so this would be interesting. – Karussell Sep 27 '16 at 10:12
  • @Karussell The techniques work best if you don't need the requesting thread to wait for the response. See [disruptor](http://martinfowler.com/articles/lmax.html), [flat combining](https://www.cs.bgu.ac.il/~hendlerd/papers/flat-combining.pdf), and [caffeine](http://highscalability.com/blog/2016/1/25/design-of-a-modern-cache.html) for different variations on the idea of record => batch => lock => replay. – Ben Manes Sep 27 '16 at 19:01
  • Thanks really interesting reads, especially the flat combining – Karussell Sep 29 '16 at 16:23
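A minimal sketch of the single-writer / message-passing idea from Ben Manes' comments above, assuming the calling thread does not need to wait for the write to complete (class and method names are hypothetical, and reader visibility is left out):

import java.util.concurrent.ArrayBlockingQueue;
import java.util.concurrent.BlockingQueue;

// all mutations are funnelled through one writer thread, so no per-row lock
// is needed on the write path; readers still need their own visibility
// guarantees (omitted here)
public class SingleWriterRowStore {
    static final int ROW_LENGTH = 16;

    static class WriteOp {
        final int rowIndex;
        final byte[] row;
        WriteOp(int rowIndex, byte[] row) { this.rowIndex = rowIndex; this.row = row; }
    }

    final byte[] bytes = new byte[1024 * 1024];
    final BlockingQueue<WriteOp> queue = new ArrayBlockingQueue<>(1024);

    public SingleWriterRowStore() {
        Thread writer = new Thread(() -> {
            try {
                while (true) {
                    WriteOp op = queue.take();
                    System.arraycopy(op.row, 0, bytes, op.rowIndex * ROW_LENGTH, ROW_LENGTH);
                }
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
            }
        });
        writer.setDaemon(true);
        writer.start();
    }

    // the calling thread only enqueues the write and returns
    void writeRowAsync(int rowIndex, byte[] row) throws InterruptedException {
        queue.put(new WriteOp(rowIndex, row));
    }
}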

1 Answer


Are the value and the synchronized necessary at all, as we already know the compute operation is performed atomically?

I confirm that you do not need to add a synchronized block in this case, as the compute method is executed atomically, as stated in the Javadoc of ConcurrentHashMap#compute(K key, BiFunction<? super K,? super V,? extends V> remappingFunction), which was added (along with BiFunction) in Java 8. I quote:

Attempts to compute a mapping for the specified key and its current mapped value (or null if there is no current mapping). The entire method invocation is performed atomically. Some attempted update operations on this map by other threads may be blocked while computation is in progress, so the computation should be short and simple, and must not attempt to update any other mappings of this Map.

What you try to achieve with the compute method can be made totally atomic if you make your BiFunction always return null, which also removes the key atomically, such that everything is done atomically.

map.compute(
    rowIndex, 
    (key, value) -> {
        // access row here
        return null;
    }
);

This way you will then fully rely on the locking mechanism of a ConcurrentHashMap to synchronize your accesses to your rows.
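For illustration, this is how the whole access path could look when wrapped up in a small class (a minimal sketch with made-up class and method names, not the asker's actual code):

import java.util.concurrent.ConcurrentHashMap;

// the map only ever holds entries for rows that are being accessed right now,
// because the remapping function returns null and the entry is removed atomically
public class ChmLockedRowStore {
    static final int ROW_LENGTH = 16;

    final byte[] bytes = new byte[1024 * 1024];
    final ConcurrentHashMap<Integer, Integer> locks = new ConcurrentHashMap<>();

    void writeRow(int rowIndex, byte[] row) {
        locks.compute(rowIndex, (key, value) -> {
            // executed atomically per key by ConcurrentHashMap
            System.arraycopy(row, 0, bytes, rowIndex * ROW_LENGTH, ROW_LENGTH);
            return null; // returning null leaves no entry behind
        });
    }

    byte[] readRow(int rowIndex) {
        byte[] row = new byte[ROW_LENGTH];
        locks.compute(rowIndex, (key, value) -> {
            System.arraycopy(bytes, rowIndex * ROW_LENGTH, row, 0, ROW_LENGTH);
            return null;
        });
        return row;
    }
}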

Nicolas Filotto
  • Ah, so `null` can be used to avoid storing. Thanks for pointing this out! Regarding the improvement: the ConcurrentHashMap just 'stores' locks proportionally to the number of threads and so the array is probably in the range of a few dozen entries, whereas my method needs one lock per row and will be several million (or a bit less if range-wise) but always a lot more than with this approach. – Karussell Sep 24 '16 at 13:45
  • with a `ConcurrentHashMap`, you will have one lock per segment, which covers a chunk of hash codes, which also means a chunk of keys, or more precisely a chunk of rows in your case; you won't have one lock per row as you seem to expect. By default you will have 16 segments; if you need more segments you will need to construct your `ConcurrentHashMap` with a higher `concurrencyLevel` – Nicolas Filotto Sep 24 '16 at 14:01
  • Ah, understood and using the concurrencyLevel is exactly what I need and could set it to the number of threads. Plus far more memory efficient than my solution. – Karussell Sep 24 '16 at 16:22
  • Ah it is more memory efficient interesting I would have bet for the opposite, anyway good news – Nicolas Filotto Sep 24 '16 at 16:25
  • Hmmh, it looks like `concurrencyLevel` is not really used by the constructor. Strange: `if (initialCapacity < concurrencyLevel) initialCapacity = concurrencyLevel;` – Karussell Sep 24 '16 at 16:27
  • Hum, indeed they totally changed the implementation between Java 7 and 8, I did not even notice; there are no segments anymore (they are kept only for serialization, for compatibility reasons) and now the `concurrencyLevel` is only used as a sizing hint – Nicolas Filotto Sep 24 '16 at 16:36
  • Implemented this with my scenario and memory usage is nice but almost 2 times slower than the 'segmented' approach with simple `synchronized(object)` https://github.com/karussell/concurrent-test/blob/master/src/main/java/com/graphhopper/concurrenttest/ConcurrentByteArrayCHM.java – Karussell Sep 24 '16 at 18:03
  • I guess that it is because you pay the price of the overhead added by the fact that you don't execute your code directly but through a lambda expression. But still, for me your OL is the best for sure; usually it is done with a modulus on the key to get the right lock, but the idea is the same – Nicolas Filotto Sep 24 '16 at 18:17