61

I am looking for a high-performance, concurrent, MultiMap. I have searched everywhere but I simply cannot find a solution that uses the same approach as ConcurrentHashMap (Only locking a segment of the hash array).

The multimap will be both read, added to and removed from often.

The multimap key will be a String and it's value will be arbitrary.

I need O(1) to find all values for a given key, O(N) is OK for removal, but O(logN) would be preferred.

It is crucial that removal of the last value for a given key will remove the container of values from the key, as to not leak memory.

EDIT: HERE'S THE SOLUTION I BUILT, available under ApacheV2: Index (multimap)

Viktor Klang
  • 26,479
  • 7
  • 51
  • 68
  • 7
    There's no Map with O(1) lookup (except the bucket algorithm stuff, as usual). HashMaps have O(cn) with very small c. – ziggystar Sep 03 '10 at 12:43
  • 2
    ziggystar, it depends on how well the hashing function distributes the keys. If it does so randomly - which you could expect of a modern hash, for arbitrary strings - then lookup is O(1). This is also what the HashMap Javadoc says. – Thomas Kappler Sep 03 '10 at 12:59
  • Ziggy: I'll settle for an O(cn) version – Viktor Klang Sep 03 '10 at 13:48
  • Perhaps some more detail about the problem you are trying to solve would be helpful in providing a reasonable solution. – Andrew Sep 03 '10 at 13:54
  • 2
    I have a registry of potentially millions of objects and they may share some properties, and I need an index for said properties, so I can retrieve all objects having a certain property. – Viktor Klang Sep 03 '10 at 14:02
  • How many properties might an object have? How often do objects come and go? How often to properties come and go? Are there a fixed number or properties, or a reasonable upper bound? – Andrew Sep 03 '10 at 14:13
  • Currently there's only 1 property that is needed, but others might come along later. Object come and go very often, some live long, some die young. Since the property is a String, it can be virtually anything. – Viktor Klang Sep 03 '10 at 14:15
  • @Viktor Klang Aside from copy and pasting CCHM and altering the Segment I am curious to know if you find a resolution for this. – John Vint Sep 03 '10 at 17:58
  • 1
    Perhaps it is time to search for your answer on cstheory.stackexchange.com? It looks like you'll be rolling your own data structure... – Daniel C. Sobral Sep 03 '10 at 18:46
  • 2
    @John V: Here's what I built: http://gist.github.com/566793 – Viktor Klang Sep 06 '10 at 13:16
  • @Thomas Kappler You can only have no collisions as long as you don't exceed the capacity of the HashMap. So basically the question about complexity boils down to wether you know in advance how many elements you want to store. – ziggystar Dec 12 '11 at 09:31
  • @ViktorKlang, have you found any solution for Java? I am currently looking for one and came upon this SO thread. – João Rebelo Dec 21 '16 at 13:56
  • @JoãoRebelo It should be rather straightforward to port the Scala implementation to Java. – Viktor Klang Dec 21 '16 at 15:48
  • @ViktorKlang what do you think about the following: https://gist.github.com/adamw/78a3a73c3ac9a63a25f9f3191103f9b4 - that is, using `compute`? The javadocs say that the operation is "atomic", of course the question is what does it exactly mean - is it only that the operation is invoked exactly once, or also that there are no concurrent updates on the same key? But as far as I understand the impl I think it should work – adamw Jan 16 '18 at 12:16
  • 1
    @adamw yes, I believe that could work, perhaps even cheaper: https://gist.github.com/adamw/78a3a73c3ac9a63a25f9f3191103f9b4#gistcomment-2323624 – Viktor Klang Jan 17 '18 at 12:01

10 Answers10

11

Why not wrap ConcurrentHashMap[T,ConcurrentLinkedQueue[U]] with some nice Scala-like methods (e.g. implicit conversion to Iterable or whatever it is that you need, and an update method)?

Rex Kerr
  • 166,841
  • 26
  • 322
  • 407
  • 2
    @Viktor - If you have the (key,value) pair, `map.get(key).remove(value)` should do the trick as long as you leave the empty queue there as a placeholder when you remove a key and catch the exception if it's not there (including the NPE off of get). If you can't leave the queue as a placeholder, then it's difficult to ensure safety unless you do lock the entire map while you clean out the cruft. – Rex Kerr Sep 03 '10 at 13:42
  • Can't leave the queue since it will leak memory. I'm actually considering using the ConcurrentHashMap source and bend it to my will, but it seems like a very crude approach. – Viktor Klang Sep 03 '10 at 13:45
  • 1
    This may be the best solution, you can constrain locking the entire collection to only occur when you find a Queue which is empty, I'm not sure you can get away from either writing your own implementation, or changing the way you want to use this to handle this nested structure rather than a Multimap. – Jon Freedman Sep 03 '10 at 13:49
  • 3
    Can you tolerate locking the queue when you get it, and then holding the queue while you send an update to the map to empty that (k,v) pair? That is, use the lock on the queue rather than on the whole map to avoid collisions? (And add logic to handle the case where you get a queue but then, before you can lock it, it's emptied out and removed from the map--all you have to do is check when you get the lock that it is non-empty. If it is empty, assume it's been removed.) – Rex Kerr Sep 03 '10 at 13:50
  • Good idea Rex, I'll try that approach and see if it works out. – Viktor Klang Sep 03 '10 at 14:00
  • 2
    For those looking for Viktor's implementation, you can find it as `akka.util.ConcurrentMultiMap`. Thanks a lot, Viktor! – Jean-Philippe Pellet Aug 09 '16 at 12:10
7

Have you tried Google Collections? They have various Multimap implementations.

Matt
  • 74,352
  • 26
  • 153
  • 180
Jon Freedman
  • 9,469
  • 4
  • 39
  • 58
  • 6
    Yes, but I am not looking for just a multimap, I am looking for a high-performance _concurrent_ multimap. When I checked their source code earlier today, their concurrent multimap locked the entire map, which creates a serial structure. – Viktor Klang Sep 03 '10 at 11:45
  • 1
    You're right about the implementation of Syncronized - it locks the whole instance - have you identified this collection as a performance bottlekneck? – Jon Freedman Sep 03 '10 at 12:43
  • 1
    Yes, this is as central as it gets. – Viktor Klang Sep 03 '10 at 14:00
3

There is one in akka although I haven't used it.

lisak
  • 21,611
  • 40
  • 152
  • 243
2

I had a requirement where I had to have a Map<Comparable, Set<Comparable>> where insertion on the Map be concurrent and also on the corresponding Set, but once a Key was consumed from the Map, it had to be deleted, think if as a Job running every two seconds which is consuming the whole Set<Comparable> from an specific Key but insertion be totally concurrent so that most values be buffered when the Job kicks in, here is my implementation:

Note: I use Guava's helper class Maps to create the concurrent Maps, also, this solution emulates Java concurrency in Practice Listing 5.19:

import com.google.common.collect.MapMaker;
import com.google.common.collect.Sets;

import java.util.Collection;
import java.util.Set;
import java.util.concurrent.ConcurrentMap;

/**
 * A general purpose Multimap implementation for delayed processing and concurrent insertion/deletes.
 *
 * @param <K> A comparable Key
 * @param <V> A comparable Value
 */
public class ConcurrentMultiMap<K extends Comparable, V extends Comparable>
{
  private final int size;
  private final ConcurrentMap<K, Set<V>> cache;
  private final ConcurrentMap<K, Object> locks;

  public ConcurrentMultiMap()
  {
    this(32, 2);
  }

  public ConcurrentMultiMap(final int concurrencyLevel)
  {
    this(concurrencyLevel, 2);
  }

  public ConcurrentMultiMap(final int concurrencyLevel, final int factor)
  {
    size=concurrencyLevel * factor;
    cache=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(concurrencyLevel).makeMap();
    locks=new MapMaker().concurrencyLevel(concurrencyLevel).initialCapacity(concurrencyLevel).weakKeys().weakValues().makeMap();
  }

  private Object getLock(final K key){
    final Object object=new Object();
    Object lock=locks.putIfAbsent(key, object);
    if(lock == null){
      lock=object;
    }
    return lock;
  }

  public void put(final K key, final V value)
  {
    synchronized(getLock(key)){
      Set<V> set=cache.get(key);
      if(set == null){
        set=Sets.newHashSetWithExpectedSize(size);
        cache.put(key, set);
      }
      set.add(value);
    }
  }

  public void putAll(final K key, final Collection<V> values)
  {
    synchronized(getLock(key)){
      Set<V> set=cache.get(key);
      if(set == null){
        set=Sets.newHashSetWithExpectedSize(size);
        cache.put(key, set);
      }
      set.addAll(values);
    }
  }

  public Set<V> remove(final K key)
  {
    synchronized(getLock(key)){
      return cache.remove(key);
    }
  }

  public Set<K> getKeySet()
  {
    return cache.keySet();
  }

  public int size()
  {
    return cache.size();
  }

}
Guido Medina
  • 416
  • 4
  • 8
2

I made a ConcurrentMultiMap mixin which extends the mutable.MultiMap mixin and has a concurrent.Map[A, Set[B]] self type. It locks per key, which has O(n) space complexity, but its time complexity is pretty good, if you aren't particularly write-heavy.

nnythm
  • 3,280
  • 4
  • 26
  • 36
1

you should give ctries a try. here is the pdf.

Shlomi
  • 4,708
  • 1
  • 23
  • 32
  • Aleks s coming to visit me at the office next week, I'll talk to him then. – Viktor Klang Nov 21 '11 at 22:14
  • iam curious to hear how the talk went. did you find ctries useful? – Shlomi Feb 15 '12 at 00:44
  • Not for the multimap purpose, I talked to Bagwell and Prokopec about making an implementation that would work as a multimap, but I don't think there was enough time. Ctries are going into Scala 2.10 – Viktor Klang Feb 15 '12 at 22:01
0

I am a bit late on this topic but I think, nowadays, you can use Guava like this:

Multimaps.newSetMultimap(new ConcurrentHashMap<>(), ConcurrentHashMap::newKeySet)
Eric Aya
  • 69,473
  • 35
  • 181
  • 253
teo
  • 1,187
  • 9
  • 10
  • 3
    Does `Multimaps.newSetMultimap(...)` support concurrent updates? According to the Javadoc, "The multimap is not threadsafe when any concurrent operations update the multimap, even if map and the instances generated by factory are." cf. http://google.github.io/guava/releases/23.0/api/docs/com/google/common/collect/Multimaps.html#newSetMultimap-java.util.Map-com.google.common.base.Supplier- – breandan Aug 17 '17 at 05:19
  • so by reading the docs, and the previous answer will this work ? : Multimaps.synchronizedSetMultimap(Multimaps.newSetMultimap(new ConcurrentHashMap<>(), ConcurrentHashMap::newKeySet)); – Radu Toader Sep 23 '19 at 15:24
0

Use MultiMaps from Gauava. Multimaps.synchronizedMultimap(HashMultimap.create())

deep
  • 841
  • 7
  • 6
0

It's late for the discussion, yet...

When it comes to high performance concurrent stuff, one should be prepared to code the solution. With Concurrent the statement the Devil is in the details has a complete meaning. It's possible to implement the structure fully concurrent and lock-free.

Starting base would be the NonBlocking Hashtable http://sourceforge.net/projects/high-scale-lib/ and then depending how many values per key and how often need to add/remove some copy on write Object[] for values or an array based Set with semaphore/spin lock.

bestsss
  • 11,796
  • 3
  • 53
  • 63
-1

Have you taken a look to Javalution which is intended for Real time etc. and of course high performance.

khmarbaise
  • 92,914
  • 28
  • 189
  • 235