2

I have this Map definition :

TreeMap <String, Set<Integer>>

It may contain millions of entries, and I also need a "natural order" (that's why I've chosen a TreeMap, though I could write a Comparator if needed).

So, what I have to do in order to add an element to the map is :

  1. Check if a key already exists.
  2. If not, create a new Set and add the value.
  3. If it exists, I have to add the value to the Set

I have this implementation which works fine :

private void addToMap (String key, Integer value){
    Set<Integer> vs = dataMap.get(key);
    if (vs == null){
        vs = new TreeSet<Integer>();
        dataMap.put(key,vs);
    }
    vs.add(value);
}

But I would like to avoid searching for the key and then putting the element if it doesn't exist (it will perform a new search over the huge map).

I think I could use ConcurrentHashMap.putIfAbsent method, but then :

  1. I will not have the natural ordering of the keys (and I will need to perform a sort on the millions keys)
  2. I may have (I don't know) additional overhead because of synchronization over the ConcurrentHashMap, and in my situation my process is single threaded and it may impact performance.

Reading this post : Java map.get(key) - automatically do put(key) and return if key doesn't exist? there's an answer that talks about Guava MapMaker.makeComputingMap but looks like the method is not there anymore.

Performance is critical in this situation (as always :D), so please let me know your recommendations.

Thanks in advance.

NOTE : Thanks a lot for so many helping answers in just some minutes. (I don't know which one to select as the best).

I will do some performance tests on the suggestions (TreeMultiMap, ConcurrentSkipListMap, TreeSet + HashMap) and update with the results. I will select the one with the best performance then, as I'd like to select all three of them but I cannot.

NOTE2

So, I did some performance testing with 1.5 million entries, and these are the results :

  • ConcurrentSkipListMap, it doesn't work as I expected, because it replaces the existing value with the new empty set I provided. I thought it was setting the value only if it the key doesn't exist, so I cannot use this one. (my mistake).

  • TreeSet + HashMap, works fine but doesn't give the best performance. It is like 1.5 times slower than TreeMap alone or TreeMultiMap.

  • TreeMultiMap gives the best performance, but it is almost the same as the TreeMap alone. I will check this one as the answer.

Again, thanks a lot for your contributions and help.

Community
  • 1
  • 1
richardtz
  • 4,993
  • 2
  • 27
  • 38
  • If performance is critical I wouldn't use a TreeSet of Integer, I would find a more light weight structure like TIntArrayList or something which wraps `int` values. I would also use a HashMap as its look up is O(1) instead of O(log N). If you also need to keep the keys sorted, I would use a second collection for that. – Peter Lawrey Nov 08 '13 at 11:50

3 Answers3

2

If performance is critical I wouldn't use a TreeSet of Integer, I would find a more light weight structure like TIntArrayList or something which wraps int values. I would also use a HashMap as its look up is O(1) instead of O(log N). If you also need to keep the keys sorted, I would use a second collection for that.

I agree that putIfAbsent on ConcurrentHashMap is overkill and get/put on a HashMap is likely to be the fastest option.

ConcurrentSkipListMap might be a good option to use putIfAbsent, but it I would make sure its not slower.

BTW Even worse than doing a get/put is creating a HashSet you don't need.

Peter Lawrey
  • 525,659
  • 79
  • 751
  • 1,130
  • PutIfAbsent is not overkill, its threadsafe. If you work with threads there is no way around that. – TwoThe Nov 08 '13 at 11:52
  • 2
    @TwoThe The OP specifically stated "in my situation my process is single threaded ". – Peter Lawrey Nov 08 '13 at 11:53
  • 2
    @TwoThe It is even overkill in a multithreaded case as it creates a HasHSet needlessly just in case it needs it even if it doesn't (which creates a HashMap and an Entry[]. So three objects you most likely don't need. – Peter Lawrey Nov 08 '13 at 11:54
  • I have difficulties with the word 'overkill' on things that potentially slow down execution by 0.000001%. It is ok to notice that this is slower than non-concurrent maps, but the actual gain/loss is completely dependent on the usage. – TwoThe Nov 08 '13 at 11:56
  • @Peter Lawrey I will change the TreeSet, no problem there. But I will need to traverse the map in natural order of the keys. Even with this requisite is it better to use a HashMap than a TreeMap? – richardtz Nov 08 '13 at 11:58
  • 1
    @richardtz Why not use both. a TreeSet for the keys in sorted order and HashMap for looking them up. I assume the traversal of the keys is not done too often at for millions of keys that won't be cheap. – Peter Lawrey Nov 08 '13 at 11:59
2

PutIfAbsent has the benefit of concurrency, that is: if many threads call this at the same time, they don't have to wait (it doesn't use synchronized internally). However this comes at a minor cost of execution speed, so if you work only single-threaded, this will slow things down.

If you need this sorted, try the ConcurrentSkipListMap.

TwoThe
  • 13,879
  • 6
  • 30
  • 54
2
  • Concurrent map does not do magic, it checks the existence and then inserts if not exists.
  • Guava have MultiMaps, for example TreeMultiMap can be what you need.
Amir Pashazadeh
  • 7,170
  • 3
  • 39
  • 69