Using scala's ParHashMap in Java's project instead of ConcurrentHashMap

Question

I've got a fairly complicated project, which heavily uses Java's multithreading. In an answer to one of my previous questions I have described an ugly hack, which is supposed to overcome inherent inability to iterate over Java's ConcurrentHashMap in parallel. Although it works, I don't like ugly hacks, and I've had a lot of trouble trying to introduce proposed proof of concept in the real system. Trying to find an alternative solution I have encountered Scala's ParHashMap, which claims to implement a foreach method, which seems to operate in parallel. Before I start learning a new language to implement a single feature I'd like to ask the following:

1) Is foreach method of Scala's ParHashMap scalable?

2) Is it simple and straightforward to call Java's code from Scala and vice versa? I'll just remind that the code is concurrent and uses generics.

3) Is there going to be a performance penalty for switching a part of codebase to Scala?

For reference, this is my previous question about parallel iteration of ConcurrentHashMap:

Scalable way to access every element of ConcurrentHashMap<Element, Boolean> exactly once

EDIT

I have implemented the proof of concept, in probably very non-idiomatic Scala, but it works just fine. AFAIK it is IMPOSSIBLE to implement a corresponding solution in Java given the current state of its standard library and any available third-party libraries.

import scala.collection.parallel.mutable.ParHashMap

class Node(value: Int, id: Int){
    var v = value
    var i = id
    override def toString(): String = v toString
}

object testParHashMap{
    def visit(entry: Tuple2[Int, Node]){
        entry._2.v += 1
    }
    def main(args: Array[String]){
        val hm = new ParHashMap[Int, Node]()
        for (i <- 1 to 10){
            var node = new Node(0, i)
            hm.put(node.i, node)
        }

        println("========== BEFORE ==========")
        hm.foreach{println}

        hm.foreach{visit}

        println("========== AFTER ==========")
        hm.foreach{println}

    }
}

score 1 · Accepted Answer · edited May 23 '17 at 12:23

I come to this with some caveats:

Though I can do some things, I consider myself relatively new to Scala.
I have only read about but never used the par stuff described here.
I have never tried to accomplish what you are trying to accomplish.

If you still care what I have to say, read on.

First, here is an academic paper describing how the parallel collections work.

On to your questions.

1) When it comes to multi-threading, Scala makes life so much easier than Java. The abstractions are just awesome. The ParHashMap you get from a par call will distribute the work to multiple threads. I can't say how that will scale for you without a better understanding of your machine, configuration, and use case, but done right (particularly with regard to side effects) it will be at least as good as a Java implementation. However, you might also want to look at Akka to have more control over everything. It sounds like that might be more suitable to your use case than simply ParHashMap.

2) It is generally simple to convert between Java and Scala collections using JavaConverters and the asJava and asScala methods. I would suggest though making sure that the public API for your method calls "looks Java" since Java is the least common denominator. Besides, in this scenario, Scala is an implementation detail, and you never want to leak those anyway. So keep the abstraction at a Java level.

3) I would guess there will actually be a performance gain with Scala--at runtime. However, you will find much slower compile time (which can be worked around. ish). This Stack Overflow post by the author of Scala is old but still relevant.

Hope that helps. That's quite a problem you got there.

Thanks for an excellent answer. I wasn't aware of JavaConverters, but they sound quite amazing. Many thanks for pointing me at the research paper, I'll read that one carefully. At this stage it seems I might be able to limit my scala code to just a few-liner, expose Java-friendly API and solve the problem by midnight :). The worst part will be learning a completely new syntax, but hopefully Scala would be useful in the future! Many thanks! — Adam Kurkiewicz, Nov 23 '13 at 18:52
The benchmarks presented in the paper seem to be relevant to my usecase, since the method I invoke when iterating the elements is very lightweight. It's nice of the authors to have benchmarked their implementations against lightweight operations, since it provides a far tougher challenge for concurrency. — Adam Kurkiewicz, Nov 23 '13 at 19:29

score 0 · Answer 2 · answered Nov 24 '13 at 13:30

Since Scala compiles to the same bytecode as Java, doing the same in both languages is very well possible, no matter the task. There are however some things which are easier to solve in Scala, but if this is worth learning a new language is a different question. Especially since Java 8 will include exactly what you ask for: simple parallel execution of functions on lists.

But even now you can do this in Java, you just need to write what Scala already has on your own.

final ExecutorService executor = Executors.newFixedThreadPool(Runtime.getRuntime().availableProcessors());
//...
final Entry<String, String>[] elements = (Entry<String, String>[]) myMap.entrySet().toArray();
final AtomicInteger index = new AtomicInteger(elements.length);

for (int i = Runtime.getRuntime().availableProcessors(); i > 0; --i) {
  executor.submit(new Runnable() {

    public void run() {
      int myIndex;
      while ((myIndex = index.decrementAndGet()) >= 0) {
        process(elements[myIndex]);
      }
    }
  });
}

The trick is to pull those elements into a temporary array, so threads can take out elements in a thread-safe way. Obviously doing some caching here instead of re-creating the Runnables and the array each time is encouraged, because the Runnable creation might already take longer than the actual task.

It is as well possible to instead copy the elements into a (reusable) LinkedBlockingQueue, then have the threads poll/take on it instead. This however adds more overhead and is only reasonable for tasks that require at least some calculation time.

I don't know how Scala actually works, but given the fact that it needs to run on the same JVM, it will do something similar in the background, it just happens to be easily accessible in the standard library.

Using scala's ParHashMap in Java's project instead of ConcurrentHashMap

2 Answers2