
I wonder how to idiomatically iterate over a java.util.HashSet in Scala. Currently, I am using the Java iterator in a while loop, which does not seem like a great approach.

Additionally, I wonder whether the mutable growable buffer is efficient, or whether there is a way to avoid creating unnecessary objects.

import java.util

import scala.collection.generic.Growable
import scala.collection.mutable

val javaSet = new util.HashSet[String]()
javaSet.add("first")
javaSet.add("second")

val result: collection.Seq[String] with Growable[String] = mutable.Buffer[String]()
val itr = javaSet.iterator

while (itr.hasNext) {
  result += itr.next()
}

result

Edit: Would a stream be better? See also: Apache Spark: Effectively using mapPartitions in Java

Georg Heiler

1 Answer


Since you are apparently using a Java HashSet, do this first:

import scala.collection.JavaConverters._

This lets you turn Java collections into Scala collections via asScala; the Scala wrappers are much easier to work with.

So if you have an instance of HashSet called set, you can do this:

set.asScala.map(value => doSomething(value))

Or whatever you want to do like filter, foldLeft, etc.

FYI, the above example can be syntactically sugared to this:

set.asScala.map(doSomething)
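Putting the pieces together with the HashSet from the question, a minimal runnable sketch looks like this (doSomething is replaced here with a concrete toUpperCase just for illustration):

```scala
import java.util
import scala.collection.JavaConverters._

val javaSet = new util.HashSet[String]()
javaSet.add("first")
javaSet.add("second")

// asScala wraps the Java set in place; no elements are copied
val scalaSet = javaSet.asScala

// standard Scala collection operations now work directly
val upper = scalaSet.map(_.toUpperCase)
```

Note that asScala returns a live wrapper, so changes to the underlying Java set remain visible through the Scala view.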
Vidya
  • 29,932
  • 7
  • 42
  • 70
  • I see. But will this actually be efficient? As I need to call this method in the mapPartitions method of a spark job I would want to not create unnecessary objects. – Georg Heiler Mar 17 '17 at 07:07
  • 1
    That shouldn't be your concern. Your concern when running a Spark job should be minimizing network shuffle traffic and tuning garbage collection. Besides, you are spending time looking for solutions to a problem that you haven't proven exists. [Premature optimization is the root of all evil (or at least most of it) in programming.](https://en.wikiquote.org/wiki/Donald_Knuth) – Vidya Mar 17 '17 at 14:29