
I recently wanted to know how to save a HashMap to a file and later read it back. The user Carrie posted a similar question, but the accepted top answer can't be correct. Since I don't have enough reputation to comment on that answer, I'm explaining one way to do this, in case someone has the same question.


Question

I have a lookup map for a custom hash function where Integers are mapped to Sets of (hash: Int, value: String) tuples.

val lookupMap: Map[Int, Set[(Int, String)]] = ... // filling this map is a different story

I want to save this Map to a file and later read it back as a map. This answer suggests using sc.textFile("...").collectAsMap, but that doesn't work: textFile returns an RDD[String], and collectAsMap is only available on RDDs of key-value pairs.

robinki

1 Answer


Saving to file

Take the map and convert it to a Seq. Then use sc.parallelize to form an RDD and save it as an object file by calling saveAsObjectFile on that RDD (it is a method of RDD, not of SparkContext).

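import scala.collection.mutable

val savePath = "lookup_map" // output directory; must not exist yet
val lookupMap: Map[Int, mutable.Set[(Int, String)]] = ... // fill your map
// One RDD element per map entry, written with Java serialization
sc.parallelize(lookupMap.toSeq).saveAsObjectFile(savePath)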

Reading from file

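To read the map back, you have to know its element type. Converting Map[Int, mutable.Set[(Int, String)]] to a Seq yields elements of type (Int, mutable.Set[(Int, String)]). Use sc.objectFile[Type](path) with that element type to read the file, then collect the pairs into a map with collectAsMap.

import scala.collection.mutable

// Element type of the saved RDD; it must match what was written
type LookupMapSeq = (Int, mutable.Set[(Int, String)])
val path = "lookup_map/part-[0-9]*" // glob over the part files
val lookupMap = sc.objectFile[LookupMapSeq](path).collectAsMap()

Note that collectAsMap returns a scala.collection.Map, so the resulting type is scala.collection.Map[Int, mutable.Set[(Int, String)]]. Append .toMap if you need an immutable Map[Int, mutable.Set[(Int, String)]] like the one you saved.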
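For anyone who wants to try both steps together, here is a minimal round-trip sketch. It is only an illustration under a few assumptions that are not part of the original question: a local SparkContext with master local[*], a made-up app name, a temporary output directory, and invented sample data. The object name LookupMapRoundTrip and the helper roundTrip are hypothetical.

```scala
import java.nio.file.Files
import scala.collection.mutable
import org.apache.spark.{SparkConf, SparkContext}

object LookupMapRoundTrip {
  type LookupMap = Map[Int, mutable.Set[(Int, String)]]
  // Element type of the saved RDD: one (key, value) pair per map entry.
  type LookupEntry = (Int, mutable.Set[(Int, String)])

  // Hypothetical helper: saves the map as an object file and reads it back.
  def roundTrip(sc: SparkContext, original: LookupMap, path: String): LookupMap = {
    sc.parallelize(original.toSeq).saveAsObjectFile(path)
    // Passing the output directory reads all part files in it.
    // collectAsMap returns scala.collection.Map, so convert with toMap.
    sc.objectFile[LookupEntry](path).collectAsMap().toMap
  }

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only for trying this out on one machine.
    val conf = new SparkConf().setAppName("lookup-map-round-trip").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // saveAsObjectFile fails if the target exists, so use a fresh subdirectory.
      val path = Files.createTempDirectory("lookup_map").resolve("data").toString
      // Invented sample data.
      val original: LookupMap = Map(
        1 -> mutable.Set((11, "a"), (12, "b")),
        2 -> mutable.Set((21, "c"))
      )
      val restored = roundTrip(sc, original, path)
      assert(restored == original, "restored map should equal the saved map")
    } finally {
      sc.stop()
    }
  }
}
```

Object files use Java serialization, so this only works when keys and values are serializable, which holds for Int, String, tuples and the standard Scala collections used here.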

robinki