I have this:

val tokenFreq = reverseKey.countByKey
// tokenFreq: scala.collection.Map[String,Long] = Map(ABIGAIL -> 3,...

and I want to save the contents of tokenFreq to a text file.

I tried to use saveAsTextFile, but it says:

error: value saveAsTextFile is not a member of scala.collection.Map[String,Long]

Derlin
Carrie

3 Answers

You can convert the Map to an RDD[(String, Long)] and then use the RDD API to save it.

import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("TokenCounter").setMaster("local[4]")
val sc = new SparkContext(conf)

val tokenFreq = reverseKey.countByKey
sc.parallelize(tokenFreq.toSeq).saveAsTextFile("token_freq")

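Of course, this changes your data structure: saveAsTextFile writes each pair as a line such as (ABIGAIL,3) into a directory of part files. You can read those files back into an RDD, parse each line, and collect the result as a map to regain quick lookup.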

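val pair = """\((.*),(\d+)\)""".r  // saveAsTextFile wrote each tuple as "(token,count)"
val tokenFreqMap = sc.textFile("token_freq").map { case pair(k, v) => (k, v.toLong) }.collectAsMap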
Brian

Since countByKey returns a plain Scala Map, you have to use Scala's regular file I/O to write it to a file.

Here is one way to do that:

import java.io.PrintWriter

new PrintWriter("filename") {
  tokenFreq.foreach {
    case (k, v) =>
      write(k + ":" + v)
      write("\n")
  }
  close()
}

Note that this code runs on the driver, after the result of countByKey has been gathered from all the workers.
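
A variation on the same idea, as a minimal sketch: it closes the writer in a finally block, so the file handle is released even if writing fails partway through. The helper name writeCounts, the output file name, and the sorting by token are assumptions made for illustration, and the sample map stands in for the real countByKey result.

```scala
import java.io.PrintWriter

// Hypothetical helper: writes one "token:count" line per entry.
// Entries are sorted by token so the output order is deterministic.
def writeCounts(counts: scala.collection.Map[String, Long], path: String): Unit = {
  val out = new PrintWriter(path, "UTF-8")
  try {
    counts.toSeq.sortBy(_._1).foreach { case (k, v) => out.write(s"$k:$v\n") }
  } finally {
    out.close() // runs whether or not the writes above succeeded
  }
}

// Stand-in for the result of reverseKey.countByKey (made-up data)
val tokenFreq: scala.collection.Map[String, Long] = Map("ZED" -> 1L, "ABIGAIL" -> 3L)
writeCounts(tokenFreq, "token_freq.txt")
```

Like the snippet above, this runs entirely on the driver and produces a single local file.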

Community
Aivean

You can use the saveAs* APIs only while your collection is still distributed across your Spark cluster. Calling countByKey on a pair RDD collects the counts from the cluster into a local Map on your Spark driver, so the saveAs* APIs are not available on that collected result.
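
If the goal is only to write the counts out, one option is to keep them distributed instead of collecting them. The sketch below counts with mapValues and reduceByKey rather than countByKey, so the result is still an RDD and saveAsTextFile is available. The sample data and the output directory name are assumptions for illustration. In spark-shell, sc already exists and the two setup lines should be skipped.

```scala
import org.apache.spark.{SparkConf, SparkContext}

val conf = new SparkConf().setAppName("TokenCounter").setMaster("local[4]")
val sc = new SparkContext(conf)

// Stand-in for the asker's reverseKey pair RDD (made-up data)
val reverseKey = sc.parallelize(Seq(("ABIGAIL", 1), ("ABIGAIL", 2), ("ZED", 3)))

// Same counts as countByKey, but the result stays an RDD[(String, Long)]
val tokenFreqRdd = reverseKey.mapValues(_ => 1L).reduceByKey(_ + _)

// Writes a directory of part files; this fails if the directory already exists
tokenFreqRdd.saveAsTextFile("token_freq_distributed")
```

Each output line has the form (ABIGAIL,2), the same as when the collected Map is parallelized again before saving.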

Milad Khajavi