I am running a sequence model in Spark using the Scala API. This is the line of code I use to see the outcome:

model.freqSequences.collect().foreach { freqSequence => println(freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") + ", " + freqSequence.freq)}

The problem is that the outcome is getting big, so I don't want to use collect() anymore; I would rather save it to a file, either in HDFS or locally. I tried this:

scala> val outcome = model.freqSequences.foreach { freqSequence => println(freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") + ", " + freqSequence.freq)}

scala> outcome.saveAsTextFile("tmp/outcome1/")

error: saveAsTextFile is not a member of Unit

The outcome is a Unit, so I am not able to call saveAsTextFile on it. Is there another way to save this outcome? Thanks.

Tzach Zohar
  • Possible duplicate of [Save ML model for future usage](http://stackoverflow.com/questions/33027767/save-ml-model-for-future-usage) – Tzach Zohar May 25 '16 at 17:39

1 Answer

foreach returns Unit, so outcome holds nothing you could save.

Map each element to a String first; the result is still an RDD, so you can save it as a text file. Something like:

val outcome = model.freqSequences.map { freqSequence => freqSequence.sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") + ", " + freqSequence.freq}
// print (on a cluster the output goes to the executors' stdout, not the driver)
outcome.foreach(println)
// save
outcome.saveAsTextFile("tmp/outcome1/")
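
For reference, here is a minimal end-to-end sketch of the same idea. It assumes the model is an MLlib PrefixSpanModel (the question does not say so explicitly, but freqSequences suggests it). The object name SaveFreqSequences, the helper formatSequence, the toy input data, and the local[*] master are all hypothetical, chosen only for illustration. The formatting logic is the same as in the one-liner above, just pulled into a named function.

```scala
import org.apache.spark.{SparkConf, SparkContext}
import org.apache.spark.mllib.fpm.PrefixSpan

object SaveFreqSequences {

  // Hypothetical helper (not part of Spark): renders one frequent sequence
  // as "[[a, b], [c]], freq", matching the format used in the question.
  def formatSequence[T](sequence: Array[Array[T]], freq: Long): String =
    sequence.map(_.mkString("[", ", ", "]")).mkString("[", ", ", "]") + ", " + freq

  def main(args: Array[String]): Unit = {
    // Assumption: local mode, only so the sketch runs on a single machine.
    val conf = new SparkConf().setAppName("SaveFreqSequences").setMaster("local[*]")
    val sc = new SparkContext(conf)

    // Toy input standing in for the real data set.
    val sequences = sc.parallelize(Seq(
      Array(Array(1, 2), Array(3)),
      Array(Array(1), Array(3, 2), Array(1, 2)),
      Array(Array(1, 2), Array(5)),
      Array(Array(6))
    ), 2).cache()

    val model = new PrefixSpan()
      .setMinSupport(0.5)
      .setMaxPatternLength(5)
      .run(sequences)

    // map is a transformation, so the result is still an RDD[String].
    val outcome = model.freqSequences.map(fs => formatSequence(fs.sequence, fs.freq))

    // Preview a few lines on the driver without collecting everything.
    outcome.take(10).foreach(println)

    // Writes one part file per partition; the target directory must not exist yet.
    outcome.saveAsTextFile("tmp/outcome1/")

    sc.stop()
  }
}
```

Note that saveAsTextFile creates a directory of part files rather than a single file, and a relative path like tmp/outcome1/ is resolved against the default file system of the cluster (HDFS or local, depending on the configuration).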
Jean Logeart
  • outcome.foreach(println) might not give the expected result when running on multiple nodes, because the printing happens on the executors. Since this is applied to a big data set, outcome.take(number).foreach(println) is a better way to inspect the output without using collect(). – Fredy Gomez May 27 '16 at 18:08