Solution #1: cache to memory and disk
You can cache the RDD to memory and disk by calling persist() with MEMORY_AND_DISK as the storage level. You avoid computing it twice, but any partitions that spill to disk will be read back from disk (instead of RAM) when they are needed again.
http://spark.apache.org/docs/latest/programming-guide.html#which-storage-level-to-choose
MEMORY_AND_DISK Store RDD as deserialized Java objects in the JVM. If the RDD does not fit in memory, store the partitions that don't fit on disk, and read them from there when they're needed.
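For example (a minimal sketch, assuming output is the RDD you want to both count and save; the StorageLevel import is needed for persist()):

import org.apache.spark.storage.StorageLevel

output.persist(StorageLevel.MEMORY_AND_DISK)

val total = output.count()          // first action: computes the RDD and caches it
output.saveAsObjectFile("file.out") // second action: reuses the cached partitions instead of recomputing
println(total)

output.unpersist()                  // optionally free the cached partitions when you're done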
Solution #2: perform the count using an accumulator
A similar question was asked and answered here:
http://thread.gmane.org/gmane.comp.lang.scala.spark.user/7920
The suggestion there is to use an accumulator, incremented inside a map() applied just before saveAsObjectFile():
val counts_accum = sc.longAccumulator("count Accumulator")

output.map { x =>
  counts_accum.add(1)   // count each element as it is written out
  x                     // pass the element through unchanged
}.saveAsObjectFile("file.out")
After saveAsObjectFile() completes, the accumulator holds the total count, and you can print it (use ".value" to read the accumulator's value):
println(counts_accum.value)
If accumulators are created with a name, they will be displayed in Spark’s UI. This can be useful for understanding the progress of running stages.
More info can be found here:
http://spark.apache.org/docs/latest/programming-guide.html#accumulators