How can I store the result of an action such as count in an output directory in Apache Spark (Scala)?

    val countval = data.map((_, "")).reduceByKey(_ + _).count

The command below does not work, because the result of count is not an RDD:

    countval.saveAsTextFile("OUTPUT LOCATION")

Is there any way to store countval in a local or HDFS location?

user1326784
  • See [How to write to a file in Scala?](http://stackoverflow.com/q/4604237/1560062) and [Write a file in hdfs with Java](http://stackoverflow.com/questions/16000840/write-a-file-in-hdfs-with-java) and [How to write to HDFS using Scala](http://stackoverflow.com/questions/32380272/how-to-write-to-hdfs-using-scala) – zero323 Dec 22 '15 at 10:10
  • Maybe he wants to use a Scala library to achieve this? – Alberto Bonsanto Dec 22 '15 at 13:17

2 Answers

After you call count, the result is no longer an RDD.

count returns a plain Long, which has no saveAsTextFile method.

If you want to store countval, you have to write it out the way you would any other Long, String, or Int.
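As a minimal sketch of that idea, assuming a local file is an acceptable destination, the value can be written with the standard java.nio API. The object name `SaveCount`, the helper `saveLocal`, the placeholder value, and the path `count.txt` are all hypothetical and only serve the illustration:

```scala
import java.nio.charset.StandardCharsets
import java.nio.file.{Files, Path, Paths}

object SaveCount {
  // Hypothetical helper: writes the Long as text to a local file
  // and returns the path that was written.
  def saveLocal(value: Long, location: String): Path =
    Files.write(Paths.get(location), value.toString.getBytes(StandardCharsets.UTF_8))

  def main(args: Array[String]): Unit = {
    // Placeholder standing in for the Long returned by count.
    val countval: Long = 42L
    // "count.txt" is an assumed output path, not one from the question.
    saveLocal(countval, "count.txt")
  }
}
```

This runs on the driver only, so it writes to the driver's local file system and does not cover HDFS.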

szefuf

What @szefuf said is correct: after count you have a Long, which you can save any way you want. If you want to save it with .saveAsTextFile(), you have to convert it to an RDD first:

    sc.parallelize(Seq(countval)).saveAsTextFile("/file/location")

The parallelize method in SparkContext turns a collection of values into an RDD, so you first need to wrap the single value in a single-element sequence. Then you can save it.
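A fuller sketch of this approach is below, assuming Spark is on the classpath and run here in local mode. The object name `SaveCountAsRdd`, the helper `saveCount`, and the sample input are hypothetical. It also passes `numSlices = 1` to parallelize, which is an addition to the snippet above: with the default parallelism a one-element RDD would typically be split across several partitions, most of them empty, and each partition produces its own part file.

```scala
import org.apache.spark.{SparkConf, SparkContext}

object SaveCountAsRdd {
  // Hypothetical helper: wraps a single value in a one-partition RDD
  // and saves it as text under the given output directory.
  def saveCount(sc: SparkContext, value: Long, location: String): Unit =
    sc.parallelize(Seq(value), numSlices = 1).saveAsTextFile(location)

  def main(args: Array[String]): Unit = {
    // local[*] is an assumption for the sketch; a real job would get its master from spark-submit.
    val conf = new SparkConf().setAppName("save-count").setMaster("local[*]")
    val sc = new SparkContext(conf)
    try {
      // Assumed sample input standing in for the question's `data`.
      val data = sc.parallelize(Seq("a", "b", "a"))
      val countval: Long = data.map((_, "")).reduceByKey(_ + _).count
      saveCount(sc, countval, "/file/location")
    } finally {
      sc.stop()
    }
  }
}
```

Note that saveAsTextFile creates a directory at the given location containing part files, and it fails if that directory already exists. Whether the path resolves to the local file system or to HDFS depends on the configured default file system or on an explicit URI scheme in the path.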

Roberto Congiu