
I am currently working on Apache Spark. I want to measure the time the system takes to perform a word count on a text file and store the result in a file, and I need to automate the commands with a bash script. I tried to run the following script:

start-all.sh
    (time spark-shell 
     val inputfile = sc.textFile("/home/pi/Desktop/dataset/books_50.txt")
     val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word,1)).reduceByKey(_+_);
     counts.toDebugString
     counts.cache()
     counts.saveAsTextFile("output")
     exit()
     ) 2> /home/pi/Desktop/spark_output/test.txt
stop-all.sh

This produced the following error:

./wordcount_spark.sh: line 4: syntax error near unexpected token `('
./wordcount_spark.sh: line 4: ` val inputfile = sc.textFile("/home/pi/Desktop/dataset/books_50.txt")'

I then tried adding a here-document delimited by EOF, and got the following error:

./wordcount_spark.sh: line 12: warning: here-document at line 3 delimited by end-of-file (wanted `EOF')
./wordcount_spark.sh: line 13: syntax error: unexpected end of file

I don't understand how to pass Scala commands through a bash script.
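Based on the warning, I suspect the delimiter placement: in bash the closing EOF must start at the very beginning of its line with nothing after it, or you get exactly the "delimited by end-of-file" warning above. The shape I was aiming for looks roughly like this (a sketch; spark-shell exits on its own once stdin ends, so no exit() call is needed):

    start-all.sh
    { time spark-shell << 'EOF'
    val inputfile = sc.textFile("/home/pi/Desktop/dataset/books_50.txt")
    val counts = inputfile.flatMap(line => line.split(" ")).map(word => (word, 1)).reduceByKey(_ + _)
    counts.saveAsTextFile("output")
    EOF
    } 2> /home/pi/Desktop/spark_output/test.txt
    stop-all.sh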

itsamineral
  • run it with the `spark-shell -i` option; here is an example: http://stackoverflow.com/questions/29928999/passing-command-line-arguments-to-spark-shell – Ronak Patel Jun 30 '16 at 15:29
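A minimal sketch of that suggestion, assuming the Scala lines from the question are saved to a file (the name wordcount.scala is illustrative):

    start-all.sh
    # wordcount.scala contains the Scala lines from the question and ends with
    # sys.exit(0) so the shell quits instead of waiting at the interactive prompt
    ( time spark-shell -i wordcount.scala ) 2> /home/pi/Desktop/spark_output/test.txt
    stop-all.sh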

1 Answer


spark-shell is an interactive tool, meant for a user typing one command after another, so it's a poor fit for what you're trying to do.

You should take a look at the Self-Contained Applications section of Spark's Quick Start guide, which shows how to write and build a simple Scala application and run it with spark-submit. That should fit your requirement better.
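A minimal sketch of such an application, adapted from the snippet in your question (the object name and jar path below are illustrative):

    // WordCount.scala: a self-contained version of the question's snippet
    import org.apache.spark.{SparkConf, SparkContext}

    object WordCount {
      def main(args: Array[String]): Unit = {
        val conf = new SparkConf().setAppName("WordCount")
        val sc = new SparkContext(conf)
        val inputfile = sc.textFile("/home/pi/Desktop/dataset/books_50.txt")
        val counts = inputfile
          .flatMap(line => line.split(" "))
          .map(word => (word, 1))
          .reduceByKey(_ + _)
        counts.saveAsTextFile("output")
        sc.stop()  // shut down cleanly so the timing isn't skewed by a hung context
      }
    }

After packaging it (e.g. with sbt package), you can time it the same way you tried with spark-shell:

    ( time spark-submit --class WordCount path/to/wordcount.jar ) 2> /home/pi/Desktop/spark_output/test.txt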

Tzach Zohar