7

The only two ways I know of to run Scala-based Spark code are to either compile a Scala program into a jar file and run it with spark-submit, or run a Scala script by using :load inside the spark-shell. My question is: is it possible to run a Scala file directly from the command line, without first going into spark-shell and then issuing :load?

MetallicPriest

2 Answers

7

You can simply use stdin redirection with spark-shell:

spark-shell < YourSparkCode.scala

This command starts a spark-shell, interprets your YourSparkCode.scala line by line and quits at the end.
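
As a rough illustration, such a file only needs plain Scala statements; inside spark-shell the spark session (and sc) are already defined, so a hypothetical YourSparkCode.scala could look like this (the DataFrame and its contents are just placeholders):

val df = spark.range(1, 100).toDF("n")   // small demo DataFrame built from the predefined spark session
println(s"row count: ${df.count()}")     // printed to the shell's stdout before it quits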

Another option is to use the -I <file> option of the spark-shell command:

spark-shell -I YourSparkCode.scala

The only difference is that the latter command leaves you inside the shell, so you must issue the :quit command to close the session.

[UPD] Passing parameters

Since spark-shell does not execute your source as an application but just interprets your source file line by line, you cannot pass any parameters directly as application arguments.

Fortunately, there are plenty of options to achieve the same result (e.g., externalizing the parameters in another file and reading that file at the very beginning of your script).
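
For instance, a minimal sketch of that file-based approach might look like the following, assuming a hypothetical params.properties file next to the script (the file name and key are placeholders):

import java.io.FileInputStream
import java.util.Properties

val props = new Properties()
props.load(new FileInputStream("params.properties"))   // e.g. a line such as: inputPath=/data/in
val inputPath = props.getProperty("inputPath")          // read before the rest of the script runs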

But I personally find the Spark configuration the cleanest and most convenient way.

You pass your parameters via the --conf option:

spark-shell --conf spark.myscript.arg1=val1 --conf spark.yourspace.arg2=val2 < YourSparkCode.scala

(please note that the spark. prefix in your property name is mandatory, otherwise Spark will discard your property as invalid)

And read these arguments in your Spark code as below:

val arg1: String = spark.conf.get("spark.myscript.arg1")
val arg2: String = spark.conf.get("spark.myscript.arg2")
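
If some of those parameters are optional, spark.conf.get also accepts a default value, so a missing --conf does not fail the script (the key and default below are hypothetical):

val arg3: String = spark.conf.get("spark.myscript.arg3", "defaultValue")   // falls back to "defaultValue" when the property is not set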
egordoe
  • But is there any way to pass command line arguments like that? – MetallicPriest Feb 22 '20 at 09:58
  • @MetallicPriest good question. I don't know any really elegant way to parameterize your code. I see at least two options here. The first is to have placeholders in your source file and replace them before you send the content to the spark-shell. The second is to pass your parameters as Spark config properties. I will update my answer. – egordoe Feb 22 '20 at 15:08
0

It is possible via spark-submit.

https://spark.apache.org/docs/latest/submitting-applications.html
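
For reference, a typical invocation looks like the line below; the class name, master, jar path, and arguments are hypothetical placeholders, and spark-submit expects a compiled jar rather than a bare .scala file:

spark-submit --class com.example.Main --master local[*] target/scala-2.12/your-app_2.12-1.0.jar arg1 arg2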

You can even put it into a bash script, or create an sbt task (https://www.scala-sbt.org/1.x/docs/Tasks.html) to run your code, as sketched below.
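
As a rough sketch of the sbt-task idea, something like the following could go into build.sbt; the task name, main class, and the use of the plain packageBin jar (in practice an assembly/fat jar is more typical for Spark, and spark-submit must be on the PATH) are assumptions for illustration:

lazy val sparkSubmit = taskKey[Unit]("Run the packaged jar with spark-submit")

sparkSubmit := {
  import scala.sys.process._
  val jar = (Compile / packageBin).value                               // thin jar produced by sbt's package task
  val cmd = Seq("spark-submit", "--class", "com.example.Main", jar.getAbsolutePath)
  require(cmd.! == 0, "spark-submit failed")                           // shells out and checks the exit code
}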

Iva Kam
  • Where does it say you can run a Scala file through spark-submit? As far as I know you can only submit a compiled jar file with spark-submit. – MetallicPriest Feb 21 '20 at 16:03