
I am using CDH 5.2. I am able to use spark-shell to run the commands.

  1. How can I run a file (file.spark) which contains Spark commands?
  2. Is there any way to run/compile Scala programs in CDH 5.2 without sbt?
– Ramakrishna

6 Answers


In the command line, you can use

spark-shell -i file.scala

to run the code written in file.scala
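
As a minimal sketch, assuming a script named file.scala with the following (hypothetical) contents:

// file.scala — sc is the SparkContext that spark-shell creates for you
val nums = sc.parallelize(1 to 100)
println("sum = " + nums.sum())
System.exit(0)  // optional: exit instead of dropping into the REPL

Note that -i runs the script and then leaves you at the interactive scala> prompt; ending the script with System.exit(0), as above, is a common way to make it terminate instead.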

– Ziyao Li

To load an external file from within spark-shell, simply do

:load PATH_TO_FILE

This will evaluate everything in your file.
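
For illustration, assuming a script at /tmp/count.scala (path and contents hypothetical):

// /tmp/count.scala
val lines = sc.textFile("/tmp/input.txt")
println("line count: " + lines.count())

Then, from inside the shell:

scala> :load /tmp/count.scala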

I don't have a solution for your SBT question though sorry :-)

– Steve

You can use either sbt or Maven to compile Spark programs. Apache Spark artifacts are published to Maven Central, so no extra <repository> entry is needed; simply add Spark as a dependency:

<dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.0</version>
</dependency>
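
To go all the way to a compiled jar without sbt, here is a minimal pom.xml sketch; the project coordinates, the plugin choice (scala-maven-plugin), and its version are illustrative assumptions, not from the original answer:

<project xmlns="http://maven.apache.org/POM/4.0.0">
  <modelVersion>4.0.0</modelVersion>
  <groupId>com.example</groupId>            <!-- hypothetical coordinates -->
  <artifactId>spark-demo</artifactId>
  <version>1.0</version>
  <dependencies>
    <dependency>
      <groupId>org.apache.spark</groupId>
      <artifactId>spark-core_2.10</artifactId>
      <version>1.2.0</version>
      <scope>provided</scope>               <!-- the cluster supplies Spark at runtime -->
    </dependency>
  </dependencies>
  <build>
    <plugins>
      <plugin>
        <!-- compiles Scala sources under src/main/scala -->
        <groupId>net.alchim31.maven</groupId>
        <artifactId>scala-maven-plugin</artifactId>
        <version>3.2.2</version>
        <executions>
          <execution>
            <goals>
              <goal>compile</goal>
            </goals>
          </execution>
        </executions>
      </plugin>
    </plugins>
  </build>
</project>

mvn package then produces target/spark-demo-1.0.jar, which you can hand to spark-submit with --class pointing at your main class.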

In terms of running a file with Spark commands, you can simply do this:

echo '
   import org.apache.spark.sql._
   val ssc = new SQLContext(sc)
   ssc.sql("select * from mytable").collect()
' > spark.input

(Single quotes around the snippet preserve the inner double quotes for the shell.)

Now pipe the script into the shell:

cat spark.input | spark-shell
– WestCoastProjects

Just to give more perspective on the answers:

spark-shell is a Scala REPL.

You can type :help to see the list of operations that are possible inside the Scala shell:

scala> :help
All commands can be abbreviated, e.g., :he instead of :help.
:edit <id>|<line>        edit history
:help [command]          print this summary or command-specific help
:history [num]           show the history (optional num is commands to show)
:h? <string>             search the history
:imports [name name ...] show import history, identifying sources of names
:implicits [-v]          show the implicits in scope
:javap <path|class>      disassemble a file or class name
:line <id>|<line>        place line(s) at the end of history
:load <path>             interpret lines in a file
:paste [-raw] [path]     enter paste mode or paste a file
:power                   enable power user mode
:quit                    exit the interpreter
:replay [options]        reset the repl and replay all previous commands
:require <path>          add a jar to the classpath
:reset [options]         reset the repl to its initial state, forgetting all session entries
:save <path>             save replayable session to a file
:sh <command line>       run a shell command (result is implicitly => List[String])
:settings <options>      update compiler options, if possible; see reset
:silent                  disable/enable automatic printing of results
:type [-v] <expr>        display the type of an expression without evaluating it
:kind [-v] <expr>        display the kind of expression's type
:warnings                show the suppressed warnings from the most recent line which had any

The relevant command here is :load, which interprets the lines in a file.
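
A related trick from the same list: :paste also accepts a file path, and evaluates the whole file as a single block, which helps when definitions in the file refer to each other (the path here is hypothetical):

scala> :paste /tmp/count.scala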

– loneStar

Tested on both spark-shell version 1.6.3 and spark2-shell version 2.3.0.2.6.5.179-4: you can pipe directly to the shell's stdin, like

spark-shell <<< "1+1"

or in your use case,

spark-shell < file.spark
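
For multi-line snippets without a separate file, a here-document is a sketch of the same stdin idea (the snippet contents are illustrative):

spark-shell <<'EOF'
val nums = sc.parallelize(1 to 10)
println(nums.sum())
EOF

Quoting the EOF delimiter keeps the surrounding shell from expanding $ or backslashes inside the snippet.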
– Phu Ngo
  • It works, but the output to stdout is basically a replay of everything you would see if you opened spark-shell and typed in all the lines from the file yourself. – Merlin Dec 31 '19 at 04:08

You can run it the same way you run a shell script. For a Python program, use spark-submit rather than spark-shell (the shell only takes Scala input). For example, from the command line:

./bin/spark-submit :- the launcher script under Spark's bin directory
/home/fold1/spark_program.py :- the path to your Python program

So:

./bin/spark-submit /home/fold1/spark_program.py
– amarnath pimple