
I have a Spark app that reads a large dataset, loads it into memory, and sets everything up so the user can query the in-memory DataFrame multiple times. Once a query is done, the user is prompted on the console to either continue with a new set of input or quit the application.
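For context, the shape of the app is roughly the following (a minimal sketch, assuming Spark 2.x; the data path, view name, and prompt are placeholders rather than my actual code):

    import scala.io.StdIn
    import org.apache.spark.sql.SparkSession

    object App {
      def main(args: Array[String]): Unit = {
        val spark = SparkSession.builder.appName("myhome").getOrCreate()
        val df = spark.read.parquet("/path/to/large/data") // placeholder path
        df.cache()                            // keep the data in memory across queries
        df.createOrReplaceTempView("matches") // placeholder view name

        var line = StdIn.readLine("query> ")
        while (line != null && line != "quit") { // loop until the user quits
          spark.sql(line).show()
          line = StdIn.readLine("query> ")
        }
        spark.stop()
      }
    }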

This works fine from the IDE. However, can I run this interactive Spark app from spark-shell?

I've used Spark Job Server before to run multiple interactive queries against a DataFrame loaded in memory, but never from a shell. Any pointers?

Thanks!

UPDATE 1: Here is what the project jar looks like; it is packaged with all the other dependencies.

jar tf target/myhome-0.0.1-SNAPSHOT.jar 
META-INF/MANIFEST.MF
META-INF/
my_home/
my_home/myhome/
my_home/myhome/App$$anonfun$foo$1.class
my_home/myhome/App$.class
my_home/myhome/App.class
my_home/myhome/Constants$.class
my_home/myhome/Constants.class
my_home/myhome/RecommendMatch$$anonfun$1.class
my_home/myhome/RecommendMatch$$anonfun$2.class
my_home/myhome/RecommendMatch$$anonfun$3.class
my_home/myhome/RecommendMatch$.class
my_home/myhome/RecommendMatch.class

I then ran spark-shell with the following options:

spark-shell -i my_home/myhome/RecommendMatch.class --master local --jars /Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar 

but the shell throws the following message on startup. The jars are loaded, as per the environment shown at localhost:4040.

Setting default log level to "WARN".
To adjust logging level use sc.setLogLevel(newLevel). For SparkR, use setLogLevel(newLevel).
17/05/16 10:10:01 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
17/05/16 10:10:06 WARN ObjectStore: Failed to get database global_temp, returning NoSuchObjectException
Spark context Web UI available at http://192.168.0.101:4040
Spark context available as 'sc' (master = local, app id = local-1494909601904).
Spark session available as 'spark'.
That file does not exist

Welcome to
 ...

UPDATE 2 (using spark-submit): I tried with the full path to the jar. Next, I tried copying the project jar to the bin location.

pwd
/usr/local/Cellar/apache-spark/2.1.0/bin

spark-submit --master local —-class my_home.myhome.RecommendMatch.class --jars myhome-0.0.1-SNAPSHOT.jar
Error: Cannot load main class from JAR file:/usr/local/Cellar/apache-spark/2.1.0/bin/—-class
user1384205

2 Answers

Try the -i <path_to_file> option to run the Scala code in your file, or the Scala shell's :load <path_to_file> command.

Relevant Q&A: Spark : how to run spark file from spark shell
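For illustration, the file you pass to -i (or :load) is raw Scala, written exactly as you would type it at the shell prompt; sc and spark are already defined for you. A hypothetical init.scala (the path and view name are placeholders) might look like:

    // init.scala -- raw shell commands, not a compiled class
    val df = spark.read.parquet("/path/to/large/data")
    df.cache()
    df.createOrReplaceTempView("matches")

    // then query interactively from the shell prompt, e.g.:
    // spark.sql("select * from matches limit 10").show()

and you would start it with something like spark-shell -i init.scala --jars /Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar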

Garren S
  • I saw this and attempted it. Please see my update above. – user1384205 May 16 '17 at 04:43
  • That style is for raw scala files using the same syntax that you'd use in the shell. You're basically just passing the commands via a file instead of typing them in the shell directly – Garren S May 16 '17 at 04:55
  • ah thanks. So can I invoke spark-shell like this: spark-shell --master local —-class my_home.myhome.RecommendMatch.class --jars /Users/anon/Documents/Works/sparkworkspace/myhome/target/myhome-0.0.1-SNAPSHOT.jar ? There is no file-not-found error, but I'm still struggling with how to call the main function on this class – user1384205 May 16 '17 at 06:21

The following command works to run an interactive Spark application.

spark-submit /usr/local/Cellar/apache-spark/2.1.0/bin/myhome-0.0.1-SNAPSHOT.jar

Note that this is an uber jar, built with the main class as the entry point and all dependent libraries bundled in. Check out http://maven.apache.org/plugins/maven-shade-plugin/
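For comparison, if you want to name the main class explicitly instead of relying on the jar's manifest: the flag is a plain ASCII --class (the "—-class" in UPDATE 2 above is a typo with a long dash, which is why spark-submit treated it as the jar path), it takes the fully qualified object name without a .class suffix, and the application jar goes as a positional argument rather than under --jars. Something along these lines (untested, paths taken from the question):

    spark-submit --master local \
      --class my_home.myhome.RecommendMatch \
      /usr/local/Cellar/apache-spark/2.1.0/bin/myhome-0.0.1-SNAPSHOT.jar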

user1384205