
I followed the "A Standalone App in Java" part of the tutorial at https://spark.apache.org/docs/0.8.1/quick-start.html

This part worked as expected:

$ mvn package
$ mvn exec:java -Dexec.mainClass="SimpleApp"
...
Lines with a: 46, Lines with b: 23

How can I run the same class on a cluster in parallel? Once I get past this step, I will use HDFS data as input. Is it possible to run this SimpleApp.java with parameters, like this:

./run-example <class> <params>
likeaprogrammer

2 Answers


I would suggest writing a simple Java or Scala class in your IDE. Create SparkConf and JavaSparkContext objects in your SimpleApp.java:

// local[2] runs Spark locally with two worker threads
SparkConf conf = new SparkConf().setAppName(appName).setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
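
For context, a minimal sketch of what the full class might look like, in the style of the quick-start example (the anonymous Function style is the old Java API; taking the input path from args is my addition, so the same class can later point at an HDFS path). Note that a hardcoded setMaster overrides the --master flag of spark-submit, so you may want to omit it when targeting a cluster:

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.function.Function;

public class SimpleApp {
    public static void main(String[] args) {
        // Take the input path from the command line so the same class
        // can later read an hdfs:// path instead of a local file.
        String logFile = args.length > 0 ? args[0] : "README.md";

        SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        JavaRDD<String> logData = sc.textFile(logFile).cache();

        long numAs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("a"); }
        }).count();
        long numBs = logData.filter(new Function<String, Boolean>() {
            public Boolean call(String s) { return s.contains("b"); }
        }).count();

        System.out.println("Lines with a: " + numAs + ", Lines with b: " + numBs);
        sc.stop();
    }
}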

Once you run mvn clean package or mvn package, it will create a JAR file in your project's target folder. If it doesn't, create one with the following command. You can find the SimpleApp.class file in the target/classes folder; cd to that directory first.

jar cvf file.jar SimpleApp.class

Put this JAR file into your project's target directory. This JAR contains your compiled SimpleApp class for submitting your job to Spark. I assume you have a project structure like the one below.

simpleapp
 - src/main/java
   - org.apache.spark.examples
     - SimpleApp.java
 - lib
   - dependent.jars (you can put all dependent JARs inside the lib directory)
 - target
   - simpleapp.jar (after compiling your source)
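
Before submitting, you can sanity-check that the class actually made it into the JAR using the standard jar tool (the path assumes the structure above):

jar tf target/simpleapp.jar | grep SimpleApp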

cd to your Spark directory. I am using spark-1.4.0-bin-hadoop2.6, so the prompt looks like this:

spark-1.4.0-bin-hadoop2.6>

Start the master and workers using the following command:

spark-1.4.0-bin-hadoop2.6> ./sbin/start-all.sh

If this does not work, start the master and slaves separately:

spark-1.4.0-bin-hadoop2.6> ./sbin/start-master.sh
spark-1.4.0-bin-hadoop2.6> ./sbin/start-slaves.sh
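
For the master to find the workers, start-slaves.sh reads the host list from conf/slaves, one hostname per line, and starts a worker on each host over ssh. A minimal sketch of that file, with placeholder hostnames:

# conf/slaves — one worker hostname per line
worker1.example.com
worker2.example.com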

Submit your Spark program using spark-submit. If your project has the structure I described above, pass this as the --class argument:

--class org.apache.spark.examples.SimpleApp

otherwise

--class SimpleApp

Finally, submit your Spark program through spark-submit:

spark-1.4.0-bin-hadoop2.6>./bin/spark-submit --class SimpleApp --master local[2] /PATH-TO-YOUR-PROJECT-DIRECTORY/target/file.jar

Here I have used local[2] as the master, so the program will run on two local threads, but you can pass your cluster's master URL instead, e.g. --master spark://YOUR-HOSTNAME:7077

Port 7077 is the default port for the Spark master URL.
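
Since the question mentions using HDFS data as input next: anything placed after the JAR path on the spark-submit line is passed to your class's main(String[] args). A sketch, with a hypothetical namenode host and path:

spark-1.4.0-bin-hadoop2.6> ./bin/spark-submit --class SimpleApp --master spark://YOUR-HOSTNAME:7077 /PATH-TO-YOUR-PROJECT-DIRECTORY/target/file.jar hdfs://YOUR-NAMENODE:9000/path/to/input.txt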

Brijesh Patel
  • But how does the master know the slaves' location/configuration, since `start-master.sh` and `start-slaves.sh` are separate shell scripts? Where do we specify the number of slaves required? – user3198603 Jun 09 '17 at 09:48

I don't run them with mvn; I just build a fat JAR, scp it to the cluster, and then run:

java -cp /path/to/jar.jar com.yourcompany.yourpackage.YourApp some arguments
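
When launching with plain java like this, there is no spark-submit to inject a master URL, so the SparkConf has to set one itself. A minimal sketch (the class name and master host are placeholders):

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaSparkContext;

public class YourApp {
    public static void main(String[] args) {
        // "some arguments" from the java command line arrive here, in order
        SparkConf conf = new SparkConf()
                .setAppName("YourApp")
                // required here: no spark-submit is supplying --master
                .setMaster("spark://YOUR-MASTER:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // ... your job logic ...
        sc.stop();
    }
}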
samthebest
  • I am getting the error in screenshot1 when I run from the fat jar. TestApp.java is shown in screenshot2. Hope you can help me with this. screenshot1: http://oi60.tinypic.com/9qgttz.jpg screenshot2: http://oi58.tinypic.com/mc3oyr.jpg – likeaprogrammer Apr 30 '14 at 07:37