I would suggest you write a simple Java or Scala class in your IDE. Create the SparkConf and JavaSparkContext objects in your "SimpleApp.java":
SparkConf conf = new SparkConf().setAppName(appName).setMaster("local[2]");
JavaSparkContext sc = new JavaSparkContext(conf);
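For illustration, a complete SimpleApp.java could look something like this (the parallelize/count body is only a placeholder to show an end-to-end job, not your actual logic):

import java.util.Arrays;
import java.util.List;

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;

public class SimpleApp {
    public static void main(String[] args) {
        // local[2] = run Spark locally with two threads
        SparkConf conf = new SparkConf().setAppName("Simple Application").setMaster("local[2]");
        JavaSparkContext sc = new JavaSparkContext(conf);

        // A tiny in-memory RDD, just to prove the job runs end to end
        List<Integer> data = Arrays.asList(1, 2, 3, 4, 5);
        JavaRDD<Integer> rdd = sc.parallelize(data);
        System.out.println("Number of elements: " + rdd.count());

        sc.stop();
    }
}

For the Maven build below to compile this class, your pom.xml also needs the Spark core dependency; for Spark 1.4.0 that is the spark-core_2.10 artifact, version 1.4.0, usually with provided scope since Spark supplies its own JARs at runtime.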
Once you run
mvn clean package
or
mvn package
it will create a JAR file in your project's target folder. If it doesn't, create a JAR file using the following command. You can find the SimpleApp.class file in the "target/classes" folder; cd to this directory.
jar cfve file.jar SimpleApp SimpleApp.class
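You can verify that SimpleApp.class actually ended up in the archive by listing its contents:
jar tf file.jar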
Put this JAR file into your project's target directory. This JAR contains your compiled SimpleApp class and is what you will hand to spark-submit later. I guess you have a project structure like below.
simpleapp
  - src/main/java
    - org.apache.spark.examples
      - SimpleApp.java
  - lib
    - dependent.jars (you can put all dependent JARs inside the lib directory)
  - target
    - simpleapp.jar (after compiling your source)
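One note on the manual jar command above, assuming this layout: if SimpleApp.java declares package org.apache.spark.examples, build the JAR from target/classes so the package directories are preserved, for example:
jar cvf file.jar -C target/classes org
Otherwise spark-submit will not find org.apache.spark.examples.SimpleApp inside the JAR.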
cd to your Spark directory. I am using spark-1.4.0-bin-hadoop2.6, so my prompt looks like this.
spark-1.4.0-bin-hadoop2.6>
Start the master and workers using the following command.
spark-1.4.0-bin-hadoop2.6> ./sbin/start-all.sh
If this does not work, start the master and slaves separately.
spark-1.4.0-bin-hadoop2.6> ./sbin/start-master.sh
spark-1.4.0-bin-hadoop2.6> ./sbin/start-slaves.sh
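Once the master is running, you can open its web UI (by default at http://localhost:8080) to confirm that the workers have registered and to read the spark://... master URL shown at the top of the page.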
Submit your Spark program using spark-submit. If you have the structure I explained above, pass this argument for --class:
--class org.apache.spark.examples.SimpleApp
otherwise
--class SimpleApp
Finally, submit your Spark program through spark-submit.
spark-1.4.0-bin-hadoop2.6>./bin/spark-submit --class SimpleApp --master local[2] /PATH-TO-YOUR-PROJECT-DIRECTORY/target/file.jar
Here I have used local[2] as the master, so my program will run on two local threads, but you can pass your standalone master URL in --master instead, e.g. --master spark://YOUR-HOSTNAME:7077
Port 7077 is the default port for the Spark standalone master URL.
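Putting it all together, a submit against the standalone master with the packaged class name would look roughly like this (YOUR-HOSTNAME and the project path are placeholders):
spark-1.4.0-bin-hadoop2.6> ./bin/spark-submit --class org.apache.spark.examples.SimpleApp --master spark://YOUR-HOSTNAME:7077 /PATH-TO-YOUR-PROJECT-DIRECTORY/target/file.jar
Keep in mind that a master set via setMaster() in the code takes precedence over the --master flag, so remove the setMaster("local[2]") call (or make it configurable) when you want the job to actually run on the standalone master.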