
I submitted a Spark job using Spark's hidden REST API. The example I used is word count. Running this job with spark-submit works fine, and the output directory is created in HDFS with the proper word-count files. However, when I run the same program through the hidden REST API, the HDFS output directory is created with only a temporary file inside and no actual output.

Below is the request I send to the Spark REST API. It submits the job successfully, but after the job completes the output directory contains only the temporary file:

curl -X POST http://clusterIP:6066/v1/submissions/create --header "Content-Type:application/json;charset=UTF-8" --data '{
  "action" : "CreateSubmissionRequest",
  "appArgs" : [ "hdfs://clusterIP:8020/tmp/inputfile"],
  "appResource" : "hdfs://clusterIP:8020/tmp/Sparkwc.jar",
  "clientSparkVersion" : "1.6.0",
  "environmentVariables" : {
    "SPARK_ENV_LOADED" : "1"
  },
  "mainClass" : "org.learningspark.simple.WordCount",
  "sparkProperties" : {
    "spark.driver.supervise" : "true",
    "driverCores": "8",
"superviseDriver": "true",
"executorMemory": "2g",
"totalExecutorCores": "40",
"jars": "hdfs://clusterIP/tmp/Sparkwc.jar",
    "spark.app.name" : "WordCountTest",
    "spark.eventLog.enabled": "true",
    "spark.submit.deployMode" : "cluster",
    "spark.master" : "spark://clusterIP:6066"
  }
}'
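As an aside: in the request above, the keys `driverCores`, `superviseDriver`, `executorMemory`, `totalExecutorCores`, and `jars` are not standard `spark.*` property names, and properties without the `spark.` prefix may be silently ignored by the standalone dispatcher. A sketch of the same `sparkProperties` block using only the standard configuration keys (assuming `spark.driver.cores`, `spark.executor.memory`, `spark.cores.max`, and `spark.jars` are the intended settings):

```json
{
  "sparkProperties" : {
    "spark.app.name" : "WordCountTest",
    "spark.master" : "spark://clusterIP:6066",
    "spark.submit.deployMode" : "cluster",
    "spark.driver.supervise" : "true",
    "spark.driver.cores" : "8",
    "spark.executor.memory" : "2g",
    "spark.cores.max" : "40",
    "spark.jars" : "hdfs://clusterIP:8020/tmp/Sparkwc.jar",
    "spark.eventLog.enabled" : "true"
  }
}
```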

But running the same program with the spark-submit command below works fine, and a proper HDFS output file with the word-count output is created (instead of an output directory containing only the temporary file, as with Spark REST):

spark-submit --class WordCount --total-executor-cores 4 --master spark://clusterIP:7077 Sparkwc.jar hdfs://clusterIP:8020/tmp/inputfile

Thanks.

Saurabh Rana
    Possible duplicate of [Spark REST API difficulties in understanding, goal sending RESTful messages from webpage](http://stackoverflow.com/questions/42560206/spark-rest-api-difficulties-in-understanding-goal-sending-restful-messages-from) – Paul Velthuis Mar 08 '17 at 15:28
  • Actually, that issue is about not being able to trigger spark-submit commands at all. In this one, the Spark REST submission runs and reports success, but the HDFS output directory contains only a temporary file, whereas the output of the same program run via spark-submit has the proper words and counts. – Saurabh Rana Mar 09 '17 at 06:19
  • I added a timestamp to the output HDFS file name and observed that submitting the Spark word-count job via the REST API creates multiple output files, instead of the single output file produced by spark-submit. – Saurabh Rana Mar 09 '17 at 07:10
  • @PaulVelthuis this is not a duplicate. This is Apache Spark specific. – eliasah Mar 09 '17 at 07:16
  • @SauravR do you mean you are getting file like part-0000, part-0001, etc. ? – eliasah Mar 09 '17 at 07:17
  • Yes. With Spark REST, the same sample program does not produce part-0000 and part-0001 files with the word-count output (only a 0 KB empty file), but it does produce those two files when submitted with spark-submit. – Saurabh Rana Mar 09 '17 at 11:02

0 Answers