I submitted a Spark job using Spark's hidden REST API (port 6066). The example I used for this is word count. Running this job with spark-submit works fine, and the output directory is created in HDFS with the proper word count files. However, when I run the same program through the hidden REST API, the HDFS output directory gets created with only a temporary file inside and no actual output.
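The job itself is essentially the standard word count. A minimal sketch of what org.learningspark.simple.WordCount does, under the assumption that the output is written with saveAsTextFile (the output path below is a placeholder, not my exact path):

import org.apache.spark.{SparkConf, SparkContext}

object WordCount {
  def main(args: Array[String]): Unit = {
    val sc = new SparkContext(new SparkConf().setAppName("WordCountTest"))

    // args(0) is the HDFS input path passed via appArgs
    val counts = sc.textFile(args(0))
      .flatMap(_.split("\\s+"))
      .map(word => (word, 1))
      .reduceByKey(_ + _)

    // saveAsTextFile first writes part files under a _temporary
    // directory and renames them into place when the job commits
    counts.saveAsTextFile(args(0) + "-out") // placeholder output path

    sc.stop()
  }
}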
Below is the request I am sending to the Spark REST endpoint. It submits the job, but after the job completes the output directory contains only the temporary file:
curl -X POST http://clusterIP:6066/v1/submissions/create \
  --header "Content-Type:application/json;charset=UTF-8" \
  --data '{
  "action": "CreateSubmissionRequest",
  "appArgs": ["hdfs://clusterIP:8020/tmp/inputfile"],
  "appResource": "hdfs://clusterIP:8020/tmp/Sparkwc.jar",
  "clientSparkVersion": "1.6.0",
  "environmentVariables": {
    "SPARK_ENV_LOADED": "1"
  },
  "mainClass": "org.learningspark.simple.WordCount",
  "sparkProperties": {
    "spark.driver.supervise": "true",
    "driverCores": "8",
    "superviseDriver": "true",
    "executorMemory": "2g",
    "totalExecutorCores": "40",
    "jars": "hdfs://clusterIP/tmp/Sparkwc.jar",
    "spark.app.name": "WordCountTest",
    "spark.eventLog.enabled": "true",
    "spark.submit.deployMode": "cluster",
    "spark.master": "spark://clusterIP:6066"
  }
}'
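For what it's worth, after the create call I can poll the driver state through the same hidden REST API (a minimal sketch; the driver ID below is a placeholder for the submissionId returned in the CreateSubmissionResponse):

import scala.io.Source

object SubmissionStatus {
  def main(args: Array[String]): Unit = {
    // Placeholder: substitute the submissionId from the create response
    val driverId = "driver-20160101000000-0000"
    val status = Source.fromURL(
      s"http://clusterIP:6066/v1/submissions/status/$driverId").mkString
    println(status)
  }
}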
Running the same program with the spark-submit command below works fine, and a proper HDFS output directory with the word count results is created (instead of an output directory holding just the temporary file, as happens with the REST submission):
spark-submit --class WordCount --total-executor-cores 4 --master spark://clusterIP:7077 Sparkwc.jar hdfs://clusterIP:8020/tmp/inputfile
Thanks.