I am trying to submit a Spark job to an AWS EMR cluster using the AWS console, but it fails with:

Cannot load main class from JAR

The job runs successfully when I specify the main class with --class in the Arguments option under AWS EMR Console -> Add Step.

On my local machine, the job works fine when no main class is specified, as below:

 ./spark-submit /home/astro/spark-programs/SpotEMR/MyJob.jar

I have set the main class in the JAR using the run configuration. The main reason to avoid passing the main class via --class is that I have to run this job in AWS Data Pipeline using EmrActivity. In AWS Data Pipeline, there is currently no way to specify a main class for a job being submitted.
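For reference, here is a minimal Python sketch of how one might confirm that the JAR's manifest really declares a Main-Class entry. As far as I understand, that entry is what spark-submit falls back to when --class is omitted. The function name manifest_main_class is hypothetical, and the sketch assumes a standard META-INF/MANIFEST.MF inside the JAR.

```python
import zipfile


def manifest_main_class(jar_path):
    """Return the Main-Class value from a JAR's manifest, or None if absent."""
    with zipfile.ZipFile(jar_path) as jar:
        try:
            raw = jar.read("META-INF/MANIFEST.MF")
        except KeyError:
            # The archive has no manifest at all.
            return None

    # Normalize line endings, then unfold continuation lines
    # (in the manifest format a continuation line starts with one space).
    text = raw.decode("utf-8").replace("\r\n", "\n").replace("\r", "\n")
    text = text.replace("\n ", "")

    # Main-Class lives in the main section, which ends at the first blank line.
    main_section = text.split("\n\n", 1)[0]
    for line in main_section.split("\n"):
        name, sep, value = line.partition(":")
        if sep and name.strip().lower() == "main-class":
            return value.strip()
    return None
```

Calling manifest_main_class("/home/astro/spark-programs/SpotEMR/MyJob.jar") would return the class name if the run configuration wrote it into the manifest, or None otherwise.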

Any help will be appreciated.

Atish

1 Answer

Actually, you can pass the job's main class with EmrActivity in AWS Data Pipeline.

See https://docs.aws.amazon.com/datapipeline/latest/DeveloperGuide/dp-object-emractivity.html for how to launch an EmrActivity using a step,

as well as https://docs.aws.amazon.com/emr/latest/ReleaseGuide/emr-spark-submit-step.html for how to submit a Spark job as an EMR step with a main class.

The step would look as follows (append the path to your JAR and any application arguments after the class name):

command-runner.jar,spark-submit,--class,org.apache.spark.examples.SparkPi
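If you generate the pipeline definition from a script, the comma-separated step string can be assembled programmatically. Below is a minimal Python sketch of that idea. The helper name spark_submit_step and the S3 path are hypothetical placeholders, not part of any AWS SDK, and the sketch simply rejects arguments that contain commas rather than trying to escape them.

```python
def spark_submit_step(main_class, jar_path, app_args=()):
    """Build a comma-separated EmrActivity step string for spark-submit.

    Hypothetical helper: it only joins the pieces in the order shown in
    the step above, followed by the JAR path and application arguments.
    """
    parts = ["command-runner.jar", "spark-submit", "--class", main_class, jar_path]
    parts.extend(str(arg) for arg in app_args)

    # Commas separate arguments in the step string, so an argument that
    # itself contains a comma would be split; this sketch refuses it.
    for part in parts:
        if "," in part:
            raise ValueError("argument contains a comma: %r" % part)

    return ",".join(parts)


# Example with a placeholder JAR location (assumption, replace with your own).
step = spark_submit_step(
    "org.apache.spark.examples.SparkPi",
    "s3://example-bucket/MyJob.jar",
    ["10"],
)
print(step)
# command-runner.jar,spark-submit,--class,org.apache.spark.examples.SparkPi,s3://example-bucket/MyJob.jar,10
```

The resulting string is what would go into the step field of the EmrActivity object in the pipeline definition.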
Frederic
  • If this answered your question, please accept the answer. – Frederic Jan 25 '18 at 09:41
  • I did try the way you mentioned here. It's still failing. I will check the logs for the possible cause. – Atish Jan 25 '18 at 09:58
  • No AMI required. Works with the current EMR release. – Frederic Jan 25 '18 at 11:52
  • When I try this, it gives me 'EmrCluster is supported only for AmiVersion 2.4.8 and above for hadoop 1 and AmiVersion 3.1.1 and above for hadoop 2. You are using AmiVersion 2.3.0'. When I tried to run it on versions 4.3 and 5.11, it went into a canceled state with no errors. – Atish Jan 25 '18 at 11:56
  • Do you have any doc that explains the steps required to run an EMR job on AWS Data Pipeline? – Atish Jan 25 '18 at 11:57
  • https://github.com/awslabs/data-pipeline-samples/blob/master/samples/SparkPiMaximizeResourceAllocation/SparkPi-maximizeResource.json or use "Build using a template" – Frederic Jan 25 '18 at 12:04