After a week of research, I have to ask for help:

  • Environment: Azure HDInsight
  • Oozie version: "Oozie client build version: 4.2.0.2.6.5.3004-13"
  • Spark: Spark2
  • My program: a simple Scala program that reads a file, i.csv, and writes the same content to o.csv
  • Tested with Spark-Submit: Yes

job.properties

nameNode=wasb://mycontainer@something.blob.core.windows.net
jobTracker=hn0-something.internal.cloudapp.net:8050
master=yarn-cluster
queueName=default
deployed_loc=zs_app
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/${deployed_loc}

workflow.xml:

<workflow-app xmlns='uri:oozie:workflow:0.3' name='zs-wf'>
    <start to="Loader" />
    <action name="Loader">
        <spark xmlns="uri:oozie:spark-action:0.1">
            <job-tracker>${jobTracker}</job-tracker>
            <name-node>${nameNode}</name-node>
            <prepare>
               <delete path="${nameNode}/${deployed_loc}/output-data"/>
            </prepare>
            <configuration>
                <property>
                    <name>mapred.compress.map.output</name>
                    <value>true</value>
                </property>
            </configuration>
            <master>${master}</master>
            <mode>cluster</mode>
            <name>Spark-Loader</name>
            <class>zs.test</class>
            <jar>${nameNode}/${deployed_loc}/zs_app.jar</jar>
            <arg>--testId=1</arg>
        </spark>
        <ok to="end" />
        <error to="fail" />
    </action>
    <kill name="fail">
        <message>Workflow failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
    </kill>
    <end name='end' />
</workflow-app>

I get the exception below:

Exception in thread "main" java.lang.NoClassDefFoundError: org/apache/spark/sql/SparkSession
        at java.lang.Class.getDeclaredMethods0(Native Method)
        at java.lang.Class.privateGetDeclaredMethods(Class.java:2701)
        at java.lang.Class.privateGetMethodRecursive(Class.java:3048)
        at java.lang.Class.getMethod0(Class.java:3018)
        at java.lang.Class.getMethod(Class.java:1784)
        at org.apache.spark.deploy.yarn.ApplicationMaster.startUserApplication(ApplicationMaster.scala:556)
        at org.apache.spark.deploy.yarn.ApplicationMaster.runDriver(ApplicationMaster.scala:338)
        at org.apache.spark.deploy.yarn.ApplicationMaster.run(ApplicationMaster.scala:204)
        at org.apache.spark.deploy.yarn.ApplicationMaster$$anonfun$main$1.apply$mcV$sp(ApplicationMaster.scala:674)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:68)
        at org.apache.spark.deploy.SparkHadoopUtil$$anon$1.run(SparkHadoopUtil.scala:67)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
        at org.apache.spark.deploy.SparkHadoopUtil.runAsSparkUser(SparkHadoopUtil.scala:67)
        at org.apache.spark.deploy.yarn.ApplicationMaster$.main(ApplicationMaster.scala:672)
        at org.apache.spark.deploy.yarn.ApplicationMaster.main(ApplicationMaster.scala)
Caused by: java.lang.ClassNotFoundException: org.apache.spark.sql.SparkSession
        at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:424)
        at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:349)
        at java.lang.ClassLoader.loadClass(ClassLoader.java:357)
        ... 17 more

From this I conclude:

  • Somehow it is picking up a Spark version older than 2, since SparkSession was only introduced in Spark 2.0.
  • Oozie was able to submit the job, because I extracted this error with "yarn logs -applicationId appid", where I got the appid from the Oozie logs.
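For reference, the log extraction step looks roughly like the sketch below. The function name first_exception is only illustrative, the application id is a placeholder, and the grep pattern is an assumption about what the interesting line looks like.

```shell
# Print the first exception line (plus a few lines of context) from the
# aggregated YARN logs of one application.
# "first_exception" is an illustrative name, not part of any tool.
first_exception() {
  app_id="$1"   # e.g. the application id found in the Oozie logs
  yarn logs -applicationId "$app_id" | grep -m 1 -A 5 -E 'Exception|Error'
}
```

Called as first_exception followed by the application id, this is how the NoClassDefFoundError above was located.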

Now, if I add this line to job.properties:

oozie.action.sharelib.for.spark=spark2

I get the exception below:

JOB[0000115-181216154825160-oozie-oozi-W] ACTION[0000115-181216154825160-oozie-oozi-W@Loader] Launcher exception: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2308)
    at org.apache.oozie.action.hadoop.LauncherMapper.map(LauncherMapper.java:229)
    at org.apache.hadoop.mapred.MapRunner.run(MapRunner.java:54)
    at org.apache.hadoop.mapred.MapTask.runOldMapper(MapTask.java:453)
    at org.apache.hadoop.mapred.MapTask.run(MapTask.java:343)
    at org.apache.hadoop.mapred.YarnChild$2.run(YarnChild.java:170)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1869)
    at org.apache.hadoop.mapred.YarnChild.main(YarnChild.java:164)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2214)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2306)
    ... 9 more

From this I conclude:

  • Oozie could not submit the job, since the error appears in the Oozie log itself.

I don't understand why this has to be so complicated. If Microsoft Azure packages HDInsight with Spark2 and Oozie, this should run smoothly or with minor changes, or clear documentation should be provided somewhere.

Eyedia Tech

2 Answers


Try setting your Oozie share lib path in job.properties. For example, mine is:

oozie.libpath=/user/oozie/share/lib/lib_20180312160954

I'm not sure where it is in the Azure environment, though.
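One way to look it up, as a rough sketch: the Oozie CLI can list what the server currently resolves for the share lib. The URL below is a placeholder and the function name list_sharelib is only illustrative, so adjust both for your cluster.

```shell
# Placeholder endpoint (assumption); replace with your cluster's Oozie URL.
OOZIE_URL="${OOZIE_URL:-http://localhost:11000/oozie}"

# List share lib contents as seen by the Oozie server.
# With no argument: prints the available share lib names.
# With a name (e.g. spark2): prints the jars resolved for that name.
list_sharelib() {
  oozie admin -oozie "$OOZIE_URL" -shareliblist "$@"
}
```

Running list_sharelib spark2 shows the jars under that name, which is one place to check whether the jar containing org.apache.oozie.action.hadoop.SparkMain is present.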

grantler

Assuming you are currently on HDInsight 3.6, try Oozie with Spark2 on HDInsight 4.0. Earlier versions seem to have trouble running Spark2 directly from Oozie.

HDInsight 4.0 uses HDP 3.0. This might help: Spark2 with Oozie in HDP3.0

nj_bubbles