
I get the following error when trying to run a Spark action in Oozie on a Kerberized CDP cluster.

Error from the YARN log:

java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2700)
        at org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:411)
        at org.apache.oozie.action.hadoop.LauncherAM.access$400(LauncherAM.java:55)
        at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:232)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:226)
        at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:156)
        at java.security.AccessController.doPrivileged(Native Method)
        at javax.security.auth.Subject.doAs(Subject.java:422)
        at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
        at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:144)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
        at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2604)
        at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2698)
        ... 12 more
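
For reference, `org.apache.oozie.action.hadoop.SparkMain` ships in the Oozie Spark sharelib, so this error usually means the launcher cannot see the sharelib jars. A quick diagnostic sketch, assuming the `oozie` CLI is available and `OOZIE_URL` points at the server:

```shell
# List the jars Oozie puts on the classpath for spark actions;
# oozie-sharelib-spark*.jar (which contains SparkMain) should appear here.
oozie admin -shareliblist spark

# If the sharelib was (re)installed on HDFS, tell the server to pick it up.
oozie admin -sharelibupdate
```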

I have tried many different configurations in my workflow.xml:

   <action name="start_traitement_load_01" cred="hcatauth">
        <spark xmlns="uri:oozie:spark-action:1.0">
            <resource-manager>yarnrm</resource-manager>
            <name-node>${nameNode}</name-node>
            <master>yarn-cluster</master>
            <name>${instanceNameLoad}</name>
            <class>${class}</class>
            <jar>my/spark/jar</jar>
            <spark-opts>
            --conf spark.yarn.principal=${principal}
            --conf spark.yarn.keytab=${userHomeDirectory}${keytabFile}
            --conf spark.yarn.security.tokens.hadoopfs.enabled=true
            --conf spark.yarn.security.tokens.hive.enabled=true
            --files ${logFile}#log4j2.yml,${userHomeDirectory}${keytabFile}#${keytabFile}</spark-opts>
        </spark>
        <ok to="start_traitement_01"/>
        <error to="fail"/>
    </action>

Following this SO question, I added this credentials section:

<credentials>
    <credential name='hcatauth' type='hcat'>
        <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://server1:9083,thrift://server2:9083</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@COMMUN01.SVC</value>
        </property>
    </credential>
</credentials>

I also added oozie.use.system.libpath=true, as suggested in this SO question, in the workflow:

    <property>
        <name>oozie.use.system.libpath</name>
        <value>true</value>
    </property>
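
For reference, oozie.use.system.libpath is a job-submission property, so it is normally set in the job.properties file passed to `oozie job -run` rather than inside workflow.xml. A minimal sketch (the paths and host names are hypothetical placeholders):

```
nameNode=hdfs://HA-name
oozie.use.system.libpath=true
oozie.wf.application.path=${nameNode}/path/to/app
```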

I also tried adding the spark.yarn keytab and principal to workflow.xml:

    <property>
        <name>spark.yarn.keytab</name>
        <value>path/to/keytab (tried both HDFS and local FS)</value>
    </property>
    <property>
        <name>spark.yarn.principal</name>
        <value>principal@domain</value>
    </property>

I don't think this is related to my problem, but I also can't see my application in the Spark UI, even though I configured it:

   <spark-opts>
        --keytab /path/to/keytab.keytab
        --principal principal@domain
        --conf spark.yarn.historyServer.address=http://server:18088
        --conf spark.eventLog.dir=hdfs://HA-name/user/spark/applicationHistory
        --conf spark.eventLog.enabled=true
    </spark-opts>

Spark: 2.4, Oozie: 5.1

maxime G
