I get this error when calling a Spark action in Oozie on a Kerberized CDP cluster.
The error from the YARN log:
java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
java.lang.RuntimeException: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2700)
    at org.apache.oozie.action.hadoop.LauncherAM.runActionMain(LauncherAM.java:411)
    at org.apache.oozie.action.hadoop.LauncherAM.access$400(LauncherAM.java:55)
    at org.apache.oozie.action.hadoop.LauncherAM$2.run(LauncherAM.java:232)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.oozie.action.hadoop.LauncherAM.run(LauncherAM.java:226)
    at org.apache.oozie.action.hadoop.LauncherAM$1.run(LauncherAM.java:156)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1898)
    at org.apache.oozie.action.hadoop.LauncherAM.main(LauncherAM.java:144)
Caused by: java.lang.ClassNotFoundException: Class org.apache.oozie.action.hadoop.SparkMain not found
    at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:2604)
    at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:2698)
    ... 12 more
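Since org.apache.oozie.action.hadoop.SparkMain is shipped in the Oozie Spark sharelib, one check I can do is whether that sharelib is actually deployed and active. A sketch of the check using the standard Oozie CLI (the URL is a placeholder for the real Oozie server):

```shell
# List the jars in the currently active Spark sharelib.
# SparkMain should appear inside an oozie-sharelib-spark-*.jar entry.
oozie admin -oozie http://oozie-host:11000/oozie -shareliblist spark
```

If the list comes back empty, the sharelib may never have been installed, or may need `oozie admin -sharelibupdate` after an upgrade.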
I have tried many different configurations in my workflow.xml:
<action name="start_traitement_load_01" cred="hcatauth">
    <spark xmlns="uri:oozie:spark-action:1.0">
        <resource-manager>yarnrm</resource-manager>
        <name-node>${nameNode}</name-node>
        <master>yarn-cluster</master>
        <name>${instanceNameLoad}</name>
        <class>${class}</class>
        <jar>my/spark/jar</jar>
        <spark-opts>
            --conf spark.yarn.principal=${principal}
            --conf spark.yarn.keytab=${userHomeDirectory}${keytabFile}
            --conf spark.yarn.security.tokens.hadoopfs.enabled=true
            --conf spark.yarn.security.tokens.hive.enabled=true
            --files ${logFile}#log4j2.yml,${userHomeDirectory}${keytabFile}#${keytabFile}
        </spark-opts>
    </spark>
    <ok to="start_traitement_01"/>
    <error to="fail"/>
</action>
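For completeness, this is roughly how the job is submitted (the Oozie URL and file names are placeholders for my actual setup):

```shell
# Submit and start the workflow; job.properties carries nameNode, the
# principal, keytab paths, etc. Properties such as
# oozie.use.system.libpath can also be passed with -D on the command line.
oozie job -oozie http://oozie-host:11000/oozie \
    -config job.properties \
    -Doozie.use.system.libpath=true \
    -run
```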
Following this SO question, I added this credentials block:
<credentials>
    <credential name='hcatauth' type='hcat'>
        <property>
            <name>hcat.metastore.uri</name>
            <value>thrift://server1:9083,thrift://server2:9083</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>hive/_HOST@COMMUN01.SVC</value>
        </property>
    </credential>
</credentials>
I also added oozie.use.system.libpath=true to the workflow, again following an SO question:
<property>
    <name>oozie.use.system.libpath</name>
    <value>true</value>
</property>
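As far as I know, this flag normally belongs in job.properties rather than inside workflow.xml; a minimal sketch of what I mean (the oozie.action.sharelib.for.spark line is my own guess at pinning the sharelib explicitly, not something from my original configuration):

```properties
# job.properties (sketch)
oozie.use.system.libpath=true
# Guess: explicitly pin the sharelib used by the spark action.
oozie.action.sharelib.for.spark=spark
```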
I also tried adding the spark.yarn keytab and principal directly to workflow.xml:
<property>
    <name>spark.yarn.keytab</name>
    <!-- tried both an HDFS path and a local filesystem path -->
    <value>path/to/keytab</value>
</property>
<property>
    <name>spark.yarn.principal</name>
    <value>principal@domain</value>
</property>
I don't think this is related to my problem, but I also can't see my application in the Spark UI, even though I configured it:
<spark-opts>
    --keytab /path/to/keytab.keytab
    --principal principal@domain
    --conf spark.yarn.historyServer.address=http://server:18088
    --conf spark.eventLog.dir=hdfs://HA-name/user/spark/applicationHistory
    --conf spark.eventLog.enabled=true
</spark-opts>
Spark: 2.4, Oozie: 5.1