
I am running a Spark job through an Oozie spark action. The Spark job uses HiveContext to perform part of the processing. The cluster is configured with Kerberos. When I submit the job using spark-submit from the console, it runs successfully. But when I run the job from Oozie, it ends up with the following error.

    18/03/18 03:34:16 INFO metastore: Trying to connect to metastore with URI thrift://localhost.local:9083
    18/03/18 03:34:16 ERROR TSaslTransport: SASL negotiation failure
    javax.security.sasl.SaslException: GSS initiate failed [Caused by GSSException: No valid credentials provided (Mechanism level: Failed to find any Kerberos tgt)]
            at com.sun.security.sasl.gsskerb.GssKrb5Client.evaluateChallenge(GssKrb5Client.java:211)
            at org.apache.thrift.transport.TSaslClientTransport.handleSaslStartMessage(TSaslClientTransport.java:94)
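
For reference, the console submission that succeeds is presumably along these lines, with a valid Kerberos ticket obtained first; every path, principal, and jar name below is a placeholder rather than a real value from the post:

    # Obtain a TGT interactively, then submit. This is why the console run works,
    # while the Oozie launcher (which has no such ticket cache) fails.
    kinit -kt /path/to/spark_user.keytab spark_user@EXAMPLE.COM
    spark-submit \
      --master yarn \
      --class com.demo.analyzer \
      --files hive-site.xml \
      analyzer.jar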

workflow.xml

<workflow-app xmlns="uri:oozie:workflow:0.5" name="workflow">
   <start to="analysis" />
   <action name="Analysis">
      <spark xmlns="uri:oozie:spark-action:0.1">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <master>${master}</master>
         <name>Analysis</name>
         <class>com.demo.analyzer</class>
         <jar>${appLib}</jar>
         <spark-opts>--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
      </spark>
      <ok to="sendEmail" />
      <error to="fail" />
   </action>
   <action name="sendEmail">
      <email xmlns="uri:oozie:email-action:0.1">
         <to>${emailToAddress}</to>
         <subject>Output of workflow ${wf:id()}</subject>
         <body>Results from line count: ${wf:actionData('shellAction')['NumberOfLines']}</body>
      </email>
      <ok to="end" />
      <error to="end" />
   </action>
   <kill name="fail">
      <message>Action failed, error message[${wf:errorMessage(wf:lastErrorNode())}]</message>
   </kill>
   <end name="end" />
</workflow-app>
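
The ${...} parameters above are supplied through the job.properties handed to Oozie; a minimal sketch with placeholder values (none of these are the real ones) might look like:

    nameNode=hdfs://namenode.example.com:8020
    jobTracker=resourcemanager.example.com:8032
    master=yarn-cluster
    appLib=${nameNode}/user/nagendra/workflow/lib/analyzer.jar
    oozie.use.system.libpath=true
    oozie.wf.application.path=${nameNode}/user/nagendra/workflow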

Do I need to configure anything related to Kerberos in workflow.xml? Am I missing anything here?

Any help is appreciated.

Thanks in advance.

nagendra
  • You need to create a keytab file using Kerberos, upload it to your workflow and bash-script folder, then include "kinit keytabfile credential" in your script. If you need more help, let me know – jose_bacoy Mar 19 '18 at 11:27
  • I have a keytab file for Hive; do I need to upload it to HDFS where my workflow.xml file is located? – nagendra Mar 19 '18 at 12:52
  • Basically, I am submitting an Oozie job (spark action) as spark_user, and the Hive metastore is in secured mode. I have a keytab and principal for Hive; shall I pass these to the spark action, and will that resolve the issue? – nagendra Mar 19 '18 at 13:20
  • I'm using a different approach then. My Oozie job runs a bash script that contains the spark-submit. I put the workflow, bash script, jars and keytab in the same folder. My bash script includes parameters to use the keytab: "spark-submit --keytab --principal --etc etc" – jose_bacoy Mar 19 '18 at 13:48
  • Do you still want this approach? Oozie -> Bash -> Spark action – jose_bacoy Mar 19 '18 at 13:57
  • I am using the spark action of Oozie; the shell action is not working for some reason. So, as you suggested, I will try passing the keytab and principal to the spark action – nagendra Mar 19 '18 at 16:05
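
For completeness, a rough sketch of the wrapper-script approach jose_bacoy describes above (a shell script that runs kinit from a keytab shipped with the workflow and then calls spark-submit); the keytab name, principal, and jar below are placeholders:

    #!/bin/bash
    # Assumes the keytab was shipped alongside the script (e.g. via the
    # shell action's <file> entries), so a relative path resolves here.
    kinit -kt nagendra.keytab nagendra@DOMAIN.COM
    spark-submit \
      --master yarn \
      --keytab nagendra.keytab \
      --principal nagendra@DOMAIN.COM \
      --class com.demo.analyzer \
      --files hive-site.xml \
      analyzer.jar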

2 Answers


You need to add HCat credentials for the Thrift URI in the Oozie workflow. This lets the metastore client authenticate to the Thrift URI with Kerberos.

Add the credentials block below to the Oozie workflow (it is declared near the top of the workflow-app, before the start node):

<credentials>
    <credential name="credhive" type="hcat">
        <property>
            <name>hcat.metastore.uri</name>
            <value>${thrift_uri}</value>
        </property>
        <property>
            <name>hcat.metastore.principal</name>
            <value>${principal}</value>
        </property>
    </credential>
</credentials>

Then pass the credential to the spark action via the cred attribute, as below:

<action name="Analysis" cred="credhive">
      <spark xmlns="uri:oozie:spark-action:0.1">
         <job-tracker>${jobTracker}</job-tracker>
         <name-node>${nameNode}</name-node>
         <master>${master}</master>
         <name>Analysis</name>
         <class>com.demo.analyzer</class>
         <jar>${appLib}</jar>
         <spark-opts>--jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
      </spark>
      <ok to="sendEmail" />
      <error to="fail" />
   </action>

The thrift_uri and principal can be found in hive-site.xml. thrift_uri is set in this hive-site.xml property:

<property>
    <name>hive.metastore.uris</name>
    <value>thrift://xxxxxx:9083</value>
  </property>

The principal is set in this hive-site.xml property:

 <property>
    <name>hive.metastore.kerberos.principal</name>
    <value>hive/_HOST@domain.COM</value>
  </property>
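
With this approach, thrift_uri and principal simply become two more workflow parameters, e.g. in job.properties (the host and realm below are placeholders):

    thrift_uri=thrift://metastore-host.example.com:9083
    principal=hive/_HOST@DOMAIN.COM
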
Amit Kumar
  • I tried the same, but it is failing with the error below. @nagendra, did the above solution work for you? Please make sure that jars for your version of Hive and Hadoop are included in the paths passed to spark.sql.hive.metastore.jars. at org.apache.spark.sql.hive.client.IsolatedClientLoader.createClient(IsolatedClientLoader.scala:276) at org.apache.spark.sql.hive.HiveUtils$.newClientForMetadata(HiveUtils.scala:385) – vinu.m.19 Apr 11 '19 at 16:00

Upload your keytab to the server, then reference the keytab file and principal as parameters in the spark-opts of your workflow. Let me know if it works. Thanks.

<spark-opts>--keytab nagendra.keytab --principal "nagendra@domain.com" --jars ${sparkLib} --files ${config},${hivesite} --num-executors ${NoOfExecutors} --executor-cores ${ExecutorCores} --executor-memory ${ExecutorMemory} --driver-memory ${driverMemory}</spark-opts>
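
A sketch of the upload step, following the layout suggested in the comments (keytab sitting in the same HDFS folder as the workflow); the paths are placeholders:

    # Copy the keytab next to workflow.xml in the application directory on HDFS.
    hdfs dfs -put nagendra.keytab /user/nagendra/workflow/
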
jose_bacoy