
When working with HDP 2.5 and Spark 1.6.2, we used Hive with Tez as its execution engine, and it worked.

But when we moved to HDP 2.6 with Spark 2.1.0, Hive no longer worked with Tez as its execution engine, and the following exception was thrown when the `DataFrame.saveAsTable` API was called:

java.lang.NoClassDefFoundError: org/apache/tez/dag/api/SessionNotRunning
    at org.apache.hadoop.hive.ql.session.SessionState.start(SessionState.java:529)
    at org.apache.spark.sql.hive.client.HiveClientImpl.<init>(HiveClientImpl.scala:188)

After looking at the answer to this question, we switched the Hive execution engine from Tez to MR (MapReduce), and it worked.
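For reference, the switch comes down to the standard Hive property `hive.execution.engine` (`tez` vs `mr`). The sketch below is a minimal illustration, assuming you edit a Hadoop-style hive-site.xml directly with Python's `xml.etree.ElementTree`. The helper name `set_hive_property` and the sample XML are hypothetical. Note that ElementTree's default parser drops XML comments, so keep a backup of the original file.

```python
import xml.etree.ElementTree as ET


def set_hive_property(xml_text: str, name: str, value: str) -> str:
    """Return hive-site.xml content with property `name` set to `value`.

    Hypothetical helper: updates the matching <property> if present,
    otherwise appends a new <property> element to <configuration>.
    """
    root = ET.fromstring(xml_text)
    for prop in root.findall("property"):
        if prop.findtext("name") == name:
            val = prop.find("value")
            if val is None:  # tolerate a <property> without a <value>
                val = ET.SubElement(prop, "value")
            val.text = value
            break
    else:
        # No existing entry: add <property><name/><value/></property>
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


# Made-up sample content, for illustration only.
site = """<configuration>
  <property><name>hive.execution.engine</name><value>tez</value></property>
</configuration>"""
print(set_hive_property(site, "hive.execution.engine", "mr"))
```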

However, we'd like to use Hive on Tez. What is required to fix the above exception so that Hive on Tez works?

Elad Eldor
  • My two cents: check that the `hive-site.xml` used by Spark has been cleaned of all Tez configuration properties. – Samson Scharfrichter May 03 '17 at 21:45
  • But we want to use Tez. My question was whether we can use Hive on Tez with HDP 2.6 (in HDP 2.5 it worked, but in HDP 2.6 it doesn't). – Elad Eldor May 07 '17 at 07:16
  • Spark does **NOT** use Tez. Spark does **NOT** use MR. Spark has its own execution engine. So the error you see must come from (useless) init parameters passed when it connects to the Hive Metastore. – Samson Scharfrichter May 07 '17 at 14:29
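Following the first comment's suggestion, here is a small Python sketch that lists the Tez-related entries in the hive-site.xml that Spark reads, so you can decide what to strip. The matching rule (any property whose name or value contains "tez", case-insensitive) is an assumption rather than an official list. The helper name `find_tez_properties` and the sample file contents, including the metastore host, are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List, Tuple


def find_tez_properties(xml_text: str) -> List[Tuple[str, str]]:
    """List (name, value) pairs in a Hadoop-style *-site.xml that mention Tez.

    Heuristic: a property matches if its name or its value contains "tez"
    (case-insensitive), e.g. hive.execution.engine=tez or hive.tez.*.
    """
    root = ET.fromstring(xml_text)
    hits = []
    for prop in root.findall("property"):
        name = (prop.findtext("name") or "").strip()
        value = (prop.findtext("value") or "").strip()
        if "tez" in name.lower() or "tez" in value.lower():
            hits.append((name, value))
    return hits


# Made-up sample content, for illustration only.
site = """<configuration>
  <property><name>hive.metastore.uris</name><value>thrift://metastore-host:9083</value></property>
  <property><name>hive.execution.engine</name><value>tez</value></property>
  <property><name>hive.tez.container.size</name><value>1024</value></property>
</configuration>"""
for name, value in find_tez_properties(site):
    print(f"{name}={value}")
```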

1 Answer


I had the same issue when the Spark job was running in YARN cluster mode. It was resolved by adding the correct hive-site.xml to `spark.yarn.dist.files` in the spark-defaults configuration.

Basically, there are two different hive-site.xml files. One is the Hive configuration: /usr/hdp/current/hive-client/conf/hive-site.xml. The other is a lighter version for Spark (it contains only the details Spark needs to work with Hive): /etc/spark//0/hive-site.xml (please check the path for your setup).

We need to use the second file for `spark.yarn.dist.files`.
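The same setting can also be passed per job instead of through spark-defaults. As a minimal sketch in Python, the helper below builds a `spark-submit` argument list for YARN cluster mode that sets `spark.yarn.dist.files` through `--conf`. The helper name `build_spark_submit`, the application `my_job.py` and the path `/path/to/spark/conf/hive-site.xml` are placeholders; substitute the Spark-side hive-site.xml path from your own HDP install.

```python
import shlex
from typing import List, Sequence


def build_spark_submit(app: str, spark_hive_site: str,
                       extra_args: Sequence[str] = ()) -> List[str]:
    """Build a spark-submit argv for YARN cluster mode that ships the
    Spark-specific hive-site.xml to the YARN containers via
    spark.yarn.dist.files. Hypothetical helper for illustration."""
    return [
        "spark-submit",
        "--master", "yarn",
        "--deploy-mode", "cluster",
        "--conf", f"spark.yarn.dist.files={spark_hive_site}",
        app,
        *extra_args,  # application arguments go after the app itself
    ]


cmd = build_spark_submit("my_job.py", "/path/to/spark/conf/hive-site.xml")
print(shlex.join(cmd))  # shell-quoted command line, Python 3.8+
```

In spark-defaults.conf the equivalent is a single `spark.yarn.dist.files /path/to/spark/conf/hive-site.xml` line. The property takes a comma-separated list, so if it is already set, append the file rather than overwrite the existing value.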

Vijayanand
  • It seems that Spark depends on Hive, and Hive depends on Tez. In cluster mode, Hive often cannot find the Tez dependencies. – Shuai Liu Mar 24 '19 at 10:25
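To check whether the Tez classes are actually visible, you can look for the exact class named in the stack trace inside the jars your job loads. The sketch below is a minimal Python version that scans directories for `.jar` files containing `org/apache/tez/dag/api/SessionNotRunning.class`. The helper name `jars_containing` and the directory `/path/to/spark/jars` are hypothetical; point it at whatever jar directories your driver and executors really use.

```python
import glob
import os
import zipfile
from typing import Iterable, List

# Class from the NoClassDefFoundError above, in jar-entry form.
TEZ_CLASS = "org/apache/tez/dag/api/SessionNotRunning.class"


def jars_containing(class_entry: str, jar_dirs: Iterable[str]) -> List[str]:
    """Return paths of .jar files under jar_dirs that contain class_entry."""
    found = []
    for d in jar_dirs:
        for jar in sorted(glob.glob(os.path.join(d, "*.jar"))):
            try:
                with zipfile.ZipFile(jar) as zf:
                    if class_entry in zf.namelist():
                        found.append(jar)
            except zipfile.BadZipFile:
                continue  # skip corrupt or non-zip files named *.jar
    return found


if __name__ == "__main__":
    # Hypothetical directory; replace with the jar dirs your job loads.
    hits = jars_containing(TEZ_CLASS, ["/path/to/spark/jars"])
    print(hits or "SessionNotRunning not found: Tez jars missing from these dirs")
```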