
When I tried to use Spark SQL against Hive, the error below was thrown.

Exception in thread "main" java.lang.NoSuchFieldError: HIVE_STATS_JDBC_TIMEOUT
        at org.apache.spark.sql.hive.HiveUtils$.formatTimeVarsForHiveClient(HiveUtils.scala:204)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver$.main(SparkSQLCLIDriver.scala:90)
        at org.apache.spark.sql.hive.thriftserver.SparkSQLCLIDriver.main(SparkSQLCLIDriver.scala)

As per the SO threads hive-stats-jdbc-timeout-for-hive-queries-in-spark and spark-on-hive-sql-query-error-nosuchfielderror-hive-stats-jdbc-timeout, this issue occurs with specific combinations of Spark and Hive versions; in fact, if you want to use the latest Spark (2.4.3) together with the latest Hive (3.1.1), it can't be avoided.

For details, we can check this community thread: https://issues.apache.org/jira/browse/SPARK-13446 (no update since Feb 2019).

So, do you know of any update on this issue? If we want to work around it ourselves at the source level, any clue about how to do that?

Thanks for your help in advance.

Eugene
  • When using a version of Hive other than 1.2.1 with Spark, we need to set two properties: 1. `spark.sql.hive.metastore.version` - the version of the metastore we are connecting to; 2. `spark.sql.hive.metastore.jars` - the Hive client jars of the same version as in item 1. Can you please let me know the values of both these properties in your case? – DaRkMaN Jul 31 '19 at 03:43
  • Thanks for the comment. The metastore jar under $SPARK_HOME/jars is hive-metastore-3.1.1.jar. I haven't set those two properties explicitly. Is the setting critical? If so, where should I set them, and what should the values be? – Eugene Jul 31 '19 at 03:54
  • If we want to use a different version of the Hive client, we need to set these properties, so yes, it is critical. 1. Where to set them -> in spark-defaults.conf, or as arguments to the spark-submit command using `--conf`. 2. What values to set -> in your case `spark.sql.hive.metastore.version` should be 3.1.1, and `spark.sql.hive.metastore.jars` should be a colon-separated list of the 3.1.1 jars (see the sketch after these comments). Spark supports 3.1.1 as the metastore version from Spark 3.0 (https://jira.apache.org/jira/browse/SPARK-24360). – DaRkMaN Jul 31 '19 at 04:39
  • For more info on the properties, please check https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html#interacting-with-different-versions-of-hive-metastore – DaRkMaN Jul 31 '19 at 04:39
  • The latest version of Spark on the official website is 2.4.3. According to your suggestion, the only thing I can do is revert the Hive version? – Eugene Jul 31 '19 at 05:26
  • Yes. If you are tied to Hive 3.1.1, you can backport the changes to Spark 2.4.3 and create a build from it. – DaRkMaN Jul 31 '19 at 05:29
  • I'm a little confused here. Do you mean that I'd better build a Spark 2.4.3 that's compatible with Hadoop/Hive 3.x? – Eugene Jul 31 '19 at 05:34
  • If you have a tight dependency on Hive 3.1.1 and Spark-2.4.3 specifically, you could backport the changes in a private Spark fork and create a new build and use it. This change does not affect Hadoop etc and will be a Spark only change. – DaRkMaN Jul 31 '19 at 05:38
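
A minimal sketch of the settings described in the comments above, assuming the Hive 3.1.1 client jars live under /opt/hive/lib (that path, and the use of a classpath wildcard, are assumptions, not from the thread):

    # spark-defaults.conf (or pass each entry to spark-submit via --conf key=value)
    spark.sql.hive.metastore.version   3.1.1
    # classpath of the Hive 3.1.1 client jars and their dependencies;
    # a colon-separated list of individual jars works as well
    spark.sql.hive.metastore.jars      /opt/hive/lib/*

Note that, per the comments, Spark 2.4.3 does not accept 3.1.1 for spark.sql.hive.metastore.version, so these settings only take effect on a Spark build that supports it (Spark 3.0, or a 2.4.3 with the changes backported).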

1 Answer


Support for using Hive 3.1.1 will be available only from Spark 3.0.0 (yet to be released).
Jira - https://jira.apache.org/jira/browse/SPARK-24360
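
For illustration, a hedged Scala sketch of how the metastore settings from the comments could be applied once a Spark release that supports Hive 3.1.1 is in use; the jar path is an assumption:

    // Scala sketch: point the session at a Hive 3.1.1 metastore.
    // Assumes a Spark version that supports metastore 3.1.1 (per SPARK-24360)
    // and that the Hive 3.1.1 client jars sit under /opt/hive/lib (assumed path).
    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("hive-metastore-3.1.1")
      .config("spark.sql.hive.metastore.version", "3.1.1")
      .config("spark.sql.hive.metastore.jars", "/opt/hive/lib/*")
      .enableHiveSupport()
      .getOrCreate()

    spark.sql("SHOW DATABASES").show()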

DaRkMaN
  • Thanks for your information. However, I got around the error by using Spark 2.3 with Hadoop 2.7 as Spark's runtime. The version of the AWS-related libraries does matter here: I used aws-java-sdk-1.7.4.jar and hadoop-aws-2.7.3.jar. As per my current understanding, the Hive version should be compatible with the Hadoop that Spark uses; beyond that, it doesn't matter much. In other words, as long as Hive is compatible with Hadoop 2.7, it can be used in the above scenario. – Eugene Aug 04 '19 at 02:42
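
A minimal sketch of how the workaround in the comment above could be wired up, assuming the two AWS jars mentioned there were downloaded locally (the /opt/libs path is an assumption, not from the comment):

    # Spark 2.3.x built for Hadoop 2.7, with the matching AWS jars passed explicitly
    spark-sql \
      --jars /opt/libs/hadoop-aws-2.7.3.jar,/opt/libs/aws-java-sdk-1.7.4.jar \
      -e "SHOW DATABASES"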