
I set up an AWS EMR cluster using release emr-6.0.0. The application bundle selected was:

Spark: Spark 2.4.4 on Hadoop 3.2.1 YARN with Ganglia 3.7.2 and Zeppelin 0.9.0-SNAPSHOT

After that I created a Jupyter notebook and attached it to the cluster. The problem is that the following lines of code in the notebook throw an error:

data_frame = spark.read.json("s3://transactions-bucket-demo/")
data_frame.createOrReplaceTempView("table")
spark.sql("SELECT * from table")

Error:

'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'

How can I resolve this error, which is raised by the SQL query in the notebook?

mightyMouse
  • Which metastore are you using? HIVE or any external metastore like GLUE? Also do one thing, ssh into master and then type `hive -hiveconf hive.root.logger=DEBUG` then run a `show databases;` and see if there is any error. – Snigdhajyoti Jun 09 '20 at 18:18
  • How do I find out which metastore I am using? – mightyMouse Jun 09 '20 at 19:19
  • Have you [enabled hive support in your sparkSession](https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-SparkSession-Builder.html#enableHiveSupport)? You can read more on https://stackoverflow.com/a/48171845/7857701 – Snigdhajyoti Jun 09 '20 at 20:01
  • What is the output of spark.sql("show databases;").show()? – Abdelrahman Maharek Jun 11 '20 at 11:12
  • Also try adding enableHiveSupport(), as described at https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html – Abdelrahman Maharek Jun 11 '20 at 11:13
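
On the question of which metastore is in use, one way to check is to read the cluster's hive-site.xml and look at a few well-known properties. The sketch below is only an illustration under assumptions: the helper names `read_hive_site` and `describe_metastore` are hypothetical, the order of the checks is a guess at a sensible priority, and the file location on the master node (often /etc/hive/conf/hive-site.xml on EMR) is assumed and may not exist if Hive was not installed with the cluster.

```python
import xml.etree.ElementTree as ET
from typing import Dict

# Client factory class that EMR documents for using the AWS Glue Data Catalog.
GLUE_FACTORY = (
    "com.amazonaws.glue.catalog.metastore.AWSGlueDataCatalogHiveClientFactory"
)


def read_hive_site(xml_text: str) -> Dict[str, str]:
    """Parse the text of a hive-site.xml file into a {name: value} dict."""
    root = ET.fromstring(xml_text)
    props = {}
    for prop in root.findall("property"):
        name = prop.findtext("name")
        if name:
            props[name.strip()] = (prop.findtext("value") or "").strip()
    return props


def describe_metastore(props: Dict[str, str]) -> str:
    """Hypothetical helper: make a rough guess at the metastore type."""
    if props.get("hive.metastore.client.factory.class") == GLUE_FACTORY:
        return "glue"
    if props.get("hive.metastore.uris"):
        return "remote-hive"  # a Thrift metastore service is configured
    if props.get("javax.jdo.option.ConnectionURL"):
        return "direct-jdbc"  # metastore database accessed directly
    return "unknown"
```

On the master node this could be called as `describe_metastore(read_hive_site(open("/etc/hive/conf/hive-site.xml").read()))`, assuming that path exists. A result of "unknown", or a missing file, would suggest that no Hive metastore is configured for Spark to connect to.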

1 Answer


I had the same problem, but with a Zeppelin notebook. I solved it by changing the Zeppelin interpreter settings as follows:

zeppelin.spark.useHiveContext   false
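
For a notebook that is not running through Zeppelin, a roughly comparable approach might be to ask Spark for its in-memory catalog instead of the Hive one. This is a sketch under assumptions, not a confirmed fix for the question: `catalog_conf` and `build_session` are hypothetical helper names, and it assumes the Zeppelin setting works by not enabling Hive support. `spark.sql.catalogImplementation` is a static Spark setting, so it only takes effect if it is set before the session is first created.

```python
from typing import Dict


def catalog_conf(use_hive: bool) -> Dict[str, str]:
    """Return the Spark setting that selects the SQL catalog implementation."""
    # "hive" goes through a Hive metastore client; "in-memory" keeps temp
    # views and tables in a session-local catalog.
    value = "hive" if use_hive else "in-memory"
    return {"spark.sql.catalogImplementation": value}


def build_session(use_hive: bool = False, app_name: str = "no-hive-demo"):
    """Create a SparkSession with the chosen catalog (requires pyspark)."""
    # Imported lazily so catalog_conf() can be used without Spark installed.
    from pyspark.sql import SparkSession

    builder = SparkSession.builder.appName(app_name)
    for key, value in catalog_conf(use_hive).items():
        builder = builder.config(key, value)
    # If a session already exists, getOrCreate() returns it and this static
    # setting is not applied to it.
    return builder.getOrCreate()
```

With `spark = build_session(use_hive=False)`, the `createOrReplaceTempView` and `spark.sql` calls from the question would run against the in-memory catalog. In an EMR notebook the session is usually created for you, so the same key would probably have to be supplied through the notebook's session configuration before the first cell starts Spark.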
superdud