Run spark sql query in AWS EMR

Question

I set up an AWS EMR cluster with release emr-6.0.0 and selected the following application:

Spark: Spark 2.4.4 on Hadoop 3.2.1 YARN with Ganglia 3.7.2 and Zeppelin 0.9.0-SNAPSHOT

After that, I created a Jupyter notebook and attached it to the cluster. The problem is that the following lines of code in the notebook throw an error:

```python
data_frame = spark.read.json("s3://transactions-bucket-demo/")
data_frame.createOrReplaceTempView("table")
spark.sql("SELECT * from table")
```

Error:

```
'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'
Traceback (most recent call last):
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/session.py", line 767, in sql
    return DataFrame(self._jsparkSession.sql(sqlQuery), self._wrapped)
  File "/usr/lib/spark/python/lib/py4j-0.10.7-src.zip/py4j/java_gateway.py", line 1257, in __call__
    answer, self.gateway_client, self.target_id, self.name)
  File "/usr/lib/spark/python/lib/pyspark.zip/pyspark/sql/utils.py", line 69, in deco
    raise AnalysisException(s.split(': ', 1)[1], stackTrace)
pyspark.sql.utils.AnalysisException: 'java.lang.RuntimeException: java.lang.RuntimeException: Unable to instantiate org.apache.hadoop.hive.ql.metadata.SessionHiveMetaStoreClient;'
```

How can I resolve this error when running a SQL query in the notebook?

Which metastore are you using: Hive, or an external metastore such as Glue? Also, SSH into the master node, start Hive with `hive -hiveconf hive.root.logger=DEBUG`, then run `show databases;` and check whether any error appears. — Snigdhajyoti, Jun 09 '20 at 18:18
Have you [enabled Hive support in your SparkSession](https://jaceklaskowski.gitbooks.io/mastering-spark-sql/spark-sql-SparkSession-Builder.html#enableHiveSupport)? You can read more at https://stackoverflow.com/a/48171845/7857701 — Snigdhajyoti, Jun 09 '20 at 20:01
Also try adding `enableHiveSupport()` as shown in https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html — Abdelrahman Maharek, Jun 11 '20 at 11:13

superdud · Answer 1 · 2023-08-05T06:49:28.497


I had the same problem, but with a Zeppelin notebook. I solved it by changing the Zeppelin interpreter settings as follows:

```
zeppelin.spark.useHiveContext   false
```

edited Aug 05 '23 at 06:49

answered Jul 29 '23 at 18:46

superdud