
I use PySpark to read a Hive external table that is backed by HBase. The table was created successfully, but when I query it from PySpark:

spark.sql("use mydatabase")
user_rdd_list = spark.sql("select user_id, user_profile from ex_tbl limit 1")

it fails with this error:

Py4JJavaError: An error occurred while calling o124.showString.
: java.lang.NoClassDefFoundError: org/apache/hbase/thirdparty/com/google/common/cache/CacheLoader
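For completeness, here is a minimal sketch of the full path I run (I build the session with Hive support; the .show() call at the end is what the o124.showString in the traceback refers to):

from pyspark.sql import SparkSession

# Hive support is needed so Spark can resolve the HBase-backed external table.
spark = (SparkSession.builder
         .appName("RealTimeRecommendation")
         .enableHiveSupport()
         .getOrCreate())

spark.sql("use mydatabase")
user_rdd_list = spark.sql("select user_id, user_profile from ex_tbl limit 1")
user_rdd_list.show()  # showString is invoked here; this is where the error surfaces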

My full configuration is:

[('spark.master', 'local'), ('spark.app.id', 'local-1571631446655'), ('spark.executor.memory', '2g'), ('spark.executor.id', 'driver'), ('spark.executor.cores', '2'), ('spark.app.name', 'RealTimeRecommendation'), ('spark.driver.host', 'iZ2ze85uv4ktko46vm8juvZ'), ('spark.sql.warehouse.dir', '/user/hive/warehouse'), ('spark.sql.catalogImplementation', 'hive'), ('spark.rdd.compress', 'True'), ('spark.executor.instances', '2'), ('spark.serializer.objectStreamReset', '100'), ('spark.submit.deployMode', 'client'), ('spark.driver.port', '33103'), ('spark.ui.showConsoleProgress', 'true')]
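For reference, that dump comes from printing the context configuration:

# Print every effective Spark property. Jars passed with --jars or spark.jars
# would appear here; jars only copied into SPARK_HOME/jars do not.
print(spark.sparkContext._conf.getAll())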

I have added the following jars to my SPARK_HOME/jars (see the note after the list):

hbase-protocol-2.0.5.jar
hbase-client-2.0.5.jar
hbase-common-2.0.5.jar
hbase-server-2.0.5.jar
hive-hbase-handler-2.3.5.jar
metrics-core-3.1.5.jar
metrics-core-3.2.1.jar
guava-11.0.2.jar
guava-14.0.1.jar
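As far as I can tell, none of these jars contains the missing class: org.apache.hbase.thirdparty is a relocated namespace that ships in the hbase-shaded-* artifacts of the hbase-thirdparty project (hbase-shaded-miscellaneous holds the relocated Guava classes, including CacheLoader), not in the plain guava-*.jar files. A sketch of pulling it in when building the session; the version below is a guess and should be matched to whatever my HBase 2.0.5 was built against:

from pyspark.sql import SparkSession

# hbase-shaded-miscellaneous relocates Guava under org.apache.hbase.thirdparty,
# which is where the missing CacheLoader class lives. The 2.1.0 version is an
# assumption; align it with the hbase-thirdparty version of your HBase build.
spark = (SparkSession.builder
         .config("spark.jars.packages",
                 "org.apache.hbase.thirdparty:hbase-shaded-miscellaneous:2.1.0")
         .enableHiveSupport()
         .getOrCreate())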

My Spark version is 2.4.3. How can I fix this?

  • Since you have not shared how you added the jars, please share the output of print(spark.sparkContext._conf.getAll()) and check whether all your jars appear under the property 'spark.yarn.dist.jars'. – Neha Jirafe Oct 21 '19 at 04:37
  • I have updated the question. – littlely Oct 21 '19 at 05:37
  • I don't see the jars in the configuration. In case you are running Spark on YARN, the jars must be distributed to all the nodes to be accessible; generally 'spark.yarn.dist.jars' will list the distributed jars. Try invoking it like pyspark --jars /hbase-protocol-2.0.5.jar,/hbase-client-2.0.5.jar etc. I use it like pyspark --jars /opt/jars/postgresql-42.2.5.jar,/opt/jars/mysql-connector-java-5.1.47.jar – Neha Jirafe Oct 21 '19 at 06:43
  • I run it in Jupyter. – littlely Oct 21 '19 at 06:49
  • Please refer to https://stackoverflow.com/questions/35946868/adding-custom-jars-to-pyspark-in-jupyter-notebook – Neha Jirafe Oct 21 '19 at 06:55
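Following that link, a sketch of what I could try in Jupyter: set PYSPARK_SUBMIT_ARGS before the session is created, so the jars are actually shipped to the session (the paths below are placeholders for wherever the jars live on my machine):

import os

# Placeholder paths; these must point at the real jar locations on the driver machine.
jars = ",".join([
    "/opt/jars/hbase-client-2.0.5.jar",
    "/opt/jars/hbase-common-2.0.5.jar",
    "/opt/jars/hbase-server-2.0.5.jar",
    "/opt/jars/hbase-protocol-2.0.5.jar",
    "/opt/jars/hive-hbase-handler-2.3.5.jar",
])

# Must be set before any SparkContext/SparkSession is created in the notebook.
os.environ["PYSPARK_SUBMIT_ARGS"] = "--jars " + jars + " pyspark-shell"

from pyspark.sql import SparkSession
spark = SparkSession.builder.enableHiveSupport().getOrCreate()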
