
I am trying to run a query against Hive using PySpark in Python, connected over SSH to an external server, and it is failing (first picture).

Details:

IDE:
Visual Studio Code

Bash Profile:
if [ -f ~/.bashrc ]; then
        . ~/.bashrc
fi
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export HADOOP_HOME=/usr/hdp/2.6.5.0-292/hadoop
export HDP_VERSION=2.6.5.0-292
export SPARK_HOME=/usr/hdp/current/spark2-client
export SPARK_MAJOR_VERSION=2
export SPARK_CONF_DIR=/etc/spark2/conf
export PATH
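Since VS Code may not source ~/.bash_profile for the interpreter it launches, I wonder if these variables are even visible to my script. Here is a minimal sketch of setting them from Python before anything from pyspark is imported (same paths as in the profile above; findspark is an extra package, so treat that part as an assumption):

import os

# Assumption: VS Code does not source ~/.bash_profile, so replicate the
# exports from Python before pyspark is imported.
os.environ["HADOOP_HOME"] = "/usr/hdp/2.6.5.0-292/hadoop"
os.environ["HDP_VERSION"] = "2.6.5.0-292"
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["SPARK_MAJOR_VERSION"] = "2"
os.environ["SPARK_CONF_DIR"] = "/etc/spark2/conf"  # where hive-site.xml should live

import findspark  # extra dependency: pip install findspark
findspark.init()  # puts $SPARK_HOME/python on sys.path using the values above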

The strange thing is that if I open a bash terminal and run:

export HADOOP_HOME=/usr/hdp/2.6.5.0-292/hadoop
export HDP_VERSION=2.6.5.0-292
export SPARK_HOME=/usr/hdp/current/spark2-client
export SPARK_MAJOR_VERSION=2
export SPARK_CONF_DIR=/etc/spark2/conf
pyspark --name "test_py37" --conf spark.ui.enabled=false --driver-memory 10g

to open PySpark in that terminal, the same query runs correctly (second picture).

In the first picture, I also tried replacing lines 10 to 13 with lines 15 to 17 (the SparkContext version with the SparkSession version, shown as "2nd try" below) and I get the same error. Everything in that picture runs under Python 3.7.

Can you help me, please?

Best regards, Mirko.

PS: Picture 01 = red squares / Picture 02 = white squares with red borders. Have a nice day!

PYTHON CODE:

from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

conf = SparkConf().setAppName("test_py37")
conf.set('spark.jars', '/usr/hdp/current/sqoop-client/lib/ojdbc7.jar,/usr/hdp/current/sqoop-client/lib/terajdbc4.jar,/usr/hdp/current/sqoop-client/lib/tdgssconfig.jar')
conf.set('spark.port.maxRetries', '100')
conf.set('spark.driver.memory', '10G')
conf.set('spark.ui.enabled', 'false')

sc = SparkContext(conf=conf)
sc.setLogLevel('ERROR')
hc = HiveContext(sc)  # Hive-aware SQL entry point
hc.sql("select * from database.table limit 10").show()
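Right after creating the context I also list what the catalog can actually see, as a quick diagnostic (if only an empty 'default' database shows up, my guess is Spark is talking to a fresh local Derby metastore rather than the cluster's Hive metastore):

# Diagnostic: which databases/tables does this context actually see?
hc.sql("show databases").show()
hc.sql("show tables").show()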

2ND TRY (same error):

from pyspark.sql import SparkSession

spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()
spark.sql("select * from database.table limit 10").show()

ERROR:

pyspark.sql.utils.AnalysisException: "Table or view not found: 'database/schema' . 'table'; line 1 pos 4;\n'GlobalLimit 10\n+- 'LocalLimit 10 \n +- 'Project [*]\n +- 'UnresolvedRelation 'database/schema'.'table'\n"
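My suspicion is that, when launched from VS Code, Spark never finds hive-site.xml and falls back to a local metastore, which would explain the unresolved relation. Would something like this be the right way to point the session at the metastore explicitly? A sketch only; the thrift URI below is a placeholder, the real value is under hive.metastore.uris in /etc/spark2/conf/hive-site.xml:

from pyspark.sql import SparkSession

# Sketch: point Spark at the cluster's Hive metastore explicitly.
# "thrift://metastore-host:9083" is a placeholder; copy the real URI
# from hive.metastore.uris in /etc/spark2/conf/hive-site.xml.
spark = (SparkSession.builder
         .appName("test_py37")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())
spark.sql("select * from database.table limit 10").show()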

[Picture 01] [Picture 02]

ADDITIONAL INFO:

1.- I also tried configuring the warehouse_location (ref: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html) and got the same error again.

2.- In both cases (bash terminal and Python script) 'show tables' returns an empty list (see the check below).
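This is the check I have in mind for comparing the two environments, run in both the pyspark shell (where spark is predefined) and the script:

# 'hive' means the Hive catalog is active; 'in-memory' means Hive support
# was never loaded, so tables from the metastore will not resolve.
print(spark.conf.get("spark.sql.catalogImplementation"))
print(spark.conf.get("spark.sql.warehouse.dir"))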

Other links:

  1. Schema error after altering hive table with pyspark

  2. Query HIVE table in pyspark

  3. hive table does not exist in spark job
