I am trying to run a query against Hive using PySpark (Python), connected over SSH to an external server, and it is failing (first picture).
Details:
IDE:
Visual Studio Code
Bash Profile:
if [ -f ~/.bashrc ]; then
    . ~/.bashrc
fi
PATH=$PATH:$HOME/.local/bin:$HOME/bin
export HADOOP_HOME=/usr/hdp/2.6.5.0-292/hadoop
export HDP_VERSION=2.6.5.0-292
export SPARK_HOME=/usr/hdp/current/spark2-client
export SPARK_MAJOR_VERSION=2
export SPARK_CONF_DIR=/etc/spark2/conf
export PATH
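One detail I want to rule out (a hedged check on my side, assuming VS Code may launch Python without sourcing ~/.bash_profile): whether the interpreter actually inherits these variables. A minimal sketch:

import os

# Print each Spark-related variable; "None" means the Python process did not inherit it
for var in ("HADOOP_HOME", "HDP_VERSION", "SPARK_HOME",
            "SPARK_MAJOR_VERSION", "SPARK_CONF_DIR"):
    print(var, "=", os.environ.get(var))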
It is strange because if I open a bash terminal and write:
export HADOOP_HOME=/usr/hdp/2.6.5.0-292/hadoop
export HDP_VERSION=2.6.5.0-292
export SPARK_HOME=/usr/hdp/current/spark2-client
export SPARK_MAJOR_VERSION=2
export SPARK_CONF_DIR=/etc/spark2/conf
pyspark --conf name="test_py37" --conf spark.ui.enabled=false --driver-memory 10g
to open PySpark in the bash terminal, then the query runs correctly (second picture).
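A minimal sketch (an untested assumption on my side) that mirrors those same exports from inside Python, before any Spark object is created, in case the interpreter launched by VS Code does not see them:

import os

# Mirror the shell exports; values copied from the bash profile above
os.environ["HADOOP_HOME"] = "/usr/hdp/2.6.5.0-292/hadoop"
os.environ["HDP_VERSION"] = "2.6.5.0-292"
os.environ["SPARK_HOME"] = "/usr/hdp/current/spark2-client"
os.environ["SPARK_MAJOR_VERSION"] = "2"
os.environ["SPARK_CONF_DIR"] = "/etc/spark2/conf"  # Spark client configuration directory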
In the first picture, I also tried replacing lines 10 to 13 with lines 15 to 17 (SparkContext vs. SparkSession) and I get the same error. Everything in the first picture uses Python 3.7.
Can you help me, please?
Best regards, Mirko.
PS: Picture 01 = red squares / Picture 02 = white squares with red borders. Have a nice day!
PYTHON CODE:
from pyspark import SparkConf, SparkContext
from pyspark.sql import HiveContext

# Driver-side configuration: extra JDBC jars, port retries, memory, and no web UI
conf = SparkConf().setAppName("test_py37")
conf.set('spark.jars', '/usr/hdp/current/sqoop-client/lib/ojdbc7.jar,/usr/hdp/current/sqoop-client/lib/terajdbc4.jar,/usr/hdp/current/sqoop-client/lib/tdgssconfig.jar')
conf.set('spark.port.maxRetries', '100')
conf.set('spark.driver.memory', '10G')
conf.set('spark.ui.enabled', 'false')

sc = SparkContext(conf=conf)
sc.setLogLevel('ERROR')

hive_ctx = HiveContext(sc)  # entry point for Hive SQL, not a DataFrame
hive_ctx.sql("select * from database.table limit 10").show()
2ND TRY (same error):
from pyspark.sql import SparkSession

# getOrCreate() returns a SparkSession, which can run SQL directly
spark = SparkSession.builder.config(conf=conf).enableHiveSupport().getOrCreate()
spark.sql("select * from database.table limit 10").show()
ERROR:
pyspark.sql.utils.AnalysisException: "Table or view not found: 'database/schema' . 'table'; line 1 pos 4;\n'GlobalLimit 10\n+- 'LocalLimit 10 \n +- 'Project [*]\n +- 'UnresolvedRelation 'database/schema'.'table'\n"
ADDITIONAL INFO:
1.- I also tried configuring warehouse_location (ref: https://spark.apache.org/docs/latest/sql-data-sources-hive-tables.html) and got the same error again (see the sketch after item 2).
2.- In both cases (bash terminal and Python terminal), 'show tables' returns an empty table.
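For reference, the warehouse_location attempt from item 1 followed the pattern in the linked Spark docs (sketch below; the path is the docs' example, not necessarily this cluster's warehouse):

import os
from pyspark.sql import SparkSession

# warehouse_location points to the default location for managed databases and tables
warehouse_location = os.path.abspath("spark-warehouse")

spark = (SparkSession.builder
         .appName("test_py37")
         .config("spark.sql.warehouse.dir", warehouse_location)
         .enableHiveSupport()
         .getOrCreate())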
Other links: