My question is related to an existing thread, but we are on HDP 2.6.3 and Ambari 2.6.1.5.
Question: We are trying to access Hive table data from Spark 2.2.
The command:
spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode client /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar
In client mode it works. Please note we have not passed --files or --conf spark.yarn.dist.files.
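For context, here is a minimal sketch of what the job does (this is an assumption about MainClass, not its exact source; the database and table names are taken from the error below):

import org.apache.spark.sql.SparkSession

object MainClass {
  def main(args: Array[String]): Unit = {
    // enableHiveSupport() tells Spark to use the Hive metastore configured in hive-site.xml
    val spark = SparkSession.builder()
      .appName("spark_sql_under_the_hood")
      .enableHiveSupport()
      .getOrCreate()

    // If the metastore configuration is not visible to the driver, this lookup
    // falls back to the default catalog and fails with NoSuchTableException
    spark.sql("SELECT * FROM qwerty.xyz").show()

    spark.stop()
  }
}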
spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode cluster /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar
In cluster mode it fails with:
diagnostics: User class threw exception:
org.apache.spark.sql.catalyst.analysis.NoSuchTableException: Table or view 'xyz' not found in database 'qwerty';
ApplicationMaster host: 121.121.121.121
ApplicationMaster RPC port: 0
queue: default
start time: 1523616607943
final status: FAILED
tracking URL: https://managenode002xxserver:8090/proxy/application_1523374609937_10224/
user: abc123
Exception in thread "main" org.apache.spark.SparkException: Application
application_1523374609937_10224 finished with failed status
at org.apache.spark.deploy.yarn.Client.run(Client.scala:1187)
at org.apache.spark.deploy.yarn.Client$.main(Client.scala:1233)
at org.apache.spark.deploy.yarn.Client.main(Client.scala)
at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
at java.lang.reflect.Method.invoke(Method.java:497)
at org.apache.spark.deploy.SparkSubmit$.org$apache$spark$deploy$SparkSubmit$$runMain(SparkSubmit.scala:782)
at org.apache.spark.deploy.SparkSubmit$.doRunMain$1(SparkSubmit.scala:180)
at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:205)
at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:119)
at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)
Again, please note we have not used --files or --conf spark.yarn.dist.files here.
But the same job works with:
spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode cluster --files /etc/spark2/conf/hive-site.xml /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar
and the expected result is returned.
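On YARN, --files is backed by the spark.yarn.dist.files property mentioned above, so the same fix should also be expressible as a --conf (we have only verified the --files form):
spark-submit --class com.virtuslab.sparksql.MainClass --master yarn --deploy-mode cluster --conf spark.yarn.dist.files=/etc/spark2/conf/hive-site.xml /tmp/spark-hive-test/spark_sql_under_the_hood-spark2.2.0.jar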
Is there a bug that prevents Spark from picking up /etc/spark2/conf when running in YARN cluster mode?
Note: /etc/spark2/conf contains hive-site.xml on all nodes of the cluster.
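We verified that with a quick check (the host names below are placeholders for our actual nodes):

# confirm hive-site.xml exists on every node
for h in node1 node2 node3; do ssh "$h" 'ls -l /etc/spark2/conf/hive-site.xml'; done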