
I'm trying to write simple Scala code that queries Hive data located on a remote cluster. My code will be deployed to cluster A but has to query a Hive table located on cluster B. I'm developing this in my local Eclipse and getting the following error:

org.apache.spark.sql.AnalysisException: Table not found: `<mydatabase>`.`<mytable>`;

The relevant part of my code is below

    import org.apache.spark.{SparkConf, SparkContext}
    import org.apache.spark.sql.hive.HiveContext

    val conf = new SparkConf().setAppName("Xing")
      .setMaster("local[*]")
    conf.set("hive.metastore.uris", "thrift://<clusterB url>:10000")
    val sc = SparkContext.getOrCreate(conf)
    val hc = new HiveContext(sc)
    val df = hc.sql("select * from <mydatabase>.<mytable>")

I suspect it is a configuration issue, but I may be wrong. Any advice would be greatly appreciated.

Michael D
  • Can you run beeline and access the same HiveServer/database/table? –  Nov 23 '16 at 15:38
  • I can query this table using Hive JDBC with no problems. This cluster has Kerberos security setup. I was trying to set the same properties in SparkConf but had the same error. These are the properties I'm setting: conf.set("login.user","") conf.set("keytab.file", "") conf.set("sun.security.krb5.debug","false") conf.set("java.security.krb5.conf","") conf.set("java.library.path","") conf.set("hadoop.home.dir","") conf.set("hadoop.security.authentication","kerberos") – Michael D Nov 23 '16 at 16:44
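(For reference: setting Kerberos-related properties on `SparkConf`, as in the comment above, generally does not perform the login itself. In Spark 1.x the usual approach is to authenticate through Hadoop's `UserGroupInformation` API before creating the `SparkContext`. A minimal sketch, where the principal and keytab path are placeholder assumptions:)

```scala
import org.apache.hadoop.conf.Configuration
import org.apache.hadoop.security.UserGroupInformation

// Tell the Hadoop client libraries that the cluster uses Kerberos.
val hadoopConf = new Configuration()
hadoopConf.set("hadoop.security.authentication", "kerberos")
UserGroupInformation.setConfiguration(hadoopConf)

// Log in from a keytab before any SparkContext/HiveContext is created.
// Both arguments below are hypothetical placeholders.
UserGroupInformation.loginUserFromKeytab(
  "user@EXAMPLE.COM",       // Kerberos principal (assumption)
  "/path/to/user.keytab")   // keytab file path (assumption)
```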

1 Answer


The port in the metastore URI should be 9083 (the Hive metastore's default thrift port), unless you purposely changed it. Port 10000 is the default for HiveServer2, which is the JDBC endpoint, not the metastore.
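A minimal sketch of the corrected setting (the hostname is the same placeholder used in the question; 9083 is assumed to be the metastore's default, unchanged port):

```scala
// Point Spark at the remote Hive *metastore* service, not HiveServer2.
conf.set("hive.metastore.uris", "thrift://<clusterB url>:9083")
```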

Lokesh Yadav