Context
I want to use Spark 2 for some elementary ETL, but our Hadoop admins have different priorities and aren't able to install it for the time being.
Problem
Simply stated, I want to know if it's possible to configure a Spark session, running on my local computer, to:
- Connect to the Hadoop cluster (which does not have Spark 2 installed)
- Authenticate to the cluster, so that I can access its Hive tables
- Read data from its Hive tables to my local machine
- Process and transform the data on my local machine
- Write the result to a different remote RDBMS (e.g., PostgreSQL)
I do not have root access on the servers. Admin policy prevents these systems from communicating directly with each other, but my local machine can read from, and write to, either one. A rough sketch of the kind of session configuration I have in mind is below.
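For what it's worth, this is roughly what I'm attempting. The metastore URI, hostnames, table names, and credentials are all placeholders, and it assumes a Thrift metastore endpoint, the cluster's `core-site.xml`/`hdfs-site.xml`/`hive-site.xml` on my local classpath, and a valid Kerberos ticket if the cluster is secured:

```scala
import org.apache.spark.sql.SparkSession

// Local Spark 2 session pointed at the remote Hive metastore.
// All hostnames, ports, and table names below are placeholders.
val spark = SparkSession.builder()
  .appName("local-hive-etl")
  .master("local[*]")
  // Remote Hive metastore Thrift endpoint (assumed; normally set via hive-site.xml)
  .config("hive.metastore.uris", "thrift://metastore-host.example.com:9083")
  .enableHiveSupport()
  .getOrCreate()

// Read a Hive table from the cluster to the local machine
val source = spark.table("some_db.some_table")

// Process/transform locally (example transformation only)
val result = source
  .filter("event_date >= '2017-01-01'")
  .groupBy("customer_id")
  .count()

// Write the result to a remote PostgreSQL instance over JDBC
// (requires the PostgreSQL JDBC driver jar on the local classpath)
result.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://pg-host.example.com:5432/target_db")
  .option("dbtable", "public.some_result_table")
  .option("user", "etl_user")
  .option("password", "...")
  .mode("append")
  .save()
```

I don't know whether this actually works against a cluster that has no Spark 2 installed, which is exactly my question.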
These previously answered questions have not provided a working solution:
- How to connect to remote hive server from spark
- How to connect to a Hive metastore programmatically in SparkSql
Many thanks if you can help! (Even if the answer is simply, "No, you have to have Spark installed on the Hadoop cluster to read its data," I just need to know.)