Context
I want to use Spark 2 for some elementary ETL, but our Hadoop admins have different priorities and aren't able to install it for the time being.
Problem
Simply stated, I want to know if it's possible to configure a Spark session, running on my local computer, to:
- Connect to the Hadoop cluster (which does not have Spark 2 installed)
- Authenticate to the cluster, so that I can access its Hive tables
- Read data from its Hive tables to my local machine
- Process and transform the data on my local machine
- Write the result to a different remote RDBMS (e.g., PostgreSQL)
I do not have root access on the servers. Admin policy prevents these systems from communicating directly with each other, but my local machine can read from, and write to, either one. A rough sketch of the kind of session configuration I have in mind is below.
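For what it's worth, this is roughly what I'm attempting. The metastore URI, hostnames, table names, and credentials are all placeholders, and it assumes a Thrift metastore endpoint, the cluster's `core-site.xml`/`hdfs-site.xml`/`hive-site.xml` on my local classpath, and a valid Kerberos ticket if the cluster is secured:

```scala
import org.apache.spark.sql.SparkSession

// Local Spark 2 session pointed at the remote Hive metastore.
// All hostnames, ports, and table names below are placeholders.
val spark = SparkSession.builder()
  .appName("local-hive-etl")
  .master("local[*]")
  // Remote Hive metastore Thrift endpoint (assumed; normally set via hive-site.xml)
  .config("hive.metastore.uris", "thrift://metastore-host.example.com:9083")
  .enableHiveSupport()
  .getOrCreate()

// Read a Hive table from the cluster to the local machine
val source = spark.table("some_db.some_table")

// Process/transform locally (example transformation only)
val result = source
  .filter("event_date >= '2017-01-01'")
  .groupBy("customer_id")
  .count()

// Write the result to a remote PostgreSQL instance over JDBC
// (requires the PostgreSQL JDBC driver jar on the local classpath)
result.write
  .format("jdbc")
  .option("url", "jdbc:postgresql://pg-host.example.com:5432/target_db")
  .option("dbtable", "public.some_result_table")
  .option("user", "etl_user")
  .option("password", "...")
  .mode("append")
  .save()
```

I don't know whether this actually works against a cluster that has no Spark 2 installed, which is exactly my question.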
These previously answered questions have not provided a working solution:
- How to connect to remote hive server from spark
- How to connect to a Hive metastore programmatically in SparkSql
Many thanks if you can help! (Even if the answer is simply, "No, you have to have Spark installed on the Hadoop cluster to read its data," I just need to know.)