I have a Databricks cluster running on some VMs. My organisation has a Hadoop cluster with a bunch of data in it that I want. I have no direct access to the Hadoop cluster itself; all my permissions have been sorted out, and I have been given just a JDBC URL.
I can open a database management tool (DBeaver) on my local machine and query Hive tables over that URL successfully.
However, I am struggling to query Hive tables using Databricks and PySpark. To set the connection string for the HiveContext, it seems I would normally put it in the hive-site.xml file, but Databricks doesn't give me that option.
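For reference, my understanding is that the usual route would be to point Spark at the cluster's Hive metastore when building the session, something like the sketch below. The thrift://... address is a placeholder I invented; I only have a JDBC URL, not a metastore URI, so I don't think this route is open to me:

from pyspark.sql import SparkSession

# Hypothetical: connecting Spark to an external Hive metastore directly.
# "thrift://metastore-host:9083" is a made-up placeholder, not something I have.
spark = (SparkSession.builder
         .appName("hive-via-metastore")
         .config("hive.metastore.uris", "thrift://metastore-host:9083")
         .enableHiveSupport()
         .getOrCreate())

df = spark.sql("SELECT * FROM some_db.some_table LIMIT 10")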
I am on Hive 2.1.1 and Databricks Runtime 6.4 (which includes Apache Spark 2.4.5 and Scala 2.11).
Now I am at a loss as to how to simply connect to my Hive database.
# The Spark context sc is created automatically in Databricks
from pyspark.sql import HiveContext
hive_context = HiveContext(sc)  # HiveContext is deprecated since Spark 2.0 in favour of spark.sql()
# I want to be able to do something like
df = hive_context.sql("SELECT ...")
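The only workaround I can think of is Spark's generic JDBC data source, reading one table at a time over the URL I was given. This is only a sketch of what I mean; the URL, the table name my_db.my_table, and the assumption that the Hive JDBC driver (org.apache.hive.jdbc.HiveDriver) is attached to the cluster are all mine:

# Sketch: reading a single Hive table over plain JDBC instead of through the metastore.
jdbc_url = "jdbc:hive2://hive-host:10000/default"  # placeholder; I'd substitute the URL I was given

df = (spark.read
      .format("jdbc")
      .option("url", jdbc_url)
      .option("driver", "org.apache.hive.jdbc.HiveDriver")
      .option("dbtable", "my_db.my_table")  # placeholder table
      .load())

# Registering the result as a temp view would at least let me run SQL against it
df.createOrReplaceTempView("my_table")
result = spark.sql("SELECT * FROM my_table LIMIT 10")

But that gives me one table at a time rather than a proper connection to the Hive database, so I'm not sure it's the right approach.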