I am trying to connect to Hive using beeline on an EMR cluster (Kerberos enabled) and am wondering why I'd run a kinit (using my user account) and then the following:

beeline -u "jdbc:hive2://localhost:10000/default;principal=hive/_HOST@REALM"

The part that confuses me is the principal above. Why do we use "principal=hive/_HOST@REALM" (which from what I've read is the Hive service principal) when I've authenticated with my user account using the kinit in the previous command?

Will I be running queries against the Hive service principal or my user account? Do all users use the Hive service principal when using beeline? Is there any reason behind this?

Link for further context: Connecting to Hive via Beeline using Kerberos keytab

Brandon

1 Answer

The principal= option in that JDBC URL refers to the service principal (SPN), i.e. the service you are connecting to, not the user you connect as. The naming is admittedly ambiguous and confusing.

kinit authenticates your user principal (UPN) and obtains a "ticket-granting ticket" (TGT), which is stored in the ticket cache.
Later, the JDBC client (or HTTP client, Hive Metastore Java client, HDFS Java client, whatever) uses the TGT to request a service ticket for the appropriate service type on the appropriate host. For some reason, Java never puts that service ticket in the cache (unlike curl or Python, which, like kinit, use a C library).
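To make the ticket flow concrete, here is a minimal shell sketch for checking that the TGT is in the cache before starting beeline. The helper names `tgt_name` and `has_tgt`, the user `brandon@REALM`, and the realm `REALM` are placeholders for illustration, and the check assumes a `klist` that prints the TGT as `krbtgt/REALM@REALM`, as MIT Kerberos does.

```shell
# Name of the TGT entry that klist is expected to show for a realm,
# e.g. krbtgt/REALM@REALM (assumes MIT Kerberos style output).
tgt_name() {
  printf 'krbtgt/%s@%s\n' "$1" "$1"
}

# Returns 0 if the current ticket cache lists a TGT for the given realm.
# has_tgt is a hypothetical helper, not part of any Kerberos tooling.
has_tgt() {
  klist 2>/dev/null | grep -qF "$(tgt_name "$1")"
}

# Usage (not run here; kinit prompts for your password):
#   kinit brandon@REALM        # authenticate your own user principal (UPN)
#   has_tgt REALM && echo "TGT present, beeline can request a service ticket"
```

If the behaviour described above holds, running `klist` again after a beeline session would still show only the TGT, since the Java client does not write the Hive service ticket back to the cache.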

SPNs are normally defined in the Hadoop configuration files named ***-site.xml, which are consumed by the Hadoop client libraries.
But a JDBC driver is supposed to be stand-alone: no dependencies on external libs or config files, with all its connection params taken from the URL. That's why you have to put the SPN explicitly in your URL.
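As a rough illustration of "everything goes in the URL", the sketch below assembles the same connection string as in the question from its parts. The function name `hive_jdbc_url` is made up for this example; host, port, database and realm are the values from the question, and `_HOST` is left as a literal placeholder for Hive to resolve to the server's hostname.

```shell
# Build a HiveServer2 JDBC URL that carries the Hive service principal (SPN).
# Arguments: $1 host, $2 port, $3 database, $4 Kerberos realm.
# hive_jdbc_url is a hypothetical helper, not part of beeline or Hive.
hive_jdbc_url() {
  printf 'jdbc:hive2://%s:%s/%s;principal=hive/_HOST@%s\n' "$1" "$2" "$3" "$4"
}

# Usage (not run here; needs a valid TGT from kinit for your own user):
#   beeline -u "$(hive_jdbc_url localhost 10000 default REALM)"
#   # Ask Hive who it thinks you are (the result depends on the security model):
#   beeline -u "$(hive_jdbc_url localhost 10000 default REALM)" -e 'select current_user();'
```

Note that nothing in the URL names your own account: the user identity comes from the ticket cache, and the URL only says which service to ask a ticket for.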

Samson Scharfrichter
  • So does this mean that I will connect using the Hive SPN, but my queries will be run against the user account I've authenticated with? – Brandon Feb 10 '19 at 23:53
  • No (if I understand correctly what you mean by "using"). You connect as your UPN, to the SPN. As for your queries, they run either under your UPN or under the Hive service account but with your privileges, depending on the security model that HiveServer2 uses (e.g. in CDH, Sentry enforces the second approach). – Samson Scharfrichter Feb 11 '19 at 10:12