I'm connecting to Hive using pyhs2, but the Hive server requires Kerberos authentication. Does anyone know how to convert a JDBC string into pyhs2 parameters? For example:
jdbc:hive2://biclient2.server.163.org:10000/default;principal=hive/app-20.photo.163.org@HADOOP.HZ.NETEASE.COM?mapred.job.queue.name=default
5

leeyiw
2 Answers
7
I think it will be something like this:
pyhs2.connect(host='biclient2.server.163.org',
              port=10000,
              authMechanism="KERBEROS",
              password="something",
              user='your_user@HADOOP.HZ.NETEASE.COM')
I'm doing the same thing. I haven't succeeded yet, but at least I now get a meaningful error: (Server hive/xxx@yyy.COM not found in Kerberos database)

kecso
- I succeeded using what you said, plus a `configuration` parameter like: `conn_config = {'krb_host': 'app-20.photo.163.org', 'krb_service': 'hive'}` – leeyiw May 13 '15 at 06:50
- From a security standpoint, with Kerberos authentication the username/password should be taken from an active Kerberos ticket or from a keytab. See @pele88's answer below, which goes with the former option. – Tagar Sep 26 '16 at 14:29
- Plus one, but pyhs2 is no longer maintained http://stackoverflow.com/a/38666630/470583 – Tagar Sep 26 '16 at 22:29
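Putting the answer and leeyiw's comment together, the parts of the JDBC URL seem to map onto pyhs2 arguments fairly directly: host and port come from the authority, and the `principal` (service/host@REALM) supplies `krb_service` and `krb_host`. Here is a minimal sketch of that mapping. The helper name `jdbc_to_pyhs2_kwargs` is made up for illustration, the `database` keyword is an assumption about `pyhs2.connect`, and the `?mapred.job.queue.name=...` part is simply ignored here.

```python
import re


def jdbc_to_pyhs2_kwargs(jdbc_url):
    """Split a HiveServer2 JDBC URL into keyword arguments (hypothetical helper)."""
    match = re.match(
        r"^jdbc:hive2://(?P<host>[^:/;?]+)(?::(?P<port>\d+))?"
        r"(?:/(?P<database>[^;?]*))?(?P<session>(?:;[^?]*)?)(?:\?.*)?$",
        jdbc_url,
    )
    if match is None:
        raise ValueError("not a jdbc:hive2 URL: %r" % jdbc_url)

    # Session variables follow the database name, separated by ';'.
    session = dict(
        item.split("=", 1)
        for item in match.group("session").split(";")
        if "=" in item
    )

    kwargs = {
        "host": match.group("host"),
        "port": int(match.group("port") or 10000),
        # Assumption: pyhs2.connect accepts a `database` keyword.
        "database": match.group("database") or "default",
    }

    principal = session.get("principal")
    if principal:
        # A service principal has the form service/host@REALM.
        service, rest = principal.split("/", 1)
        kwargs["authMechanism"] = "KERBEROS"
        # Keys taken from leeyiw's comment above.
        kwargs["configuration"] = {
            "krb_host": rest.split("@", 1)[0],
            "krb_service": service,
        }
    return kwargs
```

The resulting dictionary could then be passed as `pyhs2.connect(**kwargs)`. For the URL in the question, the `configuration` entry comes out as the same `krb_host`/`krb_service` pair leeyiw reported using.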
2
This connection call will work as long as the user running the script has a valid Kerberos ticket:
import pyhs2

with pyhs2.connect(host='biclient2.server.163.org',
                   port=10000,
                   authMechanism="KERBEROS") as conn:
    with conn.cursor() as cur:
        # List the databases visible to the authenticated user
        print(cur.getDatabases())
Username, password and any other configuration parameters do not need to be passed in; authentication relies on the ticket obtained from the KDC.
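Since this approach depends on a ticket already being present, it may help to check for one before connecting. A rough sketch, assuming the MIT Kerberos `klist` tool is on the PATH (its `-s` flag runs silently and reports validity through the exit status); the function name `has_valid_kerberos_ticket` is made up for illustration:

```python
import os
import subprocess


def has_valid_kerberos_ticket():
    """Return True if `klist -s` reports a usable ticket cache (hypothetical helper)."""
    try:
        with open(os.devnull, "w") as devnull:
            # `klist -s` prints nothing; exit status 0 means the cache is readable
            # and not expired.
            status = subprocess.call(["klist", "-s"], stdout=devnull, stderr=devnull)
            return status == 0
    except OSError:
        # klist is not installed or not on PATH, so no ticket can be confirmed.
        return False


if not has_valid_kerberos_ticket():
    print("No valid Kerberos ticket found; run kinit before connecting.")
```

If the check fails, running `kinit` (interactively or with a keytab) before the script should give `pyhs2.connect` a ticket to use.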

pele88
- Plus one, but pyhs2 is no longer maintained http://stackoverflow.com/a/38666630/470583 – Tagar Sep 26 '16 at 22:28
- @Ruslan Yes, I'm aware, but I couldn't get any of the alternatives working with a kerberised cluster. I've tried both Impyla and PyHive. Have you? – pele88 Sep 27 '16 at 08:19