4

Used this link to try to connect to a remote hive. Below is the code used. The error msg received is also given below

How to Access Hive via Python?

Code

   from pyhive import hive
    conn = hive.Connection(host="10.111.22.11", port=10000, username="user1" ,database="default")

Error msg

Could not connect to any of [('10.111.22.11', 10000)]
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/opt/anaconda3/lib/python3.6/site-packages/pyhive/hive.py", line 131, in __init__
    self._transport.open()
  File "/opt/anaconda3/lib/python3.6/site-packages/thrift_sasl/__init__.py", line 61, in open
    self._trans.open()
  File "/opt/anaconda3/lib/python3.6/site-packages/thrift/transport/TSocket.py",line 113, in open
    raise TTransportException(TTransportException.NOT_OPEN, msg)
thrift.transport.TTransport.TTransportException: Could not connect to any of [('10.111.22.11', 10000)]

What are the other requirements for successful connection? I am able to connect to the server directly (using putty) and run hive. But when tried from another server X i get this error. Also i can ping the hive server from server X.

Could the port number be the problem? How do i check the correct port number?

As discussed in the below answer i tried to start hiveserver2 . But the command doesnt seem to work. Any help is really appreciated.

Also the port i see in the log when i execute a query from hive shell is 8088. wonder if this should be the port instead of 10000(both did not work anyway)

Community
  • 1
  • 1
kten
  • 445
  • 4
  • 13
  • 26

5 Answers5

5

Could not make it work using pyhive. Had to use paramiko insted below is the sample code

import os
import paramiko
import time 

ssh = paramiko.SSHClient()
ssh.set_missing_host_key_policy(paramiko.AutoAddPolicy())
ssh.load_host_keys(os.path.expanduser(os.path.join("~", ".ssh", "known_hosts")))
ssh.connect('1.1.1.1', username='uid', password='pwd')
sshin, sshout, ssherr= ssh.exec_command('hive -e "create table test(key varchar(10),keyval varchar(200))"')
kten
  • 445
  • 4
  • 13
  • 26
1

HiveServer2 process must be started in your remote Hive host. 10000 is the default port number.

Use this command to start HiveServer2.

$HIVE_HOME/bin/hiveserver2 
franklinsijo
  • 17,784
  • 4
  • 45
  • 63
  • the hive server is a large production system. So i assume it is running always. Can you tell me how to check if it is running? – kten May 18 '17 at 08:03
  • You can try to telnet the 10000 port. `telnet 10.111.22.11 10000` – franklinsijo May 18 '17 at 08:06
  • from the same server telnet gives connection refused error but at the same time i am able to run hive – kten May 18 '17 at 08:11
  • You will be able to run queries thru hive shell without having the hiveserver running. But to use pyhive, hiveserver2 must be started. Try starting it using the command in the answer. – franklinsijo May 18 '17 at 08:13
  • sorry to say. but i m unable to figure out this path. $HIVE_HOME is not getting resolved – kten May 18 '17 at 10:01
  • it returns "no hiveserver2 in (different paths)". So is it not configured for remote connection? – kten May 18 '17 at 10:19
  • No, you should be running this command in your remote hive host. – franklinsijo May 18 '17 at 10:20
  • Still no success. tried which command in remote host itself where hive resides. So it still looks like not configured for remote connections. I am trying now with paramiko ssh client. where i can call the hive command. but the disadvantage is each call is a separate session which prevents me from using a table just created. – kten May 22 '17 at 15:03
  • 1
    I know it has been very long, using paramiko in place of pyhive is a terrible choice. If you have `hive` executable in your host, you will be able to start `hiveserver2` as well. Try `hive --service hiveserver2` on your remote host. – franklinsijo Jun 13 '17 at 16:12
1

For PyHive to work, Hive Server 2.0 should be running using transport mode as “binary”. If it's running with transport as Http change it to "binary".

Arnab Biswas
  • 4,495
  • 3
  • 42
  • 60
0

Please try below code to access remote hive table using pyhive:

from pyhive import hive
import pandas as pd

#Create Hive connection 
conn = hive.Connection(host="10.111.22.11", port=10000, username="user1")

# Read Hive table and Create pandas dataframe
df = pd.read_sql("SELECT * FROM db_Name.table_Name limit 10", conn)
print(df.head())
user07
  • 658
  • 3
  • 13
  • 27
  • dosnt work. getting the error `thriftpy.transport.TTransportException: TTransportException(type=1, message="Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: No worthy mechs found` – sjd Feb 14 '18 at 10:24
  • 1
    @sjd : try installing thrift and thrift_sasl packages too with pip. – vuvu Jan 09 '19 at 22:20
0

First hive.server2.transport.mode should be set to binary

then use the following code

from pyhive import hive

host_name = "10.3.141.44"
port = 10000
database="ncc"


def hiveconnection(host_name, port, database):
    conn = hive.Connection(host=host_name, port=port,
                           database=database, auth='NOSASL')
    cur = conn.cursor()
    cur.execute('select * from registry')
    result = cur.fetchall()
    return result
output = hiveconnection(host_name, port, database)
print(output)