My Hive table resides on an EMR cluster. I have created an SSH tunnel:
ssh -L 8888:localhost:8888 -i atlas-emr-xx.pem hadoop@ec2-aa-bbb-ccc-ddd.us-west-2.compute.amazonaws.com
I am able to create and access the Hive tables through Hue at http://localhost:8888/
Now I need to access the Hive tables from Python, with the code running on my on-premises machine.
My code:
from pyhive import hive
import pandas as pd
conn = hive.Connection(host='localhost', port=8888, username='admin', auth='NOSASL')
df = pd.read_sql("SELECT * FROM atlas_emr.us_disease limit 10", conn)
print(df.head())
The error I get:
File "C:\Users\User\PycharmProjects\Hive\venv\lib\site-packages\thrift\transport\TSocket.py", line 143, in read
message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes
If I modify the connection line to:
conn = hive.Connection(host='localhost', port=8888, username='admin')
the error I get is:
thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'
I have tried all the approaches mentioned in "How to Access Hive via Python?", but with no luck.
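One thing that may be worth checking: the tunnel above forwards port 8888, which is where Hue serves its web UI, whereas PyHive speaks Thrift to HiveServer2, which listens on port 10000 by default. Below is a minimal sketch of that alternative, not a confirmed fix. It assumes HiveServer2 runs on the master node on its default port and that the `hadoop` user is acceptable for the connection. The helper names `tunnel_command`, `port_is_open`, and `query_hive` are hypothetical, and the right `auth` value depends on the cluster's `hive.server2.authentication` setting.

```python
import socket

# Assumption: HiveServer2 listens on its default Thrift port on the EMR master node.
HIVESERVER2_PORT = 10000


def tunnel_command(key_file, remote_host, port=HIVESERVER2_PORT, user="hadoop"):
    """Build an ssh command that forwards a local port to HiveServer2 (hypothetical helper)."""
    return f"ssh -N -L {port}:localhost:{port} -i {key_file} {user}@{remote_host}"


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds, i.e. something is listening."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def query_hive(sql, host="localhost", port=HIVESERVER2_PORT, username="hadoop", auth=None):
    """Run a query through the tunnel and return a pandas DataFrame.

    auth=None leaves PyHive's default in place; pass "NOSASL" only if the
    server is configured with hive.server2.authentication=NOSASL (assumption).
    """
    # Imported here so the helpers above work without PyHive installed.
    from pyhive import hive
    import pandas as pd

    conn = hive.Connection(host=host, port=port, username=username, auth=auth)
    try:
        return pd.read_sql(sql, conn)
    finally:
        conn.close()
```

The idea would be to run the command returned by `tunnel_command("atlas-emr-xx.pem", "ec2-aa-bbb-ccc-ddd.us-west-2.compute.amazonaws.com")` in a separate terminal, confirm `port_is_open("localhost", 10000)`, and then call `query_hive("SELECT * FROM atlas_emr.us_disease LIMIT 10")`.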
Please let me know whether it is possible to access Hive on an EMR cluster from Python this way. Thank you.