
My Hive table resides in an EMR cluster. I have created an SSH tunnel:

ssh -L 8888:localhost:8888 -i atlas-emr-xx.pem hadoop@ec2-aa-bbb-ccc-ddd.us-west-2.compute.amazonaws.com

I am able to create and access the Hive tables through Hue at http://localhost:8888/

Now I need to access the Hive tables from Python, with the code running on my on-premises machine.

My code:

from pyhive import hive
import pandas as pd
conn = hive.Connection(host='localhost', port=8888, username='admin',auth='NOSASL')
df = pd.read_sql("SELECT * FROM atlas_emr.us_disease limit 10", conn)
print(df.head())

The error I get:

  File "C:\Users\User\PycharmProjects\Hive\venv\lib\site-packages\thrift\transport\TSocket.py", line 143, in read
    message='TSocket read 0 bytes')
thrift.transport.TTransport.TTransportException: TSocket read 0 bytes

And if I change the connection line to:

conn = hive.Connection(host='localhost', port=8888, username='admin')

the error I get is:

thrift.transport.TTransport.TTransportException: Could not start SASL: b'Error in sasl_client_start (-4) SASL(-4): no mechanism available: Unable to find a callback: 2'

I have tried all the code mentioned in "How to Access Hive via Python?", but no luck.

Please let me know whether it is possible to access Hive on an EMR cluster this way. Thank you.

usr_lal123

1 Answer

Can you try connecting to Hive on EMR through Athena, as described at https://docs.aws.amazon.com/athena/latest/ug/connect-to-data-source-hive.html?

  • To be precise, I am copying the structure of Snowflake tables and creating those tables in Hive, but Athena does not support creating tables. – usr_lal123 Mar 30 '21 at 03:57
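
One thing that may be worth checking in the question's setup: port 8888 is the Hue web UI, which speaks HTTP, while PyHive speaks Thrift to HiveServer2. On EMR, HiveServer2 listens on port 10000 by default, so a likely cause of "TSocket read 0 bytes" is that the tunnel forwards the wrong port. The sketch below assumes the default port is unchanged on this cluster and that it is reachable from the master node as localhost. The helper names (tunnel_command, port_is_open, fetch_sample) are made up for illustration and are not part of PyHive.

```python
import socket

# Assumption: HiveServer2 uses its default Thrift port on the EMR master node.
HIVESERVER2_PORT = 10000


def tunnel_command(key_file, remote_host, local_port=HIVESERVER2_PORT,
                   remote_port=HIVESERVER2_PORT, user="hadoop"):
    """Build the ssh command that forwards local_port to HiveServer2."""
    return [
        "ssh", "-N",  # -N: forward ports only, do not run a remote command
        "-L", f"{local_port}:localhost:{remote_port}",
        "-i", key_file,
        f"{user}@{remote_host}",
    ]


def port_is_open(host, port, timeout=2.0):
    """Return True if a TCP connection to host:port succeeds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


def fetch_sample(query, host="localhost", port=HIVESERVER2_PORT,
                 username="hadoop", auth=None):
    """Run a query through the tunnel and return a pandas DataFrame.

    auth=None lets PyHive use its default (SASL PLAIN). Pass auth="NOSASL"
    only if hive.server2.authentication is set to NOSASL on the server.
    """
    # Imported here so the helpers above work without pyhive installed.
    from pyhive import hive
    import pandas as pd

    conn = hive.Connection(host=host, port=port, username=username, auth=auth)
    try:
        return pd.read_sql(query, conn)
    finally:
        conn.close()
```

With the tunnel from tunnel_command("atlas-emr-xx.pem", "ec2-aa-bbb-ccc-ddd.us-west-2.compute.amazonaws.com") running in another terminal, port_is_open("localhost", 10000) should return True, and fetch_sample("SELECT * FROM atlas_emr.us_disease LIMIT 10") can then be tried. If the "Could not start SASL ... no mechanism available" error still appears, that usually points to the local sasl installation on Windows rather than to the tunnel, and would need to be fixed separately.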