I have some data in HDFS,i need to access that data using python,can anyone tell me how data is accessed from hive using python?
3 Answers
To install you'll need these libraries:
pip install sasl
pip install thrift
pip install thrift-sasl
pip install PyHive
If you're on Linux, you may need to install SASL separately before running the above. Install the package libsasl2-dev
using apt-get
or yum
or whatever package manager. For Windows there are some options on GNU.org. On a Mac SASL should be available if you've installed xcode developer tools (xcode-select --install
)
After installation, you can execute a hive query like this:
from pyhive import hive
conn = hive.Connection(host="YOUR_HIVE_HOST", port=PORT, username="YOU")
Now that you have the hive connection, you have options how to use it. You can just straight-up query:
cursor = conn.cursor()
cursor.execute("SELECT cool_stuff FROM hive_table")
for result in cursor.fetchall():
use_result(result)
...or to use the connection to make a Pandas dataframe:
import pandas as pd
df = pd.read_sql("SELECT cool_stuff FROM hive_table", conn)

- 5,844
- 2
- 26
- 31
-
2"Could not start SASL: %s" % self.sasl.getError()" - under Windows 2008 R2, python 3.6. How solve this problem? – Nikolay Baranenko Feb 04 '18 at 22:00
-
1this is thrift issue, I was facing the same issue, on Linux environment, it works fine. – Abdul Majeed May 13 '18 at 08:27
-
Confirming the windows issue and that it does work on Linux. I use Windows 10 for development (PyCharm). I could not "pip install pyhive[hive]" there; the dependencies fail. But if you just "pip install pyhive" (or whatever, using PyCharm's package manager), then you can develop against it. When I import PyHive I wrap it in a if os.name != "nt": statement so it is skipped on Windows. Everything installs and works fine on Linux though (Centos 7). – John Humphreys Dec 11 '18 at 14:25
You can use hive library for access hive from python,for that you want to import hive Class from hive import ThriftHive
Below the Example
import sys
from hive import ThriftHive
from hive.ttypes import HiveServerException
from thrift import Thrift
from thrift.transport import TSocket
from thrift.transport import TTransport
from thrift.protocol import TBinaryProtocol
try:
transport = TSocket.TSocket('localhost', 10000)
transport = TTransport.TBufferedTransport(transport)
protocol = TBinaryProtocol.TBinaryProtocol(transport)
client = ThriftHive.Client(protocol)
transport.open()
client.execute("CREATE TABLE r(a STRING, b INT, c DOUBLE)")
client.execute("LOAD TABLE LOCAL INPATH '/path' INTO TABLE r")
client.execute("SELECT * FROM r")
while (1):
row = client.fetchOne()
if (row == None):
break
print row
client.execute("SELECT * FROM r")
print client.fetchAll()
transport.close()
except Thrift.TException, tx:
print '%s' % (tx.message)
-
-
you can get that library from $HIVE_HOME/lib/py/* copy that content inside the py folder and past into python library – Sreejith Jul 18 '13 at 12:07
-
@Sreejith I have no problem importing those python libraries, however, the code hangs after executing hive command. And it turned out to be a general problem. Were your code connecting to Hiveserver1 or Hiveserver2? https://groups.google.com/a/cloudera.org/forum/#!topic/cdh-user/lCSuh6vLmHM – B.Mr.W. Nov 06 '13 at 17:30
A much simpler solution if you're on Windows uses pyodbc
:
import pyodbc
import pandas as pd
# connect odbc to data source name
conn = pyodbc.connect("DSN=<your_dsn>", autocommit=True)
# read data into dataframe
hive_df = pd.read_sql("SELECT * FROM <table_name>", conn)
As long as you have an ODBC driver and a DSN, that's all you need.

- 6,038
- 1
- 32
- 35
-
3
-
@SamsonScharfrichter good point- I'll add a note specifying that my solution is for Windows. – Jared Wilber Dec 19 '18 at 16:59
-
2It's _doable_ on Linux, just not trivial (check that the installed version of `unixODBC` is recent enough, check that your `$LD_LIBRARY_PATH` taps its libraries, install the appropriate ODBC driver, reference that driver in `odbcinst.ini`, define a DSN in `odbc.ini`, troubleshoot) – Samson Scharfrichter Dec 19 '18 at 17:40