Trying to use pyarrow to access hdfs file and not able to get it working, below is the code, thank you very much in advance.
[rxie@cedgedev03 code]$ python
Python 2.7.12 |Anaconda 4.2.0 (64-bit)| (default, Jul 2 2016, 17:42:40)
[GCC 4.4.7 20120313 (Red Hat 4.4.7-1)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
import pyarrow
import os
os.environ["JAVA_HOME"]="/usr/java/jdk1.8.0_121"
from pyarrow import hdfs
fs = hdfs.connect()
Traceback (most recent call last): File "", line 1, in File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 183, in connect extra_conf=extra_conf) File "/opt/cloudera/parcels/Anaconda/lib/python2.7/site-packages/pyarrow/hdfs.py", line 37, in init self._connect(host, port, user, kerb_ticket, driver, extra_conf) File "pyarrow/io-hdfs.pxi", line 89, in pyarrow.lib.HadoopFileSystem._connect File "pyarrow/error.pxi", line 83, in pyarrow.lib.check_status pyarrow.lib.ArrowIOError: Unable to load libhdfs