I want to use pyarrow to read and write Parquet files in HDFS, but I am facing a connectivity issue.
I installed pyarrow and pandas, and I am now trying to connect to HDFS on a remote machine.
Reference link - https://towardsdatascience.com/a-gentle-introduction-to-apache-arrow-with-apache-spark-and-pandas-bb19ffe0ddae
import pyarrow as pa

# NameNode host and RPC port of the remote HDFS cluster
host = '172.17.0.2'
port = 8020
fs = pa.hdfs.connect(host, port)
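Once the connection works, this is roughly how I intend to read and write Parquet through the filesystem handle (a sketch only; the DataFrame and the HDFS path are placeholders):

import pandas as pd
import pyarrow as pa
import pyarrow.parquet as pq

# Placeholder data and HDFS path, for illustration only.
df = pd.DataFrame({'a': [1, 2, 3]})
table = pa.Table.from_pandas(df)

# Write the table to Parquet on HDFS via the connected filesystem...
with fs.open('/tmp/example.parquet', 'wb') as f:
    pq.write_table(table, f)

# ...and read it back into pandas.
with fs.open('/tmp/example.parquet', 'rb') as f:
    df_read = pq.read_table(f).to_pandas()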
However, the connect call itself fails with the following error:
>>> fs = pa.hdfs.connect(host, port)
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 211, in connect
    extra_conf=extra_conf)
  File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 36, in __init__
    _maybe_set_hadoop_classpath()
  File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 136, in _maybe_set_hadoop_classpath
    classpath = _hadoop_classpath_glob('hadoop')
  File "/usr/lib64/python2.7/site-packages/pyarrow/hdfs.py", line 161, in _hadoop_classpath_glob
    return subprocess.check_output(hadoop_classpath_args)
  File "/usr/lib64/python2.7/subprocess.py", line 568, in check_output
    process = Popen(stdout=PIPE, *popenargs, **kwargs)
  File "/usr/lib64/python2.7/subprocess.py", line 711, in __init__
    errread, errwrite)
  File "/usr/lib64/python2.7/subprocess.py", line 1327, in _execute_child
    raise child_exception
OSError: [Errno 2] No such file or directory
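Reading the traceback, pyarrow appears to shell out to the hadoop binary (via _hadoop_classpath_glob) to build the JVM classpath, and the OSError suggests that binary is not on my PATH. Is something like the setup below what pyarrow expects before connecting? (A sketch; every path here is a guess, not my actual layout.)

import os
import pyarrow as pa

# All paths below are placeholder assumptions; substitute the real
# Hadoop and Java locations on the client machine.
os.environ['HADOOP_HOME'] = '/opt/hadoop'
os.environ['JAVA_HOME'] = '/usr/lib/jvm/java'
# pyarrow runs `hadoop classpath --glob` when CLASSPATH is unset,
# so the hadoop binary must be resolvable on PATH.
os.environ['PATH'] = os.environ['HADOOP_HOME'] + '/bin:' + os.environ['PATH']
# libhdfs.so must also be findable at connect time.
os.environ['ARROW_LIBHDFS_DIR'] = os.environ['HADOOP_HOME'] + '/lib/native'

fs = pa.hdfs.connect('172.17.0.2', 8020)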