I'm trying to use the pyarrow filesystem interface (pyarrow.fs) with HDFS. Calling the fs.HadoopFileSystem constructor raises an "Unable to load libhdfs" error, even though libhdfs.so appears to exist at the path reported in the error message.
from pyarrow import fs
hfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)
OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
I've tried different Python and pyarrow versions, and I've tried setting ARROW_LIBHDFS_DIR (see the sketch after the Dockerfile). For a reproducible test case, I'm using the following Dockerfile on Linux Mint.
FROM openjdk:11
RUN apt-get update &&\
apt-get install wget -y
RUN wget -nv https://dlcdn.apache.org/hadoop/common/hadoop-3.3.1/hadoop-3.3.1-aarch64.tar.gz &&\
tar -xf hadoop-3.3.1-aarch64.tar.gz
ENV PATH=/miniconda/bin:${PATH}
RUN wget -nv https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh -O miniconda.sh &&\
bash miniconda.sh -b -p /miniconda &&\
conda init
RUN conda install -c conda-forge python=3.9.6
RUN conda install -c conda-forge pyarrow=4.0.1
ENV JAVA_HOME=/usr/local/openjdk-11
ENV HADOOP_HOME=/hadoop-3.3.1
RUN printf 'from pyarrow import fs\nhfs = fs.HadoopFileSystem(host="10.10.0.167", port=9870)\n' > test_arrow.py
# 'python test_arrow.py' fails with ...
# OSError: Unable to load libhdfs: /hadoop-3.3.1/lib/native/libhdfs.so: cannot open shared object file: No such file or directory
RUN python test_arrow.py || true
CMD ["/bin/bash"]