I am trying to use pyarrow on Windows, but I get the following error from fs.HadoopFileSystem():
OSError                                   Traceback (most recent call last)
Cell In[1], line 2
      1 from pyarrow import fs
----> 2 hdfs = fs.HadoopFileSystem(host='localhost', port=9870)

File c:\prj\study\.venv\lib\site-packages\pyarrow\_hdfs.pyx:96, in pyarrow._hdfs.HadoopFileSystem.__init__()

File c:\prj\study\.venv\lib\site-packages\pyarrow\error.pxi:144, in pyarrow.lib.pyarrow_internal_check_status()

File c:\prj\study\.venv\lib\site-packages\pyarrow\error.pxi:115, in pyarrow.lib.check_status()

OSError: Unable to load libhdfs: The specified module could not be found.
I followed the steps on this site to install Hadoop using the binaries from Apache, and I can use it from cmd. However, when I checked libhdfs.so in lib/native, it shows as a 0 KB file. Is this normal, or do I have to compile Hadoop from source myself to get a working libhdfs.so?
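To check what the Hadoop binary distribution actually shipped, a small script like the one below can list the native client library in lib/native and report its size, which makes a 0-byte file easy to spot. It is a minimal sketch that assumes HADOOP_HOME points at the unpacked Hadoop directory; the helper name find_libhdfs is hypothetical, and treating hdfs.dll as the Windows file name is an assumption, not something taken from the error above.

```python
from __future__ import annotations

import os
from pathlib import Path

# Candidate names for the native HDFS client library. "libhdfs.so" is the name
# mentioned in the question; "hdfs.dll" is ASSUMED here as the Windows name.
CANDIDATES = ("libhdfs.so", "hdfs.dll")


def find_libhdfs(hadoop_home: str) -> dict[str, int]:
    """Hypothetical helper: return {filename: size_in_bytes} for each
    candidate library that exists under <hadoop_home>/lib/native."""
    native_dir = Path(hadoop_home) / "lib" / "native"
    found: dict[str, int] = {}
    for name in CANDIDATES:
        path = native_dir / name
        if path.is_file():
            # stat() follows symlinks, so this is the size of the real target.
            found[name] = path.stat().st_size
    return found


if __name__ == "__main__":
    home = os.environ.get("HADOOP_HOME")
    if home is None:
        print("HADOOP_HOME is not set")
    else:
        results = find_libhdfs(home)
        if not results:
            print("No libhdfs candidate found in lib/native")
        for name, size in results.items():
            status = "EMPTY (0 bytes)" if size == 0 else f"{size} bytes"
            print(f"{name}: {status}")
```

A 0-byte result only confirms the symptom described above; it does not by itself say why the file is empty or which build of the library pyarrow needs on Windows.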