I've been trying to install pyarrow via pip (`pip install pyarrow`, and, as Yagav suggested, `py -3.7 -m pip install --user pyarrow`) and via conda (`conda install -c conda-forge pyarrow`, also `conda install pyarrow`), and I've also tried building the library from source (in a conda environment, with some magic I don't really understand). Every time the installation finishes without errors, but it always ends with one and the same problem when I call:
```python
import pyarrow as pa
fs = pa.hdfs.connect(host='my_host', user='my_user@my_host', kerb_ticket='path_to_kerb_ticket')
```
It fails with the following message:
```
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 209, in connect
    extra_conf=extra_conf)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 37, in __init__
    _maybe_set_hadoop_classpath()
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 135, in _maybe_set_hadoop_classpath
    classpath = _hadoop_classpath_glob(hadoop_bin)
  File "C:\ProgramData\Anaconda3\lib\site-packages\pyarrow\hdfs.py", line 162, in _hadoop_classpath_glob
    return subprocess.check_output(hadoop_classpath_args)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 395, in check_output
    **kwargs).stdout
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 472, in run
    with Popen(*popenargs, **kwargs) as process:
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 775, in __init__
    restore_signals, start_new_session)
  File "C:\ProgramData\Anaconda3\lib\subprocess.py", line 1178, in _execute_child
    startupinfo)
OSError: [WinError 193] %1 is not a valid Win32 application
```
At first I thought the problem was with `libhdfs.so` from Hadoop 2.5.6, but it seems I was wrong about that. My guess is that the problem is not in pyarrow or subprocess, but in some system variables or dependencies.
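To narrow it down, here is a minimal diagnostic sketch that reproduces the subprocess call from the traceback outside of pyarrow (the `classpath --glob` arguments are my assumption from reading `_hadoop_classpath_glob`). Since WinError 193 means Windows was asked to execute a file that is not a valid Win32 binary, I suspect `%HADOOP_HOME%\bin\hadoop` resolves to the Unix shell script instead of `hadoop.cmd`:

```python
# Minimal reproduction of the subprocess call from pyarrow's
# _hadoop_classpath_glob, based on the traceback above.
# The 'classpath --glob' arguments are an assumption on my part.
import os
import subprocess

hadoop_bin = os.path.join(os.environ['HADOOP_HOME'], 'bin', 'hadoop')

try:
    # On Windows this raises WinError 193 if 'hadoop' is the Unix
    # shell script, because CreateProcess cannot execute it.
    print(subprocess.check_output([hadoop_bin, 'classpath', '--glob']))
except OSError as err:
    print('plain script failed:', err)
    # hadoop.cmd is the Windows wrapper shipped in the same bin folder.
    print(subprocess.check_output([hadoop_bin + '.cmd', 'classpath', '--glob']))
```

If the second call succeeds, the failure is about which `hadoop` launcher gets executed on Windows, not about pyarrow itself.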
I have also manually defined the system variables `HADOOP_HOME`, `JAVA_HOME` and `KRB5CCNAME`.
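For reference, this is roughly how I set them before connecting; every path below is a placeholder rather than my actual configuration:

```python
# Sketch of setting the environment before connect(); all paths
# below are placeholders, not my actual configuration.
import os
import pyarrow as pa

os.environ['HADOOP_HOME'] = r'C:\hadoop'                     # placeholder
os.environ['JAVA_HOME'] = r'C:\Program Files\Java\jdk1.8.0'  # placeholder
os.environ['KRB5CCNAME'] = r'C:\Users\me\krb5cc'             # placeholder

fs = pa.hdfs.connect(host='my_host', user='my_user@my_host',
                     kerb_ticket='path_to_kerb_ticket')
```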