I am getting the following warnings when I run my PySpark job:
17/10/06 18:27:16 WARN ARPACK: Failed to load implementation from: com.github.fommil.netlib.NativeSystemARPACK
17/10/06 18:27:16 WARN ARPACK: Failed to load implementation from: com.github.fommil.netlib.NativeRefARPACK
My code is:
from pyspark.mllib.linalg.distributed import RowMatrix
mat = RowMatrix(tf_rdd_vec.cache())
svd = mat.computeSVD(num_topics, computeU=False)
I am using an Ubuntu 16.04 EC2 instance, and I have installed all of the following libraries on the system:
sudo apt install libarpack2 Arpack++ libatlas-base-dev liblapacke-dev libblas-dev gfortran libblas-dev liblapack-dev libnetlib-java libgfortran3 libatlas3-base libopenblas-base
I have set LD_LIBRARY_PATH to point to the shared library path as follows:
export LD_LIBRARY_PATH=/usr/lib/
Now, when I list the $LD_LIBRARY_PATH directory, it shows the following .so files:
ubuntu:~$ ls $LD_LIBRARY_PATH/*.so | grep "pack\|blas"
/usr/lib/libarpack.so
/usr/lib/libblas.so
/usr/lib/libcblas.so
/usr/lib/libf77blas.so
/usr/lib/liblapack_atlas.so
/usr/lib/liblapacke.so
/usr/lib/liblapack.so
/usr/lib/libopenblasp-r0.2.18.so
/usr/lib/libopenblas.so
/usr/lib/libparpack.so
But I am still not able to use the native ARPACK implementation. Also, I am caching the RDD that I pass to the RowMatrix, but Spark still logs a cache warning. Any suggestions on how to resolve these 3 warnings?
I downloaded the precompiled build of spark-2.2.0 from the Spark download page.
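To double-check that the shared libraries from the `ls` listing above are actually loadable, and not just present on disk, here is a minimal sketch using only Python's standard `ctypes` module. The function name `probe_native_libs` and the `CANDIDATES` list are my own, chosen for illustration. Note that this only tests the system libraries from Python; it does not check the JVM-side netlib-java classes named in the warnings.

```python
import ctypes
import ctypes.util

# Library names taken from the `ls` listing above, without the "lib" prefix.
# This list is an assumption for illustration; adjust it as needed.
CANDIDATES = ["arpack", "blas", "lapack", "openblas"]


def probe_native_libs(names):
    """Return {name: resolved library file name, or None if not loadable}."""
    results = {}
    for name in names:
        # Ask the platform's lookup mechanism to resolve the library name.
        found = ctypes.util.find_library(name)
        if found is not None:
            try:
                # Load the library so that missing dependencies surface here.
                ctypes.CDLL(found)
            except OSError:
                found = None
        results[name] = found
    return results


if __name__ == "__main__":
    for name, resolved in probe_native_libs(CANDIDATES).items():
        print(name, "->", resolved if resolved else "not loadable")
```

If every entry resolves here but the warnings persist, the problem is more likely on the JVM side than in the system libraries themselves.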