Currently I'm working on a Python 3.6 project with some other people. We use a requirements.txt file to store our dependencies, which are installed with pip or conda.
I added `pyspark >= 2.2.0`, which will run `pip install pyspark`.
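For context, the relevant part of the requirements.txt is just the single line below (any other entries are placeholders, not our real dependencies):

```
# requirements.txt (other entries omitted)
pyspark>=2.2.0
```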
We make use of Anaconda. The installation completes without errors and I can find the pyspark directory in my local Anaconda environment's site-packages directory.
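To illustrate what I checked, here is a small diagnostic snippet (my own sketch, not part of the project) that locates the installed pyspark package and reports whether a jars directory sits next to it, since the error seems to be about missing Spark jars:

```python
import os
import pyspark

# Where pip/conda placed the pyspark package inside the Anaconda env
pkg_dir = os.path.dirname(pyspark.__file__)
print("pyspark package:", pkg_dir)

# Report whether a "jars" directory exists in the installed package
jars_dir = os.path.join(pkg_dir, "jars")
print("jars directory exists:", os.path.isdir(jars_dir))

# SPARK_HOME, if set, points Spark at a different installation
print("SPARK_HOME:", os.environ.get("SPARK_HOME", "<not set>"))
```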
When I run my Python script, which has some Spark code in it, I get the error `Failed to find Spark jars directory`. After some research I found out that I need to build the pyspark code because it isn't prebuilt when it comes with pip.
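The script itself doesn't do anything unusual; it is roughly of this shape (a simplified placeholder, not the real project code):

```python
from pyspark.sql import SparkSession

# Simplified placeholder for the project script that triggers the error
spark = SparkSession.builder.appName("example").getOrCreate()

df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.show()

spark.stop()
```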
I read the documentation, but it isn't clear to me how to build the code. Why is there no build directory in my pyspark installation directory (needed to build it with `build/mvn`)? I prefer to use requirements.txt because I don't want every developer to have to download and install pyspark on their own.
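The workflow I'm aiming for is that every developer just installs the shared dependency list into their Anaconda environment, i.e. nothing more than:

```
# inside the activated Anaconda environment for the project
pip install -r requirements.txt
```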
EDIT - The main problem when running pyspark commands in the shell is the following error:
Failed to find Spark jars directory.
You need to build Spark before running this program.