
I'm trying to set up PySpark on my desktop and interact with it via the terminal. I'm following this guide:

http://jmedium.com/pyspark-in-python/

When I run 'pyspark' in the terminal it says,

/home/jacob/spark-2.1.0-bin-hadoop2.7/bin/pyspark: line 45: python: command not found
env: ‘python’: No such file or directory

I've followed several guides, which all lead to this same issue (some have different details on setting up the .profile; thus far none have worked). I have Java, Python 3.6, and Scala installed. My .profile is configured as follows:

#Spark and PySpark Setup
PATH="$HOME/bin:$HOME/.local/bin:$PATH"
export SPARK_HOME='/home/jacob/spark-2.1.0-bin-hadoop2.7'
export PATH=$SPARK_HOME:$PATH
export PYTHONPATH=$SPARK_HOME/python:$PYTHONPATH
#export PYSPARK_DRIVER_PYTHON="jupyter"
#export PYSPARK_DRIVER_PYTHON_OPTS="notebook"
export PYSPARK_PYTHON=python3.6.5

Note that Jupyter Notebook is commented out because I want to launch pyspark in the shell right now without the notebook starting.
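As a quick diagnostic (a sketch; the names checked are the ones from the config above), you can ask the shell what those values actually resolve to on PATH:

command -v python3.6.5 || echo "python3.6.5: not found on PATH"
command -v python3 && python3 --version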

Interestingly, spark-shell launches just fine.

I'm using Ubuntu 18.04.1 and Spark 2.1


I've tried every guide I can find, and since this is my first time setting up Spark, I'm not sure how to troubleshoot it from here.

Thank you

[Screenshots: attempting to execute pyspark; .profile; versions]

Possible duplicate of [/usr/bin/env: python2: No such file or directory](https://stackoverflow.com/questions/11390206/usr-bin-env-python2-no-such-file-or-directory) – Sep 06 '18 at 06:59

I read that thread, and while similar, I don't think it solves my issue. It details setting up the python path to solve the problem; I've already done this with no success. – Cheddar Sep 06 '18 at 12:15

5 Answers


You should set export PYSPARK_PYTHON=python3 instead of export PYSPARK_PYTHON=python3.6.5 in your .profile. PYSPARK_PYTHON must name an executable that exists on your PATH, and there is no binary called python3.6.5.

Then source .profile, of course.

That worked for me.

Other options, like installing python via sudo apt install python (which gives you Python 2.x), are not appropriate.
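A minimal sketch of the fix, assuming the rest of the .profile from the question stays as it is:

#corrected line in ~/.profile: an executable name, not a version number
export PYSPARK_PYTHON=python3
#reload the profile in the current shell
source ~/.profile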

– Tansu Dasli

For those who may come across this, I figured it out!

I specifically chose to use an older version of Spark, 2.1.0, in order to follow along with a tutorial I was watching. I did not know that the newer Python I had installed (3.6.5 at the time of writing this) is incompatible with Spark 2.1. Thus PySpark would not launch.

I solved this by using Python 2.7 and setting the path accordingly in .bashrc:

export PYTHONPATH=$PYTHONPATH:/usr/lib/python2.7
export PYSPARK_PYTHON=python2.7
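A quick way to verify the change took effect (a sketch; it assumes python2.7 is installed in the usual place):

source ~/.bashrc
command -v python2.7    # should print something like /usr/bin/python2.7
pyspark                 # the startup banner should report Python 2.7.x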
– Cheddar

People using Python 3.8 and Spark <= 2.4.5 will have the same problem.

In this case, the only solution I found was to update Spark to v3.0.0.

Look at https://bugs.python.org/issue38775
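A rough sketch of that upgrade, assuming a tarball install like the one in the question (the paths here are illustrative):

#unpack Spark 3.0.0 prebuilt for Hadoop 2.7 and repoint SPARK_HOME
tar -xzf spark-3.0.0-bin-hadoop2.7.tgz -C "$HOME"
export SPARK_HOME="$HOME/spark-3.0.0-bin-hadoop2.7"
export PATH="$SPARK_HOME/bin:$PATH"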


For GNU/Linux users who have the python3 package installed (especially Ubuntu/Debian distros), there is a package called "python-is-python3" that makes the python command point to python3.

# apt install python-is-python3

Python 2.7 is deprecated now (as of Ubuntu 20.10, 2020), so do not try installing it.
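After installing it, a quick sanity check (illustrative output):

command -v python    # should now resolve, e.g. /usr/bin/python -> python3
python --version     # e.g. Python 3.8.x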


I have already solved this issue. Just type this command:

sudo apt install python
– Andronicus