
I am experimenting with Apache Spark on AWS EMR, and I am trying to use this command to set the cluster to use Python 3.

I run it as the last command in a bootstrap script:

sudo sed -i -e '$a\export PYSPARK_PYTHON=/usr/bin/python3' /etc/spark/conf/spark-env.sh

When I use it, the cluster fails during bootstrap with the following error:

sed: can't read /etc/spark/conf/spark-env.sh: No such file or directory

How should I set it to use python3 properly?

This is not a duplicate of the linked question: my issue is that the cluster cannot find the spark-env.sh file while bootstrapping, whereas the other question addresses the system not finding python3.


1 Answer


In the end I did not use that script. The sed command fails because EMR bootstrap actions run before EMR installs applications such as Spark, so /etc/spark/conf/spark-env.sh does not exist yet at that point.

Instead, I used the EMR configuration that is available at the cluster creation stage. It gave me the proper settings via spark-submit (in the AWS GUI). If you need it to be available to pyspark scripts in a more programmatic way, you can use os.environ to set the PySpark Python version inside the Python script itself; see the sketches below.
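For reference, a minimal sketch of such a cluster-creation configuration, using EMR's spark-env classification. The /usr/bin/python3 path is an assumption and may differ per AMI:

    [
      {
        "Classification": "spark-env",
        "Configurations": [
          {
            "Classification": "export",
            "Properties": {
              "PYSPARK_PYTHON": "/usr/bin/python3"
            }
          }
        ]
      }
    ]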
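And a minimal sketch of the os.environ approach. This assumes the variable is set before the SparkSession is created and, again, that python3 lives at /usr/bin/python3:

    import os

    # Must be set before the SparkContext / SparkSession is created,
    # otherwise the executors have already launched with the default python.
    os.environ["PYSPARK_PYTHON"] = "/usr/bin/python3"

    from pyspark.sql import SparkSession

    spark = SparkSession.builder.appName("python3-example").getOrCreate()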
