I submit my test.py to YARN as follows.
spark-submit --master yarn \
--deploy-mode cluster \
--executor-memory 8g \
--driver-memory 10g \
--num-executors 100 \
--executor-cores 10 \
--conf spark.yarn.dist.archives=/home/ml_env/ml_env.zip#pyenv \
--conf spark.pyspark.python=./pyenv/bin/python3 \
test.py
Here I want to import some Python packages, such as numpy and configparser, so I built a virtual environment named ml_env. To create it, I first ran virtualenv ml_env, then activated it with source ml_env/bin/activate, and then ran pip install numpy and pip install configparser inside ml_env.
My test.py is shown as follows.
from pyspark import SparkContext

# Imports moved to the top of the module, before any Spark work.
import configparser
import numpy

if __name__ == "__main__":
    sc = SparkContext.getOrCreate()
    data = [1, 2, 3, 4, 5]
    distData = sc.parallelize(data)
    print("done", distData.collect())
But I get the following error:
Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: initfsencoding: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'