I submit my test.py to YARN as follows:

spark-submit --master yarn \
--deploy-mode cluster \
--executor-memory 8g \
--driver-memory 10g \
--num-executors 100 \
--executor-cores 10 \
--conf spark.yarn.dist.archives=/home/ml_env/ml_env.zip#pyenv \
--conf spark.pyspark.python=./pyenv/bin/python3 \
test.py

I want to import some Python packages, such as numpy and configparser, so I built a virtual environment named ml_env.

To create it, I first ran virtualenv ml_env, then activated it (source activate), and then pip installed numpy and configparser inside ml_env.

My test.py is shown below:
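
One thing that may be worth checking for an environment built this way: depending on the virtualenv version, bin/python3 can be a symlink to the base interpreter, and pyvenv.cfg (if present) records the base install under its `home` key, so the zipped environment may not be self-contained on the YARN nodes. The following is a minimal sketch, not a confirmed diagnosis. The helper name `describe_env` is hypothetical, and the path /home/ml_env/ml_env is only a guess based on the archive path in the spark-submit command.

```python
import os
from pathlib import Path


def describe_env(env_dir):
    """Report where a virtualenv's interpreter and pyvenv.cfg point."""
    env = Path(env_dir).resolve()
    report = {"python_is_symlink": False, "python_target": None, "home": None}

    # If bin/python3 is a symlink, record the real file it resolves to.
    python = env / "bin" / "python3"
    if python.is_symlink():
        report["python_is_symlink"] = True
        report["python_target"] = os.path.realpath(python)

    # pyvenv.cfg, when present, stores the base interpreter directory as "home".
    cfg = env / "pyvenv.cfg"
    if cfg.is_file():
        for line in cfg.read_text().splitlines():
            key, sep, value = line.partition("=")
            if sep and key.strip() == "home":
                report["home"] = value.strip()
    return report


if __name__ == "__main__":
    # Hypothetical location, guessed from the archive path in the submit command.
    print(describe_env("/home/ml_env/ml_env"))
```

If `python_target` or `home` points outside the environment (for example somewhere under /opt/anaconda3), the interpreter inside the archive depends on files that exist only on the machine where the environment was built.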

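# configparser and numpy are imported only to check that the packages
# installed in ml_env are available to the job.
import configparser

import numpy
from pyspark import SparkContext

if __name__ == "__main__":
    # Reuse the active SparkContext if there is one, otherwise create it.
    sc = SparkContext.getOrCreate()

    data = [1, 2, 3, 4, 5]
    distData = sc.parallelize(data)
    print("done", distData.collect())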

But I get this error:

Could not find platform independent libraries <prefix>
Could not find platform dependent libraries <exec_prefix>
Consider setting $PYTHONHOME to <prefix>[:<exec_prefix>]
Fatal Python error: initfsencoding: Unable to get the locale encoding
ModuleNotFoundError: No module named 'encodings'
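
CPython prints these messages when it cannot find its standard library under the prefix it computed at startup, and `encodings` is one of the first standard-library packages it needs. The following is a minimal sketch, assuming the usual Linux layout `<prefix>/lib/pythonX.Y/`. The helper name `has_stdlib` is hypothetical. It checks whether a given prefix actually contains `os.py` and the `encodings` package.

```python
from pathlib import Path


def has_stdlib(prefix, version="3.7"):
    """Return True if <prefix>/lib/python<version> holds os.py and encodings."""
    lib = Path(prefix) / "lib" / f"python{version}"
    # is_file() follows symlinks, so linked files count as present on this machine.
    return (lib / "os.py").is_file() and (lib / "encodings" / "__init__.py").is_file()


if __name__ == "__main__":
    # Base install taken from the sys.path shown in the comments.
    print(has_stdlib("/opt/anaconda3"))
```

A result of False for the virtual environment itself is not necessarily a fault, since such environments often rely on the base install for the standard library. It does suggest that the archive alone may not be enough on a node where that base install is missing.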
rosefun
  • See if this helps: https://stackoverflow.com/questions/38132755/importerror-no-module-named-encodings – HArdRe537 Jun 23 '20 at 04:29
  • I have read it and tried rebuilding the virtual env, but it doesn't help. – rosefun Jun 23 '20 at 05:25
  • Which environment are you using? Python version, macOS, Windows? – RacoonOnMoon Jun 23 '20 at 06:11
  • I use Python 3.7 installed by Anaconda, on Linux. – rosefun Jun 23 '20 at 06:16
  • Is the Python path set correctly? python -c "import sys; print(sys.path)" – RacoonOnMoon Jun 23 '20 at 06:53
  • The `sys.path` is `['', '/opt/anaconda3/lib/python37.zip', '/opt/anaconda3/lib/python3.7', '/opt/anaconda3/lib/python3.7/lib-dynload', '/opt/anaconda3/lib/python3.7/site-packages']`. – rosefun Jun 23 '20 at 06:56
  • Have you tried unsetting the Python path? Also, I am not sure: your submit command uses "--conf spark.pyspark.python=./pyenv/bin/python3", but you want to use the conda one, right? – RacoonOnMoon Jun 23 '20 at 07:18
  • Yes, I want to use the virtualenv as the Python environment. – rosefun Jun 23 '20 at 07:49
  • Hmm, maybe you will find something here: https://stackoverflow.com/questions/19292957/how-can-i-troubleshoot-python-could-not-find-platform-independent-libraries-pr I don't think I can help you further, sorry. But I think some path variables are messed up. – RacoonOnMoon Jun 23 '20 at 09:05
