I need to run a PySpark application (v1.6.3). spark-submit has the --py-files
flag for adding .zip, .egg, or .py files. If I have a Python package at /usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy,
how would I include the whole package?
Inside this directory, I do notice some *.py and *.pyc files.
- fuzz.py
- process.py
- StringMatcher.py
- string_processing.py
- utils.py
Would I have to include each of these one by one? For example:
spark-submit \
--py-files /usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/fuzz.py,/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/process.py,/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/StringMatcher.py,/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/string_processing.py,/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy/utils.py
Is there an easier way?
- Should I try to find the .egg or .zip (e.g. from PyPI) and use that?
- Can I just zip up this directory and pass that in?
Any tips or pointers would be greatly appreciated. In reality, I need several more Python packages that are managed by conda.
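To make the zip option concrete, here is a rough sketch of what it could look like. It assumes the package is pure Python and that the archive must contain the fuzzywuzzy/ folder itself at its root (not just the loose .py files), so that `import fuzzywuzzy` can resolve once the zip is on the Python path. The helper name `zip_package` and the output file name are made up for illustration.

```python
import os
import zipfile


def zip_package(package_dir, zip_path):
    """Zip a package directory so the package folder sits at the archive root.

    Hypothetical helper: entries are stored as 'fuzzywuzzy/fuzz.py' and so on,
    relative to the parent of package_dir.
    """
    package_dir = os.path.abspath(package_dir)
    parent = os.path.dirname(package_dir)
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for root, _dirs, files in os.walk(package_dir):
            for name in files:
                if name.endswith(".pyc"):
                    continue  # skip compiled bytecode, keep the sources
                full_path = os.path.join(root, name)
                # Store the path relative to the parent directory so the
                # top-level entry in the archive is the package folder.
                zf.write(full_path, os.path.relpath(full_path, parent))
    return zip_path


# Example call (paths taken from the question, output name is arbitrary):
# zip_package("/usr/anaconda2/lib/python2.7/site-packages/fuzzywuzzy",
#             "fuzzywuzzy.zip")
```

The resulting archive would then be passed as `--py-files fuzzywuzzy.zip`, with one zip per package separated by commas. Packages that ship compiled extension modules generally cannot be imported from a zip, so this sketch would only cover the pure-Python ones.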