Tried the `--jars` option and `--driver-class-path` etc. It still gave me a 'No module named fuzzywuzzy' error.
Try `pyspark --packages me.xdrop:fuzzywuzzy:1.1.8`
Also have a look at https://stackoverflow.com/a/44153456/3811916 for some other options, depending on your desired workflow/environment.
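Note that `--packages` only puts the jar on the JVM classpath; it won't make a Python-side `import fuzzywuzzy` work. If the Java port is actually what you want, here's a minimal, untested sketch of reaching it through the py4j gateway from the `pyspark` shell, assuming the `me.xdrop.fuzzywuzzy.FuzzySearch` entry point from that project's README:

```python
# Sketch: calling the Java fuzzywuzzy port via py4j from a pyspark shell
# started with --packages me.xdrop:fuzzywuzzy:1.1.8.
# The class path me.xdrop.fuzzywuzzy.FuzzySearch is assumed from the xdrop README.
FuzzySearch = spark._jvm.me.xdrop.fuzzywuzzy.FuzzySearch
print(FuzzySearch.ratio("mysmilarstring", "myawfullysimilarstirng"))  # similarity score 0-100
```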

eddies
- Thanks! It showed fuzzywuzzy being successfully retrieved, but when I typed `import fuzzywuzzy` I still got 'ImportError: No module named fuzzywuzzy'. – user3610141 Aug 16 '17 at 15:08
- I'm thinking it's because pyspark is not picking up the right Python library path. The jar was put into .ivy2/jars under my home dir; I exported PYTHONPATH with that folder, but it still didn't help. – user3610141 Aug 16 '17 at 16:27
- If you're trying to use the original Python implementation of fuzzywuzzy (https://github.com/seatgeek/fuzzywuzzy), you should just install that (`pip install fuzzywuzzy`). Your question specifically asked about installing the fuzzywuzzy *jar* (i.e. https://github.com/xdrop/fuzzywuzzy), but installing that isn't going to magically make it available as a Python package... – eddies Aug 17 '17 at 03:14
- Thanks, you are right. However, I don't have root permission to install it into the system Python on the edge node, and I can't install it on all the data nodes either. How do I ship the library when I submit the pyspark job to the cluster? Do you have an example of a spark-submit command with the complete configuration? – user3610141 Aug 17 '17 at 03:59
- You could try using `spark-submit` with `--py-files`. You're now asking a completely different question, about making Python modules available across workers, and you haven't provided enough information about your environment for a good answer. Try looking at https://stackoverflow.com/questions/36461054/i-cant-seem-to-get-py-files-on-spark-to-work (a sketch of that approach follows these comments). – eddies Aug 17 '17 at 05:10
- Thank you very much! Got it working now with the additional information you provided. – user3610141 Aug 17 '17 at 22:36
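For later readers: the route that ended up working here was shipping the pure-Python fuzzywuzzy to the executors rather than installing the jar. A hedged sketch, assuming a local (non-root) pip install and a zip named `fuzzywuzzy.zip` (both placeholders, not from the thread):

```python
# Sketch: make a pure-Python package importable on all workers without root.
# Build the zip beforehand, e.g.:
#   pip install --target=./deps fuzzywuzzy
#   cd deps && zip -r ../fuzzywuzzy.zip fuzzywuzzy
# Then either pass it via `spark-submit --py-files fuzzywuzzy.zip job.py`
# or add it from code as below.
from pyspark import SparkContext

sc = SparkContext(appName="fuzzy-demo")
sc.addPyFile("fuzzywuzzy.zip")  # ships the zip to every executor's PYTHONPATH

def score(s):
    from fuzzywuzzy import fuzz  # import inside the task, after addPyFile
    return fuzz.ratio(s, "hello world")

print(sc.parallelize(["hello wold", "goodbye"]).map(score).collect())
```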