I am using Spark with Python, both interactively by launching pyspark from the terminal and by running an entire script with spark-submit pythonFile.py. I am using it to analyze a local CSV file, so no distributed computation is performed.
I would like to use the matplotlib library to plot columns of a dataframe. When importing matplotlib I get the error ImportError: No module named matplotlib. Then I came across this question and tried the command sc.addPyFile(), but I could not find any file relating to matplotlib on my OS (OSX) that I could pass to it.
For this reason I created a virtual environment and installed matplotlib in it. Navigating through the virtual environment, I saw there was no file such as matplotlib.py, so I tried to pass the entire folder with sc.addPyFile("venv//lib/python3.7/site-packages/matplotlib"), but again with no success.
At this point I do not know which file I should include, or how, and I have run out of ideas.
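In case it helps with diagnosing this, below is a minimal sketch of a check that can be run both inside the pyspark shell and through spark-submit. It only reports which Python interpreter is actually executing the code and whether that interpreter can import matplotlib. The function name describe_environment is made up for illustration, and the idea that Spark might be using a different interpreter than the one in my virtual environment is only an assumption I have not confirmed.

```python
import sys


def describe_environment(module_name="matplotlib"):
    """Report the running interpreter and whether module_name can be imported.

    Hypothetical helper for illustration only; it is not part of Spark.
    """
    try:
        __import__(module_name)
        importable = True
    except ImportError:
        importable = False
    return {
        # Path of the Python binary that is executing this code
        "executable": sys.executable,
        # Major.minor version of that interpreter
        "version": "%d.%d" % sys.version_info[:2],
        # True only if the module is visible to this interpreter
        "importable": importable,
    }


info = describe_environment()
print(info)
```

If the executable path printed here differs from the interpreter inside the virtual environment, that would suggest the package was installed somewhere Spark does not look, though I cannot say whether that is the actual cause.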
Is there a simple way to import the matplotlib library inside Spark (either installing it with virtualenv or referencing the OS installation)? And if so, which *.py files should I pass to sc.addPyFile()?
Again, I am not interested in distributed computation: the Python code will run only locally on my machine.