I'm trying to import the KMeans and Vectors classes from spark.mllib. The platform is IBM Cloud (DSX) with Python 3.5 and a Jupyter Notebook.
I've tried:
import org.apache.spark.mllib.linalg.Vectors
import apache.spark.mllib.linalg.Vectors
import spark.mllib.linalg.Vectors
I've found several examples/tutorials with the first import working for the author. I was able to confirm that the Spark library itself isn't loaded in the environment. Normally, I would download the package and then import it. But being new to VMs, I'm not sure how to make this happen.
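To make the "is it loaded?" check concrete, here is a minimal sketch. It assumes the Python bindings would be exposed as the pyspark package (the org.apache.spark... paths tried above are the Scala/Java names, so the Python module paths below are my best understanding, not something I have confirmed on DSX). The helper name module_available is hypothetical.

```python
import importlib.util


def module_available(name):
    """Return True if the module `name` can be located in this environment."""
    try:
        # For dotted names, find_spec imports the parent package first.
        return importlib.util.find_spec(name) is not None
    except (ImportError, ValueError):
        # ImportError: a parent package (e.g. `pyspark`) is itself missing.
        # ValueError: the module is loaded but has no usable spec.
        return False


# Assumed Python module paths for the classes in question.
for name in ("pyspark", "pyspark.mllib.linalg", "pyspark.mllib.clustering"):
    print(name, "->", "found" if module_available(name) else "missing")
```

If all three report "found", the Python-style imports would presumably be from pyspark.mllib.linalg import Vectors and from pyspark.mllib.clustering import KMeans rather than any of the dotted imports listed above.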
I've also tried pip install spark without luck. It throws an error that reads:
The following command must be run outside of the IPython shell:
$ pip install spark
The Python package manager (pip) can only be used from outside of IPython.
Please reissue the `pip` command in a separate terminal or command prompt.
But this is in a VM where I don't see the ability to externally access the CLI.
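Since there is no separate terminal, one option that might apply is to invoke pip through the interpreter that runs the notebook kernel. The sketch below only builds and prints the command; the actual install is left commented out. Two assumptions are baked in: that the package wanted is pyspark rather than spark (the PyPI name for the Spark Python bindings, as far as I know), and that a per-user install is permitted on the platform. The helper name pip_install_command is hypothetical.

```python
import subprocess
import sys


def pip_install_command(package, user=True):
    """Build a pip command bound to the interpreter running this kernel."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if user:
        # Per-user install; avoids needing write access to system site-packages.
        cmd.append("--user")
    cmd.append(package)
    return cmd


cmd = pip_install_command("pyspark")  # assumed package name
print(" ".join(cmd))

# Uncomment to actually run the install from inside the notebook:
# subprocess.check_call(cmd)
```

Using sys.executable keeps the install tied to the same Python that the notebook imports from, which matters when several interpreters exist on the VM.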
I did find this, but I don't think I have a mismatch problem -- the issue on importing into DSX is covered but I can't quite interpret it for my situation.
I think this is the actual issue I'm having, but it is for SparkR and not Python.