I'm having a lot of trouble getting Spark to work on Windows. There are plenty of tutorials on installing it and fixing common issues, but I've been trying for hours and still can't make it work. Here is my setup and the error I get.
I have Java 8, which is on the system Path:
C:\>java -version
java version "1.8.0_131"
Java(TM) SE Runtime Environment (build 1.8.0_131-b11)
Java HotSpot(TM) 64-Bit Server VM (build 25.131-b11, mixed mode)
I also have Python 2.7 with Anaconda 4.4
C:\Program Files (x86)\Spark\python\dist>python -V
Python 2.7.13 :: Anaconda 4.4.0 (64-bit)
Just in case, I also have Scala, SBT and GOW.
C:\>scala -version
Scala code runner version 2.11.8 -- Copyright 2002-2016, LAMP/EPFL
C:\>gow -version
Gow 0.8.0 - The lightweight alternative to Cygwin
C:\>sbt
Java HotSpot(TM) 64-Bit Server VM warning: ignoring option MaxPermSize=256m; support was removed in 8.0
> about
[info] This is sbt 0.13.15
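The checks above were done by hand, one command at a time. A minimal sketch of the same check in Python, assuming the tools are meant to be found through the PATH variable; find_on_path is a hypothetical helper written for this post, not part of Spark or Anaconda:

```python
import os


def find_on_path(name, path=None, pathext=None):
    """Return the first file called `name` (plus any PATHEXT suffix) on `path`, or None.

    Hypothetical helper: it only looks at files on disk and does not
    run anything, so it cannot tell whether the tool actually works.
    """
    if path is None:
        path = os.environ.get("PATH", "")
    if pathext is None:
        # PATHEXT is normally set on Windows only (.EXE, .CMD, .BAT, ...)
        pathext = os.environ.get("PATHEXT", "")
    suffixes = [""] + [ext for ext in pathext.split(os.pathsep) if ext]
    for folder in path.split(os.pathsep):
        folder = folder.strip('"')
        if not folder:
            continue
        for suffix in suffixes:
            candidate = os.path.join(folder, name + suffix)
            if os.path.isfile(candidate):
                return candidate
    return None


if __name__ == "__main__":
    # Tool names taken from the commands shown above.
    for tool in ("java", "python", "scala", "sbt", "gow"):
        print("%-6s -> %s" % (tool, find_on_path(tool)))
```

A result of None for a tool would mean it is not reachable through PATH from the Python process, even if it runs fine from another console.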
Now to the installation:
First, I downloaded Spark 2.1.1 with package type "Pre-built for Apache Hadoop 2.7 and later".
I extracted it to a folder, say
C:\Programs\Spark
In the python folder, I ran
python setup.py sdist
which should build a suitable tgz file for pip, and it did. Going into dist, I ran
pip install NAME_OF_PACKAGE.tgz
That did install it, as conda list shows:
C:\>conda list
# packages in environment at C:\Program Files (x86)\Anaconda2:
#
...
pyspark 2.1.1+hadoop2.7 <pip>
...
I did have some doubts, so I looked into Anaconda's Scripts and site-packages folders. Both had what I expected: in Scripts there are pyspark, spark-shell and so on. The pyspark folder in site-packages also has everything, from the jars folder to its own bin folder, which also contains the scripts above.
As for Hadoop, I downloaded winutils.exe and pasted it into Spark's bin folder, so it is also in the bin folder of Python's pyspark package.
With that in mind, I could import pyspark without a problem:
C:\Users\Rolando Casanueva>python
Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pyspark
>>>
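The manual folder check described above can be sketched in Python. This is only an illustration of what I looked for (a jars folder, a bin folder, and winutils.exe inside bin); check_spark_layout is a hypothetical helper, and the set of expected entries is my assumption, not an official list of what pyspark needs:

```python
import os


def check_spark_layout(root):
    """Report which of the pieces described above exist under `root`.

    Hypothetical helper: the expected entries mirror what I checked
    by hand, not a documented pyspark requirement.
    """
    expected = {
        "jars folder": os.path.join(root, "jars"),
        "bin folder": os.path.join(root, "bin"),
        "winutils.exe": os.path.join(root, "bin", "winutils.exe"),
    }
    return dict((label, os.path.exists(path)) for label, path in expected.items())


if __name__ == "__main__":
    try:
        import pyspark
    except ImportError:
        print("pyspark is not importable from this interpreter")
    else:
        # Folder of the installed pyspark package (inside site-packages).
        root = os.path.dirname(os.path.abspath(pyspark.__file__))
        print(root)
        for label, found in sorted(check_spark_layout(root).items()):
            print("%-12s %s" % (label, "found" if found else "MISSING"))
```

The same function can be pointed at the extracted Spark folder instead of the pyspark package folder to compare the two.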
FIRST QUESTION: Do I also have to paste winutils.exe into Python's Scripts folder?
Now to the main issue: the problem occurs when I use pyspark, which raises this exception:
C:\Users\Rolando Casanueva>python
Python 2.7.13 |Anaconda 4.4.0 (64-bit)| (default, May 11 2017, 13:17:26) [MSC v.1500 64 bit (AMD64)] on win32
Type "help", "copyright", "credits" or "license" for more information.
Anaconda is brought to you by Continuum Analytics.
Please check out: http://continuum.io/thanks and https://anaconda.org
>>> import pyspark
>>> pyspark.SparkContext()
C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark
"Files" is not recognized as an internal or external command,
operable program or batch file.
Failed to find Spark jars directory.
You need to build Spark before running this program.
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\context.py", line 115, in __init__
    SparkContext._ensure_initialized(self, gateway=gateway, conf=conf)
  File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\context.py", line 259, in _ensure_initialized
    SparkContext._gateway = gateway or launch_gateway(conf)
  File "C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark\java_gateway.py", line 96, in launch_gateway
    raise Exception("Java gateway process exited before sending the driver its port number")
Exception: Java gateway process exited before sending the driver its port number
>>>
- I installed Spark in local mode, following the Stack Overflow answer to "How to set up Spark on Windows?"
- I installed Spark following this YouTube tutorial:
https://www.youtube.com/watch?v=omlwDosMGVk
- I installed Spark as a Jupyter add-on:
https://mas-dse.github.io/DSE230/installation/windows/
- And finally I tried the steps described above.
The same error shows up with every installation.
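Since the console complains about "Files" right after printing a path under C:\Program Files (x86), my guess is that the space (and maybe the parentheses) in that path gets in the way of the launch scripts. That is an assumption I have not confirmed. A minimal sketch to flag such characters in a path, where risky_path_chars is a hypothetical helper and not part of pyspark:

```python
import re


def risky_path_chars(path):
    """Return the spaces and parentheses found in `path`, sorted and de-duplicated.

    Assumption: these are the characters I suspect of confusing the
    Windows batch scripts; the list is not taken from the Spark docs.
    """
    return sorted(set(re.findall(r"[ ()]", path)))


if __name__ == "__main__":
    # Path printed by pyspark just before the error message.
    installed = r"C:\Program Files (x86)\Anaconda2\lib\site-packages\pyspark"
    # Folder where I extracted the Spark download.
    extracted = r"C:\Programs\Spark"
    print(risky_path_chars(installed))  # [' ', '(', ')']
    print(risky_path_chars(extracted))  # []
```

If this guess is right, an installation under a path without spaces or parentheses should behave differently, but I have not verified that.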
SECOND QUESTION: How can I solve this issue?
EXTRA QUESTION: Do you have any other recommendations for installing it?