
I am trying to run the spark-submit command from the folder where my Python script and dataset are stored, H:\spark_material, but it does not work.

If I copy my Python script into C:\spark\bin, however, it works.

I believe it has something to do with environment variables. Here is my Path: %JAVA_HOME%\bin; %SPARK_HOME%\bin

Here are my variables: HADOOP_HOME = C:\winutils, JAVA_HOME = C:\jdk, SPARK_HOME = C:\spark

Java is installed properly: typing "java -version" from any directory in CMD works.

Ben
    _"It just won't work !"_ does **not** work here either. What's `H:\spark_material`?! How could we know what's inside and what's the problem? – Jacek Laskowski Oct 07 '17 at 16:04
  • Possible duplicate of [What is the reason for '...' is not recognized as an internal or external command, operable program or batch file?](https://stackoverflow.com/questions/41454769/what-is-the-reason-for-is-not-recognized-as-an-internal-or-external-comman) – Mofi Oct 07 '17 at 18:36
  • @Ben I looked at your `PATH` and I can see the mistake: there is a space to the left of `C:\spark\bin`. Nothing in that directory is found because this leading space character makes the folder path invalid. I also strongly recommend moving `C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common` and `C:\ProgramData\Oracle\Java\javapath` after the folder path for PowerShell. No application installer should register the application's folder path before the most important standard Windows paths, but many installers are badly coded. – Mofi Oct 07 '17 at 18:43
  • Can we have a look at the Python code that you are using? Anything in it that uses a relative path may be an issue. – Abhay Dandekar Oct 10 '17 at 01:50
  • @JacekLaskowski 'H:\spark_material' is the location where my python script is saved. – Ben Oct 10 '17 at 21:30
  • @Mofi There is no space in the path, and also tried moving paths you suggest. Still won't work. – Ben Oct 10 '17 at 21:31
  • @AbhayDandekar I am just using sample code from the Apache Spark website to test the spark-submit command, so the code is: `text_file = spark.textFile("file:///H:/spark_material/test.py")` followed by `text_file.flatMap(lambda line: line.split()).map(lambda word: (word, 1)).reduceByKey(lambda a, b: a+b)` – Ben Oct 10 '17 at 21:35
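
On the relative-path question raised in the comments above, here is a minimal sketch of one way to build a `file:///` URI from an absolute Windows path, so the input location does not depend on the directory the command is launched from. The helper name `to_file_uri` is hypothetical and is not part of Spark. It uses only the standard library and has not been checked against the setup described in this question.

```python
from pathlib import PureWindowsPath


def to_file_uri(windows_path):
    """Convert an absolute Windows path into a file:/// URI.

    Hypothetical helper for illustration; it rejects relative paths,
    which would otherwise depend on the current working directory.
    """
    path = PureWindowsPath(windows_path)
    if not path.is_absolute():
        raise ValueError(f"expected an absolute path, got {windows_path!r}")
    return path.as_uri()


# The path below is the one quoted in the comment above.
print(to_file_uri(r"H:\spark_material\test.py"))
```

Passing a bare file name such as `test.py` raises `ValueError`, which makes an accidental relative path visible before the job is submitted.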

2 Answers


Open cmd, type `path`, and check that the Apache Spark entry points all the way to the bin folder. If it does not, fix your path.
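
A minimal sketch of that check in Python, in case reading the raw `path` output by eye is error-prone. It flags entries that carry leading or trailing whitespace, which is the problem one commenter on the question suspects. The function name `padded_path_entries` is hypothetical, and whether such whitespace actually breaks command lookup on a given Windows version is an assumption, not something this sketch verifies.

```python
import os


def padded_path_entries(path_value, sep=";"):
    """Return the PATH entries that have leading or trailing whitespace."""
    return [
        entry
        for entry in path_value.split(sep)
        if entry != entry.strip()
    ]


# Inspect the PATH of the current process; repr() makes stray spaces visible.
for entry in padded_path_entries(os.environ.get("PATH", ""), os.pathsep):
    print(repr(entry))
```

For a value such as `C:\jdk\bin; C:\spark\bin`, the function returns the second entry with its leading space intact, so it is easy to see what the shell was given.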

vaquar khan
  • It's there....C:\>path PATH=C:\Program Files (x86)\NVIDIA Corporation\PhysX\Common;C:\ProgramData\Oracle\Java\javapath;C:\Windows\system32;C:\Windows;C:\Windows\System32\Wbem;C:\Windows\System32\WindowsPowerShell\v1.0\;C:\jdk\bin; C:\spark\bin – Ben Oct 07 '17 at 14:42
  • I see no issue in your path. Sorry, I am not sure whether anything changed in Windows Server 2012. – vaquar khan Oct 07 '17 at 16:29

It was, and still is, a mystery. I reinstalled everything on my machine one by one, except the operating system, and in my opinion the issue was with the Python distribution: once I reinstalled Canopy (Enthought), the spark-submit command started to work. I still don't know why it happened, since Python itself was working fine in my previous Canopy installation.
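
For anyone investigating a similar suspicion about the Python distribution, here is a minimal sketch that reports which interpreter and which `spark-submit` the current environment resolves to. The function name `python_environment_report` is hypothetical. `PYSPARK_PYTHON` is the environment variable PySpark reads to choose its Python executable; whether it played any role in this particular case is unknown.

```python
import os
import shutil
import sys


def python_environment_report():
    """Collect the interpreter and launcher locations visible to this process."""
    return {
        # Interpreter running this script.
        "sys.executable": sys.executable,
        # First "python" found on PATH, or None if there is none.
        "python_on_path": shutil.which("python"),
        # First "spark-submit" found on PATH, or None if there is none.
        "spark_submit_on_path": shutil.which("spark-submit"),
        # Interpreter override read by PySpark, or None if unset.
        "PYSPARK_PYTHON": os.environ.get("PYSPARK_PYTHON"),
    }


for name, value in python_environment_report().items():
    print(f"{name}: {value}")
```

Running it once from the script folder and once from the Spark bin folder, and comparing the two outputs, would show whether the two locations see different interpreters.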

Thank you, everyone, for your responses and contributions. I learnt a lot from you.

Ben