I am using Apache Spark on my windows machine. I am relatively new to this, and I am working locally before uploading my code to the cluster.
I've written a very simple scala program and everything works fine:
println("creating Dataframe from json")
val sqlContext = new org.apache.spark.sql.SQLContext(sc)
val rawData = sqlContext.read.json("test_data.txt")
println("this is the test data table")
rawData.show()
println("finished running")
The program executes correctly. I now want to add some processing which calls some simple Java functions that I've pre-packaged in a JAR file. I'm running the scala shell. As it says on the getting started page, I startup the shell with:
c:\Users\eshalev\Desktop\spark-1.4.1-bin-hadoop2.6\bin\spark-shell --master local[4] --jars myjar-1.0-SNAPSHOT.jar
Important fact: I don't have hadoop installed on my local machine. But as I'm only parsing a text file this shouldn't matter, and didn't matter until I used --jars.
I now proceed to run the same scala program. There are no references to the jar file yet... This time I get:
...some SPARK debug code here and then...
15/09/08 14:27:37 INFO Executor: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar with timestamp 144
1715239626
15/09/08 14:27:37 INFO Utils: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar-1.0 to C:\Users\eshalev\A
ppData\Local\Temp\spark-dd9eb37f-4033-4c37-bdbf-5df309b5eace\userFiles-ebe63c02-8161-4162-9dc0-74e3df6f7356\fetchFileTem
p2982091960655942774.tmp
15/09/08 14:27:37 INFO Executor: Fetching http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar with timestamp 144
1715239626
15/09/08 14:27:37 ERROR Executor: Exception in task 1.0 in stage 0.0 (TID 1)
java.lang.NullPointerException
at java.lang.ProcessBuilder.start(Unknown Source)
at org.apache.hadoop.util.Shell.runCommand(Shell.java:482)
at org.apache.hadoop.util.Shell.run(Shell.java:455)
at org.apache.hadoop.util.Shell$ShellCommandExecutor.execute(Shell.java:715)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:873)
at org.apache.hadoop.fs.FileUtil.chmod(FileUtil.java:853)
at org.apache.spark.util.Utils$.fetchFile(Utils.scala:465)
... aplenty more spark debug messages here, and then ...
this is the test data table
<console>:20: error: not found: value rawData
rawData.show()
^
finished running
I double checked http://10.61.97.179:62752/jars/myjar-1.0-SNAPSHOT.jar-1.0-SNAPSHOT.jar, and I can download it just fine. And then again, nothing in the code references the jar yet. If start the shell without --jar everything works fine.