I need to develop a Spark program remotely against a Spark cluster and run it without packaging it into a JAR, simply by clicking the "Run" button in the IDE. However, I'm getting some confusing errors.
Here's the code:
package org.bigdata.linknet // package name as shown in the stack trace below

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "D:\\Lab\\ScalaIDE\\data\\README.md" // file resides on my local Windows PC
    val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://172.31.110.234:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}
172.31.110.234 is my Spark standalone cluster (Linux). I run this code from my local PC (Windows, with Scala IDE installed, IP: 172.31.2.77).
Error message:
16/10/07 17:47:00 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.
After some research, I found a workaround that suggests downloading winutils.exe into C:\Bin, so I tried adding this line of code above the logFile variable:
System.setProperty("hadoop.home.dir", "C:\\");
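(As I understand it, Hadoop resolves the binary as hadoop.home.dir + \bin\winutils.exe, which is why winutils.exe placed in C:\Bin pairs with "C:\\" as the property value. A rough sketch of that lookup, just to illustrate my assumption, not Hadoop's actual code:)

// Sketch: "C:\" + "bin\winutils.exe" -> C:\Bin\winutils.exe
val hadoopHome = System.getProperty("hadoop.home.dir") // "C:\"
val winutils = new java.io.File(hadoopHome, "bin\\winutils.exe")
require(winutils.exists(), s"winutils.exe not found at $winutils")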
Now I'm getting a different error message, shown below:
16/10/07 17:56:28 INFO SparkContext: Running Spark version 2.0.1
16/10/07 17:56:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
...
16/10/07 17:56:34 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.31.110.234): java.lang.ClassNotFoundException: org.bigdata.linknet.SimpleApp$$anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
...
...
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 172.31.110.234): java.lang.ClassNotFoundException: org.bigdata.linknet.SimpleApp$$anonfun$1
at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
...
...
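(My reading of the ClassNotFoundException is that the executors on 172.31.110.234 never receive the classes compiled by my IDE, including the anonymous class SimpleApp$$anonfun$1 that the compiler generates for the filter closures. The usual fix I've seen is to ship them as a jar via SparkConf.setJars, sketched below with a hypothetical jar path, but that is exactly the packaging step I'm hoping to avoid:)

import org.apache.spark.SparkConf

val conf = new SparkConf()
  .setAppName("Simple Application")
  .setMaster("spark://172.31.110.234:7077")
  // Ship the compiled application classes (incl. the filter closures) to the
  // executors; the jar path is hypothetical and would have to point at this
  // project's build output.
  .setJars(Seq("D:\\Lab\\ScalaIDE\\target\\simple-app.jar"))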
Question: Is my scenario possible, i.e. running the Spark code against the Spark cluster from my PC by simply clicking the "Run" button? I've read the similar post Run Spark/Cloudera application in remote machine with Eclipse, but it doesn't seem to resolve my question.
Thanks, Yusata