
I need to develop a Spark program remotely against a Spark cluster and run it without packaging it into a jar, simply by clicking the "Run" button in the IDE. However, I got some confusing errors.

Here's the code:

import org.apache.spark.SparkContext
import org.apache.spark.SparkContext._
import org.apache.spark.SparkConf

object SimpleApp {
  def main(args: Array[String]) {
    val logFile = "D:\\Lab\\ScalaIDE\\data\\README.md" // file resides in local windows PC
    val conf = new SparkConf().setAppName("Simple Application").setMaster("spark://172.31.110.234:7077")
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
  }
}

172.31.110.234 is my Spark standalone cluster (Linux). I run this code from my local PC (Windows, with Scala IDE installed, IP: 172.31.2.77).

The error message:

16/10/07 17:47:00 ERROR Shell: Failed to locate the winutils binary in the hadoop binary path
java.io.IOException: Could not locate executable null\bin\winutils.exe in the Hadoop binaries.

After some research, I found a workaround that suggests downloading winutils.exe into C:\Bin, so I added the following above the logFile variable:

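// Hadoop looks for winutils.exe under %hadoop.home.dir%\bin, i.e. C:\bin\winutils.exe here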
System.setProperty("hadoop.home.dir", "C:\\");

Now I'm getting another error message, shown below:

16/10/07 17:56:28 INFO SparkContext: Running Spark version 2.0.1
16/10/07 17:56:28 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
...
...
16/10/07 17:56:34 WARN TaskSetManager: Lost task 0.0 in stage 0.0 (TID 0, 172.31.110.234): java.lang.ClassNotFoundException: org.bigdata.linknet.SimpleApp$$anonfun$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
...
...
Exception in thread "main" org.apache.spark.SparkException: Job aborted due to stage failure: Task 0 in stage 0.0 failed 4 times, most recent failure: Lost task 0.3 in stage 0.0 (TID 6, 172.31.110.234): java.lang.ClassNotFoundException: org.bigdata.linknet.SimpleApp$$anonfun$1
    at java.net.URLClassLoader.findClass(URLClassLoader.java:381)
...
...

Question: Is it possible in my scenario to run the Spark code against the cluster from my PC by simply clicking the "Run" button? I've read the similar post Run Spark/Cloudera application in remote machine with Eclipse, but it doesn't seem to resolve my question.

Thanks, Yusata

1 Answer


The "Failed to locate the winutils binary" error is not the real problem; you can usually ignore it.

The ClassNotFoundException above, however, is thrown because your Spark cluster doesn't have your compiled classes.

To achieve what you want, you would need:

  1. Build the jar (if you use Gradle -> fatJar or shadowJar); an sbt alternative is sketched below this list.
  2. In your code, when you create the SparkConf, you need to specify the master address and the (relative) jar location, something like the following (a Scala version adapted to your SimpleApp is sketched below):
SparkConf conf = new SparkConf()
    .setMaster("spark://SPARK-MASTER-ADDRESS:7077")
    .setJars(new String[]{"build\\libs\\spark-test-1.0-SNAPSHOT.jar"})
    .setAppName("APP-NAME");
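
Applied to the Scala code from your question, that might look like the sketch below. Treat it as a sketch: the jar path follows a typical Gradle build\libs layout and is an assumption, as is reusing the master address from your question; point setJars at wherever your build actually writes the fat jar.

import org.apache.spark.{SparkConf, SparkContext}

object SimpleApp {
  def main(args: Array[String]): Unit = {
    val logFile = "D:\\Lab\\ScalaIDE\\data\\README.md"
    val conf = new SparkConf()
      .setAppName("Simple Application")
      .setMaster("spark://172.31.110.234:7077")
      // Ship the jar containing SimpleApp (and its anonymous-function classes)
      // to the executors. The path is an assumption; use your actual build output.
      .setJars(Seq("build\\libs\\spark-test-1.0-SNAPSHOT.jar"))
    val sc = new SparkContext(conf)
    val logData = sc.textFile(logFile, 2).cache()
    val numAs = logData.filter(line => line.contains("a")).count()
    val numBs = logData.filter(line => line.contains("b")).count()
    println("Lines with a: %s, Lines with b: %s".format(numAs, numBs))
    sc.stop()
  }
}

You still have to rebuild the jar whenever the code changes before clicking "Run", but the driver will then distribute the listed jar to the executors and the ClassNotFoundException should go away.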
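For step 1, if you use sbt rather than Gradle (common for Scala IDE projects), a minimal build.sbt for use with the sbt-assembly plugin might look like the sketch below; the project name, versions, and the resulting jar path (e.g. target\scala-2.11\spark-test-assembly-1.0-SNAPSHOT.jar produced by the assembly task) are assumptions to adapt to your project.

// build.sbt -- minimal sketch; assumes the sbt-assembly plugin is enabled in project/plugins.sbt
name := "spark-test"
version := "1.0-SNAPSHOT"
scalaVersion := "2.11.8"

// "provided" keeps Spark itself out of the fat jar -- the cluster already ships it
libraryDependencies += "org.apache.spark" %% "spark-core" % "2.0.1" % "provided"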