
I've configured everything according to the Spark website in order to launch a simple Spark app that reads and counts the lines in a file and writes the counts to another file. But I'm not able to run the application: I'm getting a lot of errors and I don't understand what's wrong.

this is my project structure:

sparkExamples
|-- pom.xml
|-- src
|   `-- main/java/org/sparkExamplex/App.java
`-- resources
    |-- readLine
    `-- outputReadLine

pom.xml

<project xmlns="http://maven.apache.org/POM/4.0.0" xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
    xsi:schemaLocation="http://maven.apache.org/POM/4.0.0 http://maven.apache.org/xsd/maven-4.0.0.xsd">
    <modelVersion>4.0.0</modelVersion>

    <groupId>org.sparkexamples</groupId>
    <artifactId>sparkExamples</artifactId>
    <version>0.0.1-SNAPSHOT</version>

    <name>sparkExamples</name>
    <url>http://maven.apache.org</url>


    <dependencies>
        <dependency> <!-- Spark dependency -->
            <groupId>org.apache.spark</groupId>
            <artifactId>spark-core_2.10</artifactId>
            <version>1.3.1</version>
        </dependency>
    </dependencies>
</project>

App.java

import org.apache.spark.SparkConf;
import org.apache.spark.api.java.JavaRDD;
import org.apache.spark.api.java.JavaSparkContext;
import org.apache.spark.api.java.JavaPairRDD;
import org.apache.spark.api.java.function.FlatMapFunction;
import org.apache.spark.api.java.function.Function2;
import org.apache.spark.api.java.function.PairFunction;

import scala.Tuple2;

import java.util.Arrays;
import java.util.regex.Pattern;

public final class App {
    private static final Pattern SPACE = Pattern.compile(" ");

    public static void main(String[] args) throws Exception {

        String inputFile = "resources/readLine";
        String outputFile = "resources/outputReadLine";
        // Create a Java Spark Context.
        SparkConf conf = new SparkConf().setAppName("wordCount").setMaster("spark://127.0.0.1:7077");
        JavaSparkContext sc = new JavaSparkContext(conf);
        // Load our input data.
        JavaRDD<String> input = sc.textFile(inputFile);
        // Split up into words.
        JavaRDD<String> words = input.flatMap(new FlatMapFunction<String, String>() {
            public Iterable<String> call(String x) {
                return Arrays.asList(SPACE.split(x));
            }
        });
        // Transform into word and count.
        JavaPairRDD<String, Integer> counts = words.mapToPair(new PairFunction<String, String, Integer>() {
            public Tuple2<String, Integer> call(String x) {
                return new Tuple2<String, Integer>(x, 1);
            }
        }).reduceByKey(new Function2<Integer, Integer, Integer>() {
            public Integer call(Integer x, Integer y) {
                return x + y;
            }
        });
        // Save the word count back out to a text file, causing evaluation.
        counts.saveAsTextFile(outputFile);
    }
}



errors shown:

Using Spark's default log4j profile: org/apache/spark/log4j-defaults.properties
15/06/03 13:11:46 INFO SparkContext: Running Spark version 1.3.1
15/06/03 13:11:47 INFO SecurityManager: Changing view acls to: Administrator
15/06/03 13:11:47 INFO SecurityManager: Changing modify acls to: Administrator
15/06/03 13:11:47 INFO SecurityManager: SecurityManager: authentication disabled; ui acls disabled; users with view permissions: Set(Administrator); users with modify permissions: Set(Administrator)
15/06/03 13:11:47 INFO Slf4jLogger: Slf4jLogger started
15/06/03 13:11:47 INFO Remoting: Starting remoting
15/06/03 13:11:47 INFO Remoting: Remoting started; listening on addresses :[akka.tcp://sparkDriver@mvettos.romelab.it.ibm.com:9164]
15/06/03 13:11:47 INFO Utils: Successfully started service 'sparkDriver' on port 9164.
15/06/03 13:11:47 INFO SparkEnv: Registering MapOutputTracker
15/06/03 13:11:47 INFO SparkEnv: Registering BlockManagerMaster
15/06/03 13:11:47 INFO DiskBlockManager: Created local directory at C:\Users\ADMINI~1\AppData\Local\Temp\spark-e6bb5cfc-6b96-4105-9a1c-843832ba60f9\blockmgr-dea7bb85-954c-4a4d-b3fb-74d7b6b1d9f5
15/06/03 13:11:47 INFO MemoryStore: MemoryStore started with capacity 467.6 MB
15/06/03 13:11:47 INFO HttpFileServer: HTTP File server directory is C:\Users\ADMINI~1\AppData\Local\Temp\spark-6e11c6bc-2743-4172-8d74-f3abc08d9f46\httpd-2bfa61a2-a1fd-4bd3-85c2-bcbc05d2ec27
15/06/03 13:11:47 INFO HttpServer: Starting HTTP Server
15/06/03 13:11:48 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/03 13:11:48 INFO AbstractConnector: Started SocketConnector@0.0.0.0:9165
15/06/03 13:11:48 INFO Utils: Successfully started service 'HTTP file server' on port 9165.
15/06/03 13:11:48 INFO SparkEnv: Registering OutputCommitCoordinator
15/06/03 13:11:48 INFO Server: jetty-8.y.z-SNAPSHOT
15/06/03 13:11:48 INFO AbstractConnector: Started SelectChannelConnector@0.0.0.0:4040
15/06/03 13:11:48 INFO Utils: Successfully started service 'SparkUI' on port 4040.
15/06/03 13:11:48 INFO SparkUI: Started SparkUI at http://mvettos.romelab.it.ibm.com:4040
15/06/03 13:11:48 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@127.0.0.1:7077/user/Master...
15/06/03 13:11:49 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
15/06/03 13:11:49 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /127.0.0.1:7077
15/06/03 13:12:08 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@127.0.0.1:7077/user/Master...
15/06/03 13:12:09 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
15/06/03 13:12:09 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /127.0.0.1:7077
15/06/03 13:12:28 INFO AppClient$ClientActor: Connecting to master akka.tcp://sparkMaster@127.0.0.1:7077/user/Master...
15/06/03 13:12:29 WARN AppClient$ClientActor: Could not connect to akka.tcp://sparkMaster@127.0.0.1:7077: akka.remote.InvalidAssociation: Invalid address: akka.tcp://sparkMaster@127.0.0.1:7077
15/06/03 13:12:29 WARN Remoting: Tried to associate with unreachable remote address [akka.tcp://sparkMaster@127.0.0.1:7077]. Address is now gated for 5000 ms, all messages to this address will be delivered to dead letters. Reason: Connection refused: no further information: /127.0.0.1:7077
15/06/03 13:12:48 ERROR SparkDeploySchedulerBackend: Application has been killed. Reason: All masters are unresponsive! Giving up.
15/06/03 13:12:48 WARN SparkDeploySchedulerBackend: Application ID is not initialized yet.
15/06/03 13:12:48 ERROR TaskSchedulerImpl: Exiting due to error from cluster scheduler: All masters are unresponsive! Giving up.

Can someone help me figure out what's going on here?

Thanks in advance.

OiRc
  • It seems your program could not connect to Spark properly. Check the address and the Spark service on port 7077; I think the Spark service is not running properly. – JTejedor Jun 03 '15 at 11:21
  • @JTejedor can you be more specific? – OiRc Jun 03 '15 at 11:22
  • Assuming you are using Spark standalone mode (see the manual: [link](https://spark.apache.org/docs/latest/spark-standalone.html)), you can check whether anything answers at "spark://HOST:PORT" to see if the service is working. If not, the problem is that the Spark service is down. – JTejedor Jun 03 '15 at 11:27
  • I didn't download the Spark package; I only created a Maven project and linked the dependencies. Do I have to download [this](https://spark.apache.org/downloads.html)? P.S. I want Spark to run standalone, without any Hadoop installation. – OiRc Jun 03 '15 at 11:33
  • So you need to install the Spark standalone package (yes, that link), because right now your program is trying to connect to a nonexistent Spark service. – JTejedor Jun 03 '15 at 11:37
  • @JTejedor OK, but from what's shown on the website I don't understand how to install Spark. Can you guide me? – OiRc Jun 03 '15 at 11:41
  • Try this [tutorial](http://mbonaci.github.io/mbo-spark/). I hope you can install it without any trouble. – JTejedor Jun 03 '15 at 11:45
  • @JTejedor Well, I'm using a Windows machine; is the process the same? – OiRc Jun 03 '15 at 11:46
  • Try this [tutorial](http://ml-nlp-ir.blogspot.com.es/2014/04/building-spark-on-windows-and-cloudera.html?m=1), extracted from [the Cloudera support forum](http://apache-spark-user-list.1001560.n3.nabble.com/Does-anyone-have-a-stand-alone-spark-instance-running-on-Windows-td12238.html) and found by googling "how to install spark cluster standalone windows". – JTejedor Jun 03 '15 at 11:58
  • You don't need to launch a Spark master. You can just use the spark-submit command with --master local[4] (4 is the number of cores), or set a local master directly in code, as in the sketch after these comments. – eliasah Jun 03 '15 at 12:02
  • @JTejedor That tutorial is oriented toward a Scala environment. I need a tutorial for Windows and Java/Eclipse... I've searched a lot, but I haven't found a guide yet. – OiRc Jun 03 '15 at 13:44
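
If you just want to see the job run before installing anything, here is a minimal sketch of the change eliasah suggests, assuming only the SparkConf lines in App.java are swapped (the imports already present in App.java cover these classes):

        // Run Spark in-process on 4 local threads instead of contacting a standalone master.
        SparkConf conf = new SparkConf()
                .setAppName("wordCount")
                .setMaster("local[4]");
        JavaSparkContext sc = new JavaSparkContext(conf);

With a local master the driver never tries to reach spark://127.0.0.1:7077, so the "Connection refused" warnings in the log above disappear.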

1 Answer


After several comments, we can summarize the answer as follows:

  • The component is configured correctly and you followed the Cloudera tutorial steps properly.
  • The app does not connect to Spark because no Spark service is currently installed on your system.
  • So you now need a tutorial for installing Spark standalone in a Windows environment; searching Stack Overflow, I found a question that is essentially the same as this one. Once a standalone master is running, the spark://127.0.0.1:7077 URL in your code should be reachable (see the sketch below).

In short, this question is a duplicate. I hope this answer helps you.
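
For reference, a rough sketch of starting a standalone master and worker by hand on Windows, assuming the pre-built Spark 1.3.1 package has been downloaded and unpacked to C:\spark-1.3.1 (that path is an assumption, not something from this thread):

REM Start a standalone master; its log prints the spark://HOST:7077 URL and the web UI address (port 8080).
cd C:\spark-1.3.1
bin\spark-class.cmd org.apache.spark.deploy.master.Master

REM In a second console, start a worker and point it at the master URL printed in the master's log.
bin\spark-class.cmd org.apache.spark.deploy.worker.Worker spark://127.0.0.1:7077

Note that the master may bind to your machine's hostname rather than 127.0.0.1; use the exact spark:// URL from its log in both the Worker command and setMaster(), otherwise the connection can still be refused.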

JTejedor