
I am trying to run spark-submit on an Amazon EC2 cluster with the following command:

spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.1 --master spark://amazonaws.com SimpleApp.py

and I end up with the error below. It seems to be failing while looking for the Hadoop dependency. My EC2 cluster was created using the spark-ec2 script.

Ivy Default Cache set to: /home/adas/.ivy2/cache
The jars for the packages stored in: /home/adas/.ivy2/jars
:: loading settings :: url = jar:file:/home/adas/spark/spark-2.1.0-bin-hadoop2.7/jars/ivy-2.4.0.jar!/org/apache/ivy/core/settings/ivysettings.xml
org.apache.hadoop#hadoop-aws added as a dependency
:: resolving dependencies :: org.apache.spark#spark-submit-parent;1.0
    confs: [default]
:: resolution report :: resolve 66439ms :: artifacts dl 0ms
    :: modules in use:
    ---------------------------------------------------------------------
    |                  |            modules            ||   artifacts   |
    |       conf       | number| search|dwnlded|evicted|| number|dwnlded|
    ---------------------------------------------------------------------
    |      default     |   1   |   0   |   0   |   0   ||   0   |   0   |
    ---------------------------------------------------------------------

:: problems summary ::
:::: WARNINGS
        module not found: org.apache.hadoop#hadoop-aws;2.7.1

    ==== local-m2-cache: tried

      file:/home/adas/.m2/repository/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      file:/home/adas/.m2/repository/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

    ==== local-ivy-cache: tried

      /home/adas/.ivy2/local/org.apache.hadoop/hadoop-aws/2.7.1/ivys/ivy.xml

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      /home/adas/.ivy2/local/org.apache.hadoop/hadoop-aws/2.7.1/jars/hadoop-aws.jar

    ==== central: tried

      https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

    ==== spark-packages: tried

      http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom

      -- artifact org.apache.hadoop#hadoop-aws;2.7.1!hadoop-aws.jar:

      http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar

        ::::::::::::::::::::::::::::::::::::::::::::::

        ::          UNRESOLVED DEPENDENCIES         ::

        ::::::::::::::::::::::::::::::::::::::::::::::

        :: org.apache.hadoop#hadoop-aws;2.7.1: not found

        ::::::::::::::::::::::::::::::::::::::::::::::


:::: ERRORS
    Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))

    Server access error at url http://dl.bintray.com/spark-packages/maven/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.jar (java.net.NoRouteToHostException: No route to host (Host unreachable))


:: USE VERBOSE OR DEBUG MESSAGE LEVEL FOR MORE DETAILS
Exception in thread "main" java.lang.RuntimeException: [unresolved dependency: org.apache.hadoop#hadoop-aws;2.7.1: not found]
    at org.apache.spark.deploy.SparkSubmitUtils$.resolveMavenCoordinates(SparkSubmit.scala:1078)
    at org.apache.spark.deploy.SparkSubmit$.prepareSubmitEnvironment(SparkSubmit.scala:296)
    at org.apache.spark.deploy.SparkSubmit$.submit(SparkSubmit.scala:160)
    at org.apache.spark.deploy.SparkSubmit$.main(SparkSubmit.scala:126)
    at org.apache.spark.deploy.SparkSubmit.main(SparkSubmit.scala)

2 Answers


You are submitting the job with the --packages org.apache.hadoop:hadoop-aws:2.7.1 option, so spark-submit attempts to resolve that dependency by downloading the package from the public Maven repository. However, the error indicates that it is unable to reach the repository:

Server access error at url https://repo1.maven.org/maven2/org/apache/hadoop/hadoop-aws/2.7.1/hadoop-aws-2.7.1.pom (java.net.NoRouteToHostException: No route to host (Host unreachable))

You might want to check whether the Spark master has access to the internet.
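One quick way to verify this from the master node, assuming curl is installed (a hypothetical check, not part of the original exchange):

    curl -I https://repo1.maven.org/maven2/

If this hangs or reports that the host is unreachable, --packages cannot resolve anything without a proxy or a local mirror.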

  • Unfortunately I cannot allow my Spark master to have access to the internet. Do you know how I can overcome this problem? – Ray.R.Chua Jan 23 '17 at 13:41
  • spark-submit accepts another option, `--jars`, which takes the locations of third-party jars on your driver; every jar provided with this option is shipped to the workers (a sketch follows below). – research800 Feb 08 '17 at 01:36
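A minimal sketch of that --jars workaround, assuming the jars were downloaded on a machine with internet access and copied to the driver beforehand; the /opt/jars paths and the aws-java-sdk-1.7.4.jar companion jar (which hadoop-aws 2.7.x generally needs at runtime) are assumptions, not details from the original post:

    spark-submit \
      --jars /opt/jars/hadoop-aws-2.7.1.jar,/opt/jars/aws-java-sdk-1.7.4.jar \
      --master spark://amazonaws.com \
      SimpleApp.py

Because the jars are supplied locally, no repository lookup is attempted, so the No route to host errors never occur.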

Sorry everyone, it was just a local proxy issue.
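For anyone who hits the same symptom: the --packages resolution runs inside the spark-submit JVM, so the standard Java proxy properties can be passed through the SPARK_SUBMIT_OPTS environment variable. A hedged sketch; proxy.example.com:3128 is a placeholder, not a value from this question:

    export SPARK_SUBMIT_OPTS="-Dhttp.proxyHost=proxy.example.com -Dhttp.proxyPort=3128 \
      -Dhttps.proxyHost=proxy.example.com -Dhttps.proxyPort=3128"
    spark-submit --packages org.apache.hadoop:hadoop-aws:2.7.1 \
      --master spark://amazonaws.com SimpleApp.py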

  • I think you should accept the answer above, since it points to the same proxy issue, rather than creating a new answer and accepting your own... – Ram Ghadiyaram Feb 07 '17 at 07:39