
I am experimenting with spark-submit where the application jar is hosted in a remote repository (not local, HDFS, or S3). Below is my attempt to run SparkPi directly from Maven:

spark-submit \
    --class org.apache.examples.SparkPi \
    --repositories https://mvnrepository.com/repos/central,https://repo.eclipse.org/content/repositories/paho-releases \
    --packages org.apache.spark:spark-examples_2.10:0.9.0-incubating \
    --jars https://repo1.maven.org/maven2/org/apache/spark/spark-examples_2.10/0.9.0-incubating/spark-examples_2.10-0.9.0-incubating.jar \
    spark-examples_2.10-0.9.0-incubating.jar \
    10000

It doesn't seem to work, but it doesn't fail either. Here is the output:

---------------------------------------------------------------------
|                  |            modules            ||   artifacts   |
|       conf       | number| search|dwnlded|evicted|| number|dwnlded|
---------------------------------------------------------------------
|      default     |  123  |   0   |   0   |   21  ||  102  |   0   |
---------------------------------------------------------------------

:: retrieving :: org.apache.spark#spark-submit-parent-a0c4af8a-2537-45f2-a26d-d9d697abfb2b


confs: [default]
    0 artifacts copied, 102 already retrieved (0kB/50ms)
20/07/17 09:53:35 WARN Utils: Your hostname, ****.local resolves to a loopback address: 127.0.0.1; using **** instead (on interface en0)
20/07/17 09:53:35 WARN Utils: Set SPARK_LOCAL_IP if you need to bind to another address
20/07/17 09:53:36 WARN NativeCodeLoader: Unable to load native-hadoop library for your platform... using builtin-java classes where applicable
log4j:WARN No appenders could be found for logger (org.apache.spark.deploy.SparkSubmit$$anon$2).
log4j:WARN Please initialize the log4j system properly.
log4j:WARN See http://logging.apache.org/log4j/1.2/faq.html#noconfig for more info.

I may be wrong in my assumption that this should work, but I'd appreciate any feedback.


1 Answer


Yes you can! You do not need any of the --jars, --repositories or --packages config flags for it either. Simply supplying the full https URI for the application jar is enough. In your case, it would be something like:

./spark-submit \
    --class org.apache.examples.SparkPi \
    https://repo1.maven.org/maven2/org/apache/spark/spark-examples_2.10/0.9.0-incubating/spark-examples_2.10-0.9.0-incubating.jar \
    10000

There is one problem here, though: org.apache.examples.SparkPi does not exist in the jar you are referencing. However, org.apache.spark.examples.SparkPi does exist!
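If you want to check this yourself, one quick way (a sketch; the jar and class names are taken from the URL in the question) is to download the jar and list its contents with the JDK's jar tool:

    # download the examples jar and look for SparkPi classes inside it
    curl -O https://repo1.maven.org/maven2/org/apache/spark/spark-examples_2.10/0.9.0-incubating/spark-examples_2.10-0.9.0-incubating.jar
    jar tf spark-examples_2.10-0.9.0-incubating.jar | grep SparkPi
    # should list org/apache/spark/examples/SparkPi.class (and related classes)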

Here is an example that works on my laptop right now (running Spark 3.3.1 locally):

./spark-submit \
    --class org.apache.spark.examples.SparkPi \
    https://repo1.maven.org/maven2/org/apache/spark/spark-examples_2.10/1.1.1/spark-examples_2.10-1.1.1.jar \
    10000

I changed the spark-examples_2.10 version, but the idea is the same: just supplying the URI for the application jar should be enough.
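For reference, if the job runs successfully you should see SparkPi's result near the end of the output, something like this (exact digits vary between runs):

    Pi is roughly 3.14159...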

  • Is there a way to pass Artifactory credentials in the URL? – Bryn Mar 31 '23 at 11:31
  • Not sure about this because I've never added credentials in the URL but you could try [this](https://stackoverflow.com/questions/56115470/specify-user-name-and-passoword-in-jfrog-artifactory-url-in-order-to-avoid-the-p)? Let me know if it works, then I'll add it to this answer! – Koedlt Apr 01 '23 at 07:40
  • Thank you! URLs like https://user:pass@host/path somehow do not work with JFrog. – Bryn Apr 03 '23 at 10:22
  • Ohh ok so that link I sent you wasn't helpful? – Koedlt Apr 04 '23 at 12:52
  • Anyway, thank you for the link. The Artifactory instance somehow does not authorize requests with credentials embedded in the URL, while the Authorization header does work. – Bryn Apr 05 '23 at 13:04
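To close the loop on that comment thread: since credentials embedded in the URL did not work against that Artifactory instance but an Authorization header did, one possible workaround (a sketch; the host, path, and token variable are placeholders, not from the original posts) is to fetch the jar with the header first and then submit the local copy:

    # fetch the jar using an Authorization header, then point spark-submit at the local file
    curl -H "Authorization: Bearer $ARTIFACTORY_TOKEN" \
        -o spark-examples.jar \
        https://artifactory.example.com/libs-release/spark-examples.jar
    ./spark-submit \
        --class org.apache.spark.examples.SparkPi \
        spark-examples.jar \
        10000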