3

I am trying to migrate our application to Spark running on YARN. I submit it with a command line like spark-submit --master yarn --deploy-mode cluster --jars ${my_jars}...

But YARN throws an exception with the following log:

Container id: container_1462875359170_0171_01_000002
Exit code: 1
Exception message: .../launch_container.sh: line 4145: /bin/bash: Argument list too long

I think the reason may be that we have too many jars (684 jars, separated by commas) passed through the option --jars ${my_jars}. My question is: what is a graceful way to specify all our jars? Or how can we avoid this YARN error?
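One way to check the suspicion above is to measure how long the comma-separated jar list actually is and compare it with the system limit. The following is a minimal sketch, not a definitive diagnosis: the function name `jar_list_bytes` and the directory `/fullpath/lib` are hypothetical, and it assumes all jars sit in one directory. Note that on Linux a single argument string is also capped (commonly at 128 KiB), so a command can fail well below `ARG_MAX`.

```shell
#!/usr/bin/env bash
# Print the byte length of the comma-separated list of all *.jar files in a directory.
# (jar_list_bytes is a hypothetical helper name, not part of Spark or YARN.)
jar_list_bytes() {
  local dir="$1" list="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue        # the glob matched nothing
    list="${list:+$list,}$jar"       # append with a comma separator
  done
  printf '%s' "$list" | wc -c | tr -d ' '
}

# Hypothetical usage: compare against the limit on arguments plus environment.
# echo "jar list: $(jar_list_bytes /fullpath/lib) bytes; ARG_MAX: $(getconf ARG_MAX) bytes"
```

If the list alone is already in the range of tens or hundreds of kilobytes, the generated launch_container.sh command is a plausible source of the error.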

Lipeng Yang

3 Answers

4

Check whether you can use spark.driver.extraClassPath and spark.executor.extraClassPath (see the extraClassPath entries in the Spark documentation):

spark.driver.extraClassPath /fullpath/first.jar:/fullpath/second.jar
spark.executor.extraClassPath /fullpath/first.jar:/fullpath/second.jar

Just found the thread spark-submit-add-multiple-jars-in-classpath.
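As an illustration of the two settings above, here is a minimal sketch that builds the colon-separated value from a directory of jars. The helper name `build_classpath` and the directory `/fullpath/lib` are hypothetical. It assumes the jars already exist at the same path on every node, since extraClassPath only extends the JVM classpath and does not ship the files.

```shell
#!/usr/bin/env bash
# Join every *.jar in a directory into a colon-separated classpath string.
# Prints an empty string if the directory contains no jars.
# (build_classpath is a hypothetical helper name.)
build_classpath() {
  local dir="$1" cp="" jar
  for jar in "$dir"/*.jar; do
    [ -e "$jar" ] || continue   # the glob matched nothing
    cp="${cp:+$cp:}$jar"        # append with a colon separator
  done
  printf '%s' "$cp"
}

# Hypothetical usage (LIB_DIR must be present at the same path on all nodes):
# LIB_DIR=/fullpath/lib
# CP="$(build_classpath "$LIB_DIR")"
# spark-submit --master yarn --deploy-mode cluster \
#   --conf "spark.driver.extraClassPath=$CP" \
#   --conf "spark.executor.extraClassPath=$CP" \
#   ...
```

With several hundred jars the joined string is itself long, so a JVM classpath wildcard such as /fullpath/lib/* may be a shorter value to try; whether it suits a given cluster setup is an assumption to verify.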

Indrajit Swain
  • I tried configuring the jars with these two classpath settings instead of "--jars", but it didn't solve the problem. I think this only changes how we submit the jars to YARN, or how we configure them on the Spark side; the YARN command line is unchanged. It is something like: {{JAVA_HOME}}/bin/java -server -Xmx1024m ... --user-class-path my_first.jar --user-class-path my_second.jar ... --user-class-path my_684th.jar, so the command line is still too long. – Lipeng Yang Dec 14 '16 at 10:35
  • Thanks, I tried again and found it working; my first attempt had an error in the configuration. – Lipeng Yang Dec 15 '16 at 03:14
  • Good to hear it resolved the issue. If you fixed it, please post the fix so that it can help others. – Indrajit Swain Dec 15 '16 at 05:13
0

I'd try one of these two things:

  1. Build a fat jar for the spark-submit application, or
  2. Build a thin jar with Maven and install the otherwise unavailable jars in a Maven repository, so that they can be loaded at runtime in the cluster.
mrsrinivas
  • Thanks, our application is private and cannot be put in a Maven repo. A fat jar is an option, but building it is an extra step for us, especially since some jars may change frequently. I wonder if there is a configuration or something else to avoid this error (I googled but found nothing); I think this is a common use case. – Lipeng Yang Dec 14 '16 at 09:53
  • **our application is private and cannot put to maven repo.** You can maintain your own repository with Nexus or similar archive tools for Maven. **I wonder if there is a configuration or something to avoid this error (googled but not found)** You can use Maven to build a fat jar. _some jars may change frequently_ What changes, the version or the jar itself? – mrsrinivas Dec 14 '16 at 09:59
  • Thanks, but maintaining a repository only to solve this is a little weird. What changes is the versions, e.g. we code, compile new versions of some jars, and do a test run. Building a fat jar each time is not ideal for us, and I want to avoid it if I can. That's my last resort. – Lipeng Yang Dec 14 '16 at 10:39
  • In general, most enterprises either maintain their own repository or build fat jars. – mrsrinivas Dec 15 '16 at 03:13
0

Try sbt-assembly, which packages all your classes and dependency classes into an uber jar.

It is easy and convenient to use, but you have to take care of two things:

  1. version conflicts between dependencies
  2. the resulting jar will be fairly large
Mo Tao