I have a Spark program that requires several dependencies.

One dependency, `a.jar`, is available on the cluster as version 2.8 (`a_2.8.jar`), but I need version 2.9 (`a_2.9.jar`).
Every time I launch the program, Spark automatically loads `a_2.8.jar` from the cluster instead of `a_2.9.jar`, even though I have submitted the newer jar with `--jars a_2.9.jar`.
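For context, my submit command looks roughly like the following; the master, main class, and application jar are placeholders, not my real values:

```bash
# Illustrative submit command: only --jars a_2.9.jar reflects what I actually pass,
# the rest (master, class name, app jar) are placeholders.
spark-submit \
  --master yarn \
  --deploy-mode cluster \
  --class com.example.MyApp \
  --jars a_2.9.jar \
  my-app.jar
```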
I tried the `spark.executor.userClassPathFirst` setting, but that creates another problem: some "secret" jar in my user class path, say `b.jar`, doesn't work with the cluster, and because I have so many dependencies I can't tell which jar is the one that breaks.
To sum up:

- If I use the cluster's default class path, `a.jar` conflicts (I get 2.8 instead of 2.9).
- If I use `userClassPathFirst`, some `b.jar` conflicts, and I don't know which jar it is.
Could someone advise me on the best solution here, ideally one that minimizes the work involved?