
I am trying to launch a MapReduce job from an application that implements the Tool interface. The application does a few other things that are preconditions for the MapReduce job.
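
For context, a minimal sketch of that kind of launcher (the class and job names here are placeholders, not my actual code):

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.conf.Configured;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.util.Tool;
    import org.apache.hadoop.util.ToolRunner;

    public class MyJobLauncher extends Configured implements Tool {

        @Override
        public int run(String[] args) throws Exception {
            // The preconditions that need the third party libs run here,
            // before the MapReduce job is submitted.
            Job job = Job.getInstance(getConf(), "my-map-reduce-job");
            job.setJarByClass(MyJobLauncher.class);
            // ... set mapper/reducer classes and input/output paths from args ...
            return job.waitForCompletion(true) ? 0 : 1;
        }

        public static void main(String[] args) throws Exception {
            System.exit(ToolRunner.run(new Configuration(), new MyJobLauncher(), args));
        }
    }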

This class uses some third party libs. How do I add those jars to the classpath when running the jar with the command: hadoop jar <myjar> [args]

Following this Cloudera post I tried setting the HADOOP_CLASSPATH env var to the third party jar, but it did not work out. The third party jars mentioned above are required only by the class that launches the job, not by the Mapper/Reducer classes, so I do not need to put them in the Distributed Cache.

When I copy these third party jars under $HADOOP_HOME/lib, it works, but I need a cleaner solution.

Thanks in anticipation.

Note - I know that putting all the third party jars in a lib directory inside my-map-reduce-job.jar would work, but I do not have that liberty; the jar is built using Maven and I want these third party jars to stay outside of my-map-reduce-job.jar.

Niks
  • That's when you try to invoke mapred, isn't it? Or are you getting the exception even when you try hadoop fs -ls? – SMA Jan 01 '15 at 13:23
  • I do not get any exception for any hadoop fs command; I get Exception in thread "main" java.lang.NoClassDefFoundError for a class that is in a third party jar. Thanks – Niks Jan 01 '15 at 14:43
  • Yes, so that means every mapred process would need that third party jar as well. So you need to distribute those jars to the other machines too if you have a multi-node cluster – SMA Jan 01 '15 at 15:10
  • Use the -files option – Amal G Jose Jan 01 '15 at 16:45
  • @almasshaikh - I do not need to put these jars in the Distributed Cache, as mentioned in the question. Anyway, I figured out my mistake - using HADOOP_CLASSPATH works. Earlier I was using the wrong separator between the jars. The correct separator is OS dependent; in my case (Unix) it is a colon (:). Thanks. – Niks Jan 02 '15 at 06:02

1 Answer


For future reference - setting the HADOOP_CLASSPATH env var on the client machine from where you are launching the MapReduce job is the way to go.

I figured out my mistake: I was exporting HADOOP_CLASSPATH the wrong way. The separator between the jars is platform dependent; for Unix it is a colon (:).

    export HADOOP_CLASSPATH=/path/to/my/jar1:/path/to/my/jar2

and then

    hadoop jar <myjar> [mainClass] [args]

You might want to append your jars to the HADOOP_CLASSPATH env var if it has been predefined elsewhere:

    export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:/path/to/my/jar1:/path/to/my/jar2
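
As a quick sanity check, the hadoop classpath command prints the classpath the client will use; after exporting the variable, the appended jars should show up in its output.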

Niks