I am new to Hadoop. I have added the Gson API to my MapReduce program. When I run the program I get:
Error: java.lang.ClassNotFoundException: com.google.gson.Gson
Can anybody suggest how to add third-party libraries to Hadoop?
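For context, a minimal mapper that uses Gson might look like the following sketch (the class and field names here are illustrative, not taken from the actual program). The point is that this code runs inside the map tasks on the cluster nodes, so the Gson jar has to be available there, not just on the machine that compiles or submits the job.

import java.io.IOException;

import com.google.gson.Gson;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.LongWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Mapper;

// Hypothetical mapper that parses one JSON record per input line with Gson
// and emits (user, 1) pairs.
public class JsonMapper extends Mapper<LongWritable, Text, Text, IntWritable> {

    private static final IntWritable ONE = new IntWritable(1);
    private final Gson gson = new Gson();

    // Assumed shape of the incoming JSON records.
    static class Event {
        String user;
    }

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        Event event = gson.fromJson(value.toString(), Event.class);
        if (event != null && event.user != null) {
            context.write(new Text(event.user), ONE);
        }
    }
}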
Be sure to add any dependencies to both the HADOOP_CLASSPATH and -libjars when submitting a job, as in the following examples:
Use the following to add all the jar dependencies from the current and lib directories:
export HADOOP_CLASSPATH=$HADOOP_CLASSPATH:`echo *.jar`:`echo lib/*.jar | sed 's/ /:/g'`
Bear in mind that when starting a job through hadoop jar, you'll also need to pass it the jars of any dependencies through -libjars. I like to use:
hadoop jar <jar> <class> -libjars `echo ./lib/*.jar | sed 's/ /,/g'` [args...]
NOTE: The sed commands require different delimiter characters; HADOOP_CLASSPATH is : separated, while -libjars needs to be , separated.
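One extra caveat (my addition, not something stated in the commands above): -libjars is one of Hadoop's generic options, so it only takes effect if the driver parses generic options, typically by implementing Tool and being launched through ToolRunner. A minimal driver sketch along those lines, reusing the hypothetical JsonMapper from the question above, could look like this:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.conf.Configured;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
import org.apache.hadoop.util.Tool;
import org.apache.hadoop.util.ToolRunner;

// Hypothetical driver: ToolRunner strips generic options such as -libjars
// before run() sees the remaining arguments.
public class JsonJobDriver extends Configured implements Tool {

    @Override
    public int run(String[] args) throws Exception {
        // getConf() already reflects any -libjars / -D options parsed by ToolRunner.
        Job job = Job.getInstance(getConf(), "json job");
        job.setJarByClass(JsonJobDriver.class);
        job.setMapperClass(JsonMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        return job.waitForCompletion(true) ? 0 : 1;
    }

    public static void main(String[] args) throws Exception {
        System.exit(ToolRunner.run(new Configuration(), new JsonJobDriver(), args));
    }
}

With a driver like this, the hadoop jar ... -libjars ... command above ships the listed jars to the task nodes and adds them to the tasks' classpath.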
Add the jar to HADOOP_CLASSPATH
vi $HADOOP_HOME/etc/hadoop/hadoop-env.sh
Add this as the last line:
export HADOOP_CLASSPATH=/root/hadoop/extrajars/java-json.jar:$HADOOP_CLASSPATH
"/root/hadoop/extrajars/java-json.jar" is a path on the Linux box itself, not on HDFS.
Restart Hadoop.
The command
hadoop classpath
should show the jar in the classpath.
Now run the MR job as usual:
hadoop jar <MR-program jar> <MR Program class> <input dir> <output dir>
The job will pick up the jar from HADOOP_CLASSPATH as expected.
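As an optional sanity check (my own addition, not part of the steps above), a small Java program can confirm whether a class from the extra jar is actually visible to a JVM started with that classpath:

// Illustrative check: Class.forName succeeds only if the named class
// is on the JVM's classpath.
public class ClasspathCheck {
    public static void main(String[] args) {
        // Fully qualified class name to look for, e.g. com.google.gson.Gson
        String className = args.length > 0 ? args[0] : "com.google.gson.Gson";
        try {
            Class.forName(className);
            System.out.println(className + " is on the classpath");
        } catch (ClassNotFoundException e) {
            System.out.println(className + " is NOT on the classpath");
        }
    }
}

Compile it and run it with the output of hadoop classpath (plus the directory containing ClasspathCheck.class) on the classpath; if it reports the class as missing, the jar is not actually on the classpath being checked.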