I am in classloader hell: Hadoop (up to 2.7.2) uses an outdated version of HttpClient (4.2.5).

https://hadoop.apache.org/docs/r2.7.2/hadoop-mapreduce-client/hadoop-mapreduce-client-core/dependency-analysis.html

This clashes with the version of HttpClient I am using (4.5.1). I tried loading the user classpath first in my EMR job, but then I get a clash on the Codec classes. I even rewrote my class to use the older version (4.2.5), but I still get some clashes.

In my EMR job, how can I print the full classpath to stdout/stderr (or somewhere else) so I can debug which jars are on the classpath?

I know how to get a "normal" Java classpath, but I am wondering whether there is anything Hadoop- or EMR-specific I need to do so that the Hadoop/EMR jars are included as well.

kellyfj

1 Answer

Here is the approach I used. It reads the URLs from the system class loader, and I added the method to my Hadoop driver class:

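  import java.net.URL;
  import java.net.URLClassLoader;

  // Prints every entry on the system class loader's classpath to stdout.
  // Note: the cast to URLClassLoader works on Java 8 and earlier. On Java 9+
  // the system class loader is not a URLClassLoader, so the cast throws a
  // ClassCastException, which is caught and reported below.
  public static void logClasspathToStdOut() {
    try {
      ClassLoader cl = ClassLoader.getSystemClassLoader();
      URL[] urls = ((URLClassLoader) cl).getURLs();

      int i = 1;
      System.out.println("SystemClassLoader classpath includes:");
      for (URL url : urls) {
        System.out.println(i + " : " + url.getFile());
        i++;
      }
    } catch (Exception e) {
      System.err.println("Exception logging classpath " + e.getMessage());
    }
  }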

(For the difference between the class loader types, see the question "Difference between thread's context class loader and normal classloader".)
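Since the system class loader and the thread's context class loader can differ, it may also help to print the whole loader chain. The following is only a sketch, not something from the original answer: the class name `ClassLoaderChain` and the method `describe` are made up for illustration, and it assumes nothing beyond the JDK. It lists URLs only for loaders that happen to be `URLClassLoader` instances and falls back to the `java.class.path` system property for the rest.

```java
import java.net.URL;
import java.net.URLClassLoader;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper class, named here for illustration only.
public class ClassLoaderChain {

  // Collects one line per class loader, starting at the given loader and
  // walking up through its parents, followed by any URLs that loader exposes.
  public static List<String> describe(ClassLoader start) {
    List<String> lines = new ArrayList<>();
    int depth = 0;
    for (ClassLoader cl = start; cl != null; cl = cl.getParent()) {
      lines.add("loader " + depth + " : " + cl.getClass().getName());
      // Only URLClassLoader exposes its search path; other loaders are just named.
      if (cl instanceof URLClassLoader) {
        for (URL url : ((URLClassLoader) cl).getURLs()) {
          lines.add("  " + url.getFile());
        }
      }
      depth++;
    }
    return lines;
  }

  public static void main(String[] args) {
    ClassLoader context = Thread.currentThread().getContextClassLoader();
    for (String line : describe(context)) {
      System.out.println(line);
    }
    // Loader-independent fallback: the classpath the JVM was started with.
    System.out.println("java.class.path = " + System.getProperty("java.class.path"));
  }
}
```

Calling `ClassLoaderChain.describe(Thread.currentThread().getContextClassLoader())` from inside the driver, or from a mapper's setup method, should show whether the task sees a different loader than the driver does.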

This produced the following output in the EMR 'stdout' log:

SystemClassLoader classpath includes:
1 : /home/hadoop/.versions/2.4.0/etc/hadoop/
2 : /home/hadoop/.versions/2.4.0/share/hadoop/common/lib/httpclient-4.2.5.jar
3 : /usr/share/aws/emr/kinesis/lib/EmrKinesisHadoop-1.0.1.jar
.
.
354 : /usr/share/aws/emr/lib/gson-2.2.2.jar
355 : /usr/share/aws/emr/lib/commons-httpclient-3.0.jar
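With several HTTP client jars on a classpath like the one above, a complementary check is to ask the JVM which location a specific class was actually loaded from. This is a minimal sketch under stated assumptions: the class name `WhichJar` and the method `locate` are hypothetical, and the default class name in `main` is just a placeholder. On the cluster you would pass the conflicting class instead, for example `org.apache.http.client.HttpClient`.

```java
import java.security.CodeSource;

// Hypothetical helper class, named here for illustration only.
public class WhichJar {

  // Returns the location (jar or directory) a class was loaded from, or a
  // marker string when the JVM reports no code source for it.
  public static String locate(Class<?> clazz) {
    CodeSource source = clazz.getProtectionDomain().getCodeSource();
    if (source == null || source.getLocation() == null) {
      return "(no code source reported)";
    }
    return source.getLocation().toString();
  }

  public static void main(String[] args) throws Exception {
    // Placeholder default; pass the fully qualified name of the class to check.
    String name = args.length > 0 ? args[0] : "java.lang.String";
    System.out.println(name + " loaded from " + locate(Class.forName(name)));
  }
}
```

If the reported location is the Hadoop-provided httpclient jar rather than the one bundled with the job, that suggests the older version is winning the classpath ordering.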
kellyfj