
I have a runtime problem with code that runs on top of Apache Spark. I depend on the AWS SDK to upload files to S3, and the upload fails with a NoSuchMethodError. Note that I'm using an uber jar with the Spark dependency bundled in. The error when running my code:

Exception in thread "main" java.lang.NoSuchMethodError: org.apache.http.impl.conn.DefaultClientConnectionOperator.<init>(Lorg/apache/http/conn/scheme/SchemeRegistry;Lorg/apache/http/conn/DnsResolver;)V
at org.apache.http.impl.conn.PoolingClientConnectionManager.createConnectionOperator(PoolingClientConnectionManager.java:140)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:114)
at org.apache.http.impl.conn.PoolingClientConnectionManager.<init>(PoolingClientConnectionManager.java:99)
at com.amazonaws.http.ConnectionManagerFactory.createPoolingClientConnManager(ConnectionManagerFactory.java:29)
at com.amazonaws.http.HttpClientFactory.createHttpClient(HttpClientFactory.java:97)
at com.amazonaws.http.AmazonHttpClient.<init>(AmazonHttpClient.java:165)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:119)
at com.amazonaws.AmazonWebServiceClient.<init>(AmazonWebServiceClient.java:103)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:357)
at com.amazonaws.services.s3.AmazonS3Client.<init>(AmazonS3Client.java:339)

However, when I inspect the jar for the method signature, I see it clearly listed:

vagrant@mesos:~/installs/spark-1.0.1-bin-hadoop2$ javap -classpath /tmp/rickshaw-spark-0.0.1-SNAPSHOT.jar org.apache.http.impl.conn.DefaultClientConnectionOperator
Compiled from "DefaultClientConnectionOperator.java"
public class org.apache.http.impl.conn.DefaultClientConnectionOperator implements org.apache.http.conn.ClientConnectionOperator {
protected final org.apache.http.conn.scheme.SchemeRegistry schemeRegistry;
protected final org.apache.http.conn.DnsResolver dnsResolver;
public org.apache.http.impl.conn.DefaultClientConnectionOperator(org.apache.http.conn.scheme.SchemeRegistry);
public org.apache.http.impl.conn.DefaultClientConnectionOperator(org.apache.http.conn.scheme.SchemeRegistry, org.apache.http.conn.DnsResolver); <-- it exists!
public org.apache.http.conn.OperatedClientConnection createConnection();
public void openConnection(org.apache.http.conn.OperatedClientConnection, org.apache.http.HttpHost, java.net.InetAddress, org.apache.http.protocol.HttpContext, org.apache.http.params.HttpParams) throws java.io.IOException;
public void updateSecureConnection(org.apache.http.conn.OperatedClientConnection, org.apache.http.HttpHost, org.apache.http.protocol.HttpContext, org.apache.http.params.HttpParams) throws java.io.IOException;
protected void prepareSocket(java.net.Socket, org.apache.http.protocol.HttpContext, org.apache.http.params.HttpParams) throws java.io.IOException;
protected java.net.InetAddress[] resolveHostname(java.lang.String) throws java.net.UnknownHostException;

}

I checked some of the other jars in the Spark distribution, and they don't seem to have this particular method signature. So I'm left wondering which class the Spark runtime is picking up to cause this issue. The jar is built from a Maven project in which I lined up the dependencies to ensure the correct AWS Java SDK dependency was being picked up as well.

user3537867
  • Looks like you are looking at `PoolingClientConnectionManager` with `javap`, but the missing method is on `DefaultClientConnectionOperator`, no? – Daniel Darabos Jul 16 '14 at 21:54
  • Mea culpa. I updated the javap output with the correct class in my original post. It exists as well - so the problem still remains. – user3537867 Jul 18 '14 at 18:24

1 Answer


The Spark 1.0.x distribution already contains an incompatible version of DefaultClientConnectionOperator, and there is no easy way to replace it.
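To confirm which copy of the class actually wins at runtime, something like the following sketch may help. It asks the JVM where a class was loaded from. The class name WhichJar and the method locate are made up for this illustration; only standard JDK calls are used. The result is only meaningful if locate is called from inside the Spark job itself (same classpath as the failing code), since running it standalone against the uber jar would simply report the uber jar.

```java
import java.security.CodeSource;

// Hypothetical helper, not part of Spark or the AWS SDK.
class WhichJar {
    /** Returns the location a class was loaded from, or a short note if none is available. */
    static String locate(String className) {
        try {
            // Load without running static initializers; we only want the origin.
            Class<?> cls = Class.forName(className, false, WhichJar.class.getClassLoader());
            CodeSource src = cls.getProtectionDomain().getCodeSource();
            // Classes that ship with the JDK usually report no code source.
            return src == null ? "(no code source reported)" : src.getLocation().toString();
        } catch (ClassNotFoundException e) {
            return "(not found on classpath)";
        }
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.http.impl.conn.DefaultClientConnectionOperator";
        System.out.println(name + " -> " + locate(name));
    }
}
```

If the printed location is a jar from the Spark distribution rather than your own uber jar, that would be consistent with the explanation above.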

The only workaround I've found is to include a custom implementation of PoolingClientConnectionManager that avoids the missing constructor.

Replace:

return new DefaultClientConnectionOperator(schreg, this.dnsResolver);

with:

return new DefaultClientConnectionOperator(schreg);
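Before relying on the one-argument constructor, it may be worth checking which constructors the class loaded at runtime really offers. The sketch below lists them via reflection. CtorCheck and constructors are hypothetical names for this example, and as with any such check it should run inside the Spark job so that it sees the same classpath as the failing code.

```java
import java.lang.reflect.Constructor;
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper, not part of HttpClient or Spark.
class CtorCheck {
    /** Lists the declared constructors of a class as seen by the current classloader. */
    static List<String> constructors(String className) {
        List<String> out = new ArrayList<>();
        try {
            Class<?> cls = Class.forName(className, false, CtorCheck.class.getClassLoader());
            for (Constructor<?> c : cls.getDeclaredConstructors()) {
                out.add(c.toString());
            }
        } catch (ClassNotFoundException | LinkageError e) {
            // LinkageError covers cases where a parameter type cannot be resolved.
            out.add("(could not load " + className + ": " + e + ")");
        }
        return out;
    }

    public static void main(String[] args) {
        String name = args.length > 0
                ? args[0]
                : "org.apache.http.impl.conn.DefaultClientConnectionOperator";
        for (String line : constructors(name)) {
            System.out.println(line);
        }
    }
}
```

If the output shows only the (SchemeRegistry) constructor, the older class is the one being loaded, which matches the NoSuchMethodError in the question.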

Make sure your version of the class is the one that gets included in the uber jar. With sbt-assembly, for example:

case PathList("org", "apache", "http", "impl", xs @ _*) => MergeStrategy.first

Custom PoolingClientConnectionManager: https://gist.github.com/felixgborrego/568f3460d82d9c12e23c

Felix