I have a simple Java program that wraps DistCp to copy files between Hadoop clusters. I can run it successfully both from the IDE and from the Hadoop CLI.
I wanted a JSP web application so that people could interact with my program through a web interface.
I created a fat jar with all dependencies and deployed it in my web application. The problem is that whenever the program tries to submit the DistCp job, it fails with the following error:
java.io.IOException: Cannot initialize Cluster. Please check your configuration for mapreduce.framework.name and the correspond server addresses.
at org.apache.hadoop.mapreduce.Cluster.initialize(Cluster.java:143)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:108)
at org.apache.hadoop.mapreduce.Cluster.<init>(Cluster.java:101)
at org.apache.hadoop.tools.DistCp.createMetaFolderPath(DistCp.java:419)
at org.apache.hadoop.tools.DistCp.<init>(DistCp.java:106)
at replication.ReplicationUtils.doCopy(ReplicationUtils.java:127)
at replication.ReplicationUtils.copy(ReplicationUtils.java:77)
at replication.parallel.DistCpTask.run(DistCpTask.java:28)
at java.util.concurrent.Executors$RunnableAdapter.call(Executors.java:511)
at java.util.concurrent.FutureTask.run(FutureTask.java:266)
at java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1142)
at java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:617)
at java.lang.Thread.run(Thread.java:745)
I checked mapreduce.framework.name and it is indeed set to yarn.
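For reference, this is roughly how I checked it (a minimal sketch; it assumes the cluster's mapred-site.xml is on the classpath):

import org.apache.hadoop.mapred.JobConf;

// JobConf pulls in mapred-default.xml and mapred-site.xml,
// which is where mapreduce.framework.name is normally set.
JobConf conf = new JobConf();
System.out.println(conf.get("mapreduce.framework.name")); // prints "yarn" for me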
Any ideas?
UPDATE1:
After some debugging I narrowed it down to the following piece of code:
import java.util.ServiceLoader;
import org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider;

// Print every ClientProtocolProvider that ServiceLoader can discover
// via META-INF/services on the current classpath.
Iterable<ClientProtocolProvider> frameworkLoader =
        ServiceLoader.load(ClientProtocolProvider.class);
for (ClientProtocolProvider cpp : frameworkLoader) {
    System.out.println(cpp.toString());
}
When I run it locally I get:
org.apache.hadoop.mapred.YarnClientProtocolProvider@7a4f0f29
org.apache.hadoop.mapred.LocalClientProtocolProvider@5fa7e7ff
But when it runs from the web server I get:
org.apache.hadoop.mapred.LocalClientProtocolProvider@5fa7e7ff
I still cannot figure out why this happens. YarnClientProtocolProvider is in the fat jar that I deploy to the web server.
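One quick sanity check (just a diagnostic sketch of mine) separates a missing class from a missing service registration:

try {
    // If this succeeds while ServiceLoader still finds nothing, the class is
    // packaged but its META-INF/services entry was lost when the jar was built.
    Class.forName("org.apache.hadoop.mapred.YarnClientProtocolProvider");
    System.out.println("class is on the classpath");
} catch (ClassNotFoundException e) {
    System.out.println("class is missing");
}

Since the class is in my jar, I expect this to succeed, which would point at the service registry rather than at the jar contents.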
UPDATE2:
It turns out that the tool I use to build the uber jar does not concatenate the service provider declarations under the META-INF/services directories of the dependency jars; files with the same name overwrite each other, so the file that survives contains only 'org.apache.hadoop.mapred.LocalClientProtocolProvider'.
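For reference, each META-INF/services/org.apache.hadoop.mapreduce.protocol.ClientProtocolProvider file lists one implementation class per line, so after a correct merge the single copy in the uber jar should contain both entries:

org.apache.hadoop.mapred.YarnClientProtocolProvider
org.apache.hadoop.mapred.LocalClientProtocolProvider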
I am still wondering why, when I use

hadoop jar my.jar ....

it recognizes 'org.apache.hadoop.mapred.YarnClientProtocolProvider' even though that entry is not present under the META-INF/services directory of my.jar. My guess is that the hadoop launcher prepends the cluster's own Hadoop jars to the classpath, so ServiceLoader picks the entry up from those jars (hadoop-mapreduce-client-jobclient ships it) rather than from mine.
Now I think the real question is how to create an uber jar that concatenates the service provider entries instead of letting one file overwrite the others.
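One lead: the Maven shade plugin has a ServicesResourceTransformer that concatenates META-INF/services files from all dependency jars. A minimal sketch of the relevant pom.xml fragment, assuming the shade plugin builds the fat jar (my actual build may differ):

<plugin>
  <groupId>org.apache.maven.plugins</groupId>
  <artifactId>maven-shade-plugin</artifactId>
  <executions>
    <execution>
      <phase>package</phase>
      <goals>
        <goal>shade</goal>
      </goals>
      <configuration>
        <transformers>
          <!-- Concatenate META-INF/services files instead of letting one overwrite the rest -->
          <transformer implementation="org.apache.maven.plugins.shade.resource.ServicesResourceTransformer"/>
        </transformers>
      </configuration>
    </execution>
  </executions>
</plugin>

If the jar is built with Gradle instead, the Shadow plugin's mergeServiceFiles() option does the equivalent.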