I am writing a distributed system and am facing a problem connecting it to Hadoop. Here's my situation:
1) I have a distributed system running on 3 computers (sys1, sys2, sys3)
2) sys2 and sys3 are the master nodes of two different Hadoop clusters. These two clusters are not connected to each other, and each runs independently.
3) My distributed system has three parts (P1, P2, P3).
P1 sits on sys1 and receives the source code of the mappers/reducers from the client (the client is another system). P1 then contacts P2 or P3 and sends it the code for the mappers/reducers.
4) Now the problem is that P2 or P3 needs to run the job on Hadoop and send the result back to P1.
I have worked with Hadoop for a while and know how to write a simple MapReduce program, package it into a JAR file, and execute it on Hadoop. The problem is that in my case the source code of the MapReduce job arrives at runtime, so I can't build a JAR file out of it ahead of time. I need to turn the received code into a Hadoop job and run it on Hadoop. I would appreciate any advice or suggestions on how to solve this problem.
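For reference, the flow I already know is the usual one: a driver class like the sketch below, packaged into a JAR together with its mapper/reducer and submitted with `hadoop jar`. The word-count classes and the input/output paths here are just placeholders for illustration, not part of my actual system.

    import java.io.IOException;
    import java.util.StringTokenizer;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.LongWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

    public class WordCountDriver {

        // Simple mapper: emits (word, 1) for every token in the input line
        public static class TokenMapper extends Mapper<LongWritable, Text, Text, IntWritable> {
            private static final IntWritable ONE = new IntWritable(1);
            private final Text word = new Text();

            @Override
            protected void map(LongWritable key, Text value, Context context)
                    throws IOException, InterruptedException {
                StringTokenizer tokens = new StringTokenizer(value.toString());
                while (tokens.hasMoreTokens()) {
                    word.set(tokens.nextToken());
                    context.write(word, ONE);
                }
            }
        }

        // Simple reducer: sums the counts for each word
        public static class SumReducer extends Reducer<Text, IntWritable, Text, IntWritable> {
            @Override
            protected void reduce(Text key, Iterable<IntWritable> values, Context context)
                    throws IOException, InterruptedException {
                int sum = 0;
                for (IntWritable value : values) {
                    sum += value.get();
                }
                context.write(key, new IntWritable(sum));
            }
        }

        public static void main(String[] args) throws Exception {
            Configuration conf = new Configuration();
            Job job = Job.getInstance(conf, "word count");
            job.setJarByClass(WordCountDriver.class);   // works because this class is already packaged in a JAR
            job.setMapperClass(TokenMapper.class);
            job.setReducerClass(SumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));     // e.g. /input
            FileOutputFormat.setOutputPath(job, new Path(args[1]));   // e.g. /output
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }

The sticking point is `job.setJarByClass(...)`: it assumes the mapper/reducer classes already live in a JAR on disk, which is exactly what I don't have when the source arrives at runtime.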
PS. I know one solution is to write the received map/reduce code to a file on disk, execute all the required commands to build the JAR file, and run the job in a shell from within my Java code (using a Runtime instance) and so on, but I would prefer to run the job directly from my Java code and not go through all the possible troubles of that approach.
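Just to make that workaround concrete, here is a rough sketch of what I mean (and would prefer to avoid); every path, class name, and command in it is a hypothetical placeholder:

    import java.io.IOException;
    import java.nio.file.Files;
    import java.nio.file.Path;

    public class ShellOutJobRunner {

        public void runReceivedJob(String mapReduceSource) throws IOException, InterruptedException {
            // 1. Write the source received from P1 to a .java file on disk
            Path workDir = Files.createTempDirectory("received-job");
            Path sourceFile = workDir.resolve("ReceivedJob.java");   // hypothetical class name
            Files.write(sourceFile, mapReduceSource.getBytes());

            // 2. Compile it against the Hadoop libraries and package it into a JAR
            exec("javac -cp `hadoop classpath` -d " + workDir + " " + sourceFile);
            exec("jar cf " + workDir.resolve("job.jar") + " -C " + workDir + " .");

            // 3. Submit the job with the hadoop CLI (input/output paths are placeholders)
            exec("hadoop jar " + workDir.resolve("job.jar") + " ReceivedJob /input /output");
        }

        private void exec(String command) throws IOException, InterruptedException {
            // Runtime.exec does not expand backticks itself, so wrap the command in a shell
            Process p = Runtime.getRuntime().exec(new String[] {"/bin/sh", "-c", command});
            if (p.waitFor() != 0) {
                throw new IOException("Command failed: " + command);
            }
        }
    }

I'd like to avoid this shelling-out entirely and do the equivalent from within my Java process.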