
I have written a MapReduce job that takes 3 command-line arguments: keyword, input path, output path. It counts the number of times the keyword appears in the input files and outputs the result. I want to create a webpage (maybe using Apache Tomcat) that takes the keyword as input. When I click submit, it should trigger a MapReduce job and display the results on the webpage itself. How is this possible? I have tried all the answers in the following links and they don't work:

  1. Run MapReduce Job from a web application

  2. Calling a mapreduce job from a simple java program

If it is possible, please provide sample working code. It would be really helpful.

Edit: When I tried the second solution in the second link, the issue is this: [screenshot of the error]

  • What doesn't work about the links above? They seem to give you everything you need. Much of the code should be familiar to you if you've written a MR job. – Binary Nerd Jun 17 '16 at 05:59
  • How are you invoking the MapReduce job at present -- using hadoop jar? The links you've provided allow the invocation of a job, but don't display the output file. Where do you intend to run the web server -- on the master? `I have tried all answers on the following links and they don't work` -- where are you stuck? – Jedi Jun 17 '16 at 06:21
  • I have coded the solution in the second link's second answer. The webpage was created, and when I click submit, it showed "resource file /CallJobFromServlet not found". After some tweaks, it started to display the source code of CallJobFromServlet in the browser. The first answer, by Thomas, is a simple main function that you write for every MapReduce job; I don't see where he is linking it with a web server. – Shashank Mudlapur Jun 17 '16 at 06:36
  • Presently I invoke the job using hadoop jar {Jar File} {Class Name} {Command Line Arguments}. I want to run the server on my local machine, just like Apache Tomcat, which lets you view your webapps at localhost:8080. – Shashank Mudlapur Jun 17 '16 at 06:42
  • I suggest you post the code or the issue that is not working, explaining the problems you faced. Just saying "they don't work" doesn't help... – Ram Ghadiyaram Jun 17 '16 at 11:15

1 Answer


Well, MapReduce is basically designed for batch processing: batch jobs run in the background and are not interactive, whereas in this case you want interactivity. But there are a few things you can do:

  1. If you are using Java to launch the job, call boolean success = job.waitForCompletion(true) in the driver program. This launches the MapReduce job and blocks until it is finished; at that point you can collect the results from the output directory.

  2. Alternatively, you can start polling the output directory in HDFS after submitting the MapReduce job. Once the job has finished, a single file named _SUCCESS is created in the root of the output directory, so wait until this file appears in the HDFS output directory; it indicates that the job has finished. Then read all the files in the output directory, process them, and show the results on the webpage.
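Since you currently invoke the job with hadoop jar, one minimal way to trigger it synchronously from the servlet's POST handler is ProcessBuilder, waiting for the exit code before reading the output. This is only a sketch: the jar path /opt/jobs/keyword-count.jar and the class name KeywordCount below are placeholders for your own. (If you build the Job object in-process instead, job.waitForCompletion(true) blocks in the same way.)

```java
import java.io.IOException;
import java.util.Arrays;
import java.util.List;

public class JobLauncher {

    // Runs a command and blocks until it exits; returns the exit code.
    // inheritIO() makes the job's console output appear in the server log.
    public static int run(List<String> command) throws IOException, InterruptedException {
        Process p = new ProcessBuilder(command)
                .inheritIO()
                .start();
        return p.waitFor();
    }

    // Builds the `hadoop jar` invocation for the keyword-count job.
    // Jar path and class name are placeholders -- substitute your own.
    public static List<String> hadoopJarCommand(String keyword, String inputPath, String outputPath) {
        return Arrays.asList("hadoop", "jar", "/opt/jobs/keyword-count.jar",
                "KeywordCount", keyword, inputPath, outputPath);
    }
}
```

In the servlet's doPost you would call run(hadoopJarCommand(keyword, inputPath, outputPath)), and on exit code 0 read the output directory and write the result into the response.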
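The polling approach in point 2 can be sketched with plain java.nio.file, which works if the output directory is reachable as a local path; against real HDFS you would use Hadoop's FileSystem API instead (e.g. fs.exists(new Path(outputDir, "_SUCCESS"))). The class and method names here are illustrative, not from any library.

```java
import java.io.IOException;
import java.nio.file.DirectoryStream;
import java.nio.file.Files;
import java.nio.file.Path;

public class OutputPoller {

    // Polls until the _SUCCESS marker appears in the output directory
    // (or the timeout elapses). Returns true if the job finished.
    public static boolean waitForSuccess(Path outputDir, long timeoutMillis) throws InterruptedException {
        long deadline = System.currentTimeMillis() + timeoutMillis;
        while (System.currentTimeMillis() < deadline) {
            if (Files.exists(outputDir.resolve("_SUCCESS"))) {
                return true;
            }
            Thread.sleep(500); // poll interval
        }
        return false;
    }

    // Concatenates the contents of all part-* files (the reducer output),
    // which is what you would render on the webpage.
    public static String readResults(Path outputDir) throws IOException {
        StringBuilder sb = new StringBuilder();
        try (DirectoryStream<Path> parts = Files.newDirectoryStream(outputDir, "part-*")) {
            for (Path part : parts) {
                sb.append(new String(Files.readAllBytes(part)));
            }
        }
        return sb.toString();
    }
}
```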

Shahzad Aslam