
With reference to similar questions: Running a Hadoop Job From another Java Program and Calling a mapreduce job from a simple java program

I too have a mapreduce job jar file on a remote Hadoop machine, and I'm creating a web application that, on a button click event, will call out to the jar file and execute the job. This web app runs on a separate machine.

I've tried the suggestions from both of the posts above but could not get it to work, even with the provided WordCount example; I still encounter a NoClassDefFoundError.

Are there any lines of code I'm missing?

Below is the code I have:

public void buttonClick(ClickEvent event) {
    UserGroupInformation ugi;
    try {
        // run the job as hadoopUser, impersonated from the web application's login user
        ugi = UserGroupInformation.createProxyUser("hadoopUser", UserGroupInformation.getLoginUser());
        ugi.doAs(new PrivilegedExceptionAction<Object>() {
            public Object run() throws Exception {
                runHadoopJob();
                return null;
            }
        });
    } catch (IOException e) {
        e.printStackTrace();
    } catch (InterruptedException e) {
        e.printStackTrace();
    }
}

private boolean runHadoopJob() {
    try {
        // point the client at the remote NameNode and JobTracker
        Configuration conf = new Configuration();
        conf.set("fs.default.name", "hdfs://192.168.4.248:9000");
        conf.set("mapred.job.tracker", "192.168.4.248:9001");

        Job job = new Job(conf, "WordCount");
        job.setMapperClass(TokenizerMapper.class);
        job.setReducerClass(IntSumReducer.class);
        job.setJarByClass(TokenizerMapper.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);

        FileInputFormat.addInputPath(job, new Path("/flume/events/160114/*"));
        // delete any previous output directory before the job runs
        Path out = new Path("output");
        FileSystem fs = FileSystem.get(conf);
        fs.delete(out, true);
        FileOutputFormat.setOutputPath(job, out);

        job.waitForCompletion(true);
        System.out.println("Job Finished");
    } catch (Exception e) {
        e.printStackTrace();
    }
    return true;
}

Caused by: java.lang.NoClassDefFoundError: org/codehaus/jackson/map/JsonMappingException
    at org.apache.hadoop.mapreduce.Job$1.run(Job.java:513)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Unknown Source)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1149)
    at org.apache.hadoop.mapreduce.Job.connect(Job.java:511)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:499)
    at org.apache.hadoop.mapreduce.Job.waitForCompletion(Job.java:530)
    at com.example.hadoopclient.HDFSTable.runHadoopJob(HDFSTable.java:181)
    at com.example.hadoopclient.HDFSTable.access$0(HDFSTable.java:120)
    at com.example.hadoopclient.HDFSTable$SearchButtonClickListener.buttonClick(HDFSTable.java:116)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(Unknown Source)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(Unknown Source)
    at java.lang.reflect.Method.invoke(Unknown Source)
    at com.vaadin.event.ListenerMethod.receiveEvent(ListenerMethod.java:510)
    ... 36 more

I added the following to my Hadoop core-site.xml file, where hadoop is the group that hadoopUser belongs to:

<property>
    <name>hadoop.proxyuser.kohtianan.groups</name>
    <value>hadoop</value>
    <description></description>
</property>
<property>
    <name>hadoop.proxyuser.kohtianan.hosts</name>
    <value>*</value>
    <description></description>
</property>
Koh

1 Answer


For a map-reduce program to run, you need to have the jackson-mapper-asl-*.jar and jackson-core-asl-*.jar files present on your map-reduce program's class-path. The actual jar file names will vary based on the Hadoop distribution and version you are using.

These files are present under the $HADOOP_HOME/lib folder. There are two ways to solve this problem:

  • Invoke the map-reduce program using the hadoop jar command. This will ensure that all the required jar files are automatically included in your map-reduce program's class-path.

  • If you wish to trigger a map-reduce job from your application, make sure you include these jar files (and other necessary jar files) on your application's class-path, so that when you spawn a map-reduce program it automatically picks up the jars from the application class-path. A sketch of both approaches follows below.
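
As a rough illustration of both options (the jar name, driver class, and web-app path below are assumptions; adapt them to your own setup):

    # Option 1: let the hadoop launcher assemble the class-path for you
    hadoop jar wordcount.jar com.example.WordCount /flume/events/160114 output

    # Option 2: make the jars from $HADOOP_HOME/lib visible to the web application,
    # e.g. by copying them into the web-app's WEB-INF/lib directory
    cp $HADOOP_HOME/lib/jackson-core-asl-*.jar   /path/to/webapp/WEB-INF/lib/
    cp $HADOOP_HOME/lib/jackson-mapper-asl-*.jar /path/to/webapp/WEB-INF/lib/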

org.apache.hadoop.ipc.RemoteException: User: kohtianan is not allowed to impersonate hadoopUser

This error indicates that the user kohtianan does not have access to Hadoop DFS. What you can do is create a directory on HDFS (as the hdfs superuser) and change the owner of that directory to kohtianan, as sketched below. This should resolve your issue.
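
For example, something along these lines (only a sketch, assuming the standard /user/&lt;username&gt; home-directory layout and that the commands are run as the HDFS superuser):

    # run these as the HDFS superuser
    hadoop fs -mkdir /user/kohtianan
    hadoop fs -chown kohtianan:hadoop /user/kohtianan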

Ankur Shanbhag
  • @Shanbhag thank you, that seems to have resolved the issue. However, I've now encountered another error, involving remote user rights when the job has to write/delete files on HDFS. I've edited the portion of the code which calls the runHadoopJob method to use the UserGroupInformation class but can't seem to get it to work. Can you help me identify where I've implemented it wrongly? Thank you – Koh Jan 20 '14 at 01:40
  • `org.apache.hadoop.ipc.RemoteException: User: kohtianan is not allowed to impersonate hadoopUser` This is the error message I got. – Koh Jan 20 '14 at 01:41
  • @Koh : I have modified the post. Have a look. – Ankur Shanbhag Jan 20 '14 at 05:51
  • Thank you. I went to look around some more and realised I'd missed out some edits to the core-site.xml config file. I've added the properties shown above and it seems to have resolved the issue. – Koh Jan 20 '14 at 09:19