
I am trying to run the Hadoop WordCount example, and this is the first time I am working with Hadoop. I followed the instructions in videos and read many things before running the program.
But I still encountered an exception while running Hadoop. Here is the exception I got:

aims@aims:~/hadoop/hadoop$ bin/hadoop jar '/home/aims/Desktop/WordCount.jar' wordcount /usr/hadoop/input /usr/hadoop/output
16/11/15 11:29:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/11/15 11:29:03 INFO client.RMProxy: Connecting to ResourceManager at /0.0.0.0:8032
16/11/15 11:29:06 INFO mapreduce.JobSubmitter: Cleaning up the staging area /tmp/hadoop-yarn/staging/aims/.staging/job_1479184145300_0003
Exception in thread "main" org.apache.hadoop.mapred.InvalidInputException: Input path does not exist: hdfs://localhost:9000/user/aims/wordcount
    at org.apache.hadoop.mapred.FileInputFormat.singleThreadedListStatus(FileInputFormat.java:287)
    at org.apache.hadoop.mapred.FileInputFormat.listStatus(FileInputFormat.java:229)
    at org.apache.hadoop.mapred.FileInputFormat.getSplits(FileInputFormat.java:315)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeOldSplits(JobSubmitter.java:328)
    at org.apache.hadoop.mapreduce.JobSubmitter.writeSplits(JobSubmitter.java:320)
    at org.apache.hadoop.mapreduce.JobSubmitter.submitJobInternal(JobSubmitter.java:196)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1290)
    at org.apache.hadoop.mapreduce.Job$10.run(Job.java:1287)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapreduce.Job.submit(Job.java:1287)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:575)
    at org.apache.hadoop.mapred.JobClient$1.run(JobClient.java:570)
    at java.security.AccessController.doPrivileged(Native Method)
    at javax.security.auth.Subject.doAs(Subject.java:422)
    at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1698)
    at org.apache.hadoop.mapred.JobClient.submitJobInternal(JobClient.java:570)
    at org.apache.hadoop.mapred.JobClient.submitJob(JobClient.java:561)
    at org.apache.hadoop.mapred.JobClient.runJob(JobClient.java:870)
    at WordCount.run(WordCount.java:29)
    at org.apache.hadoop.util.ToolRunner.run(ToolRunner.java:70)
    at WordCount.main(WordCount.java:36)
    at sun.reflect.NativeMethodAccessorImpl.invoke0(Native Method)
    at sun.reflect.NativeMethodAccessorImpl.invoke(NativeMethodAccessorImpl.java:62)
    at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43)
    at java.lang.reflect.Method.invoke(Method.java:498)
    at org.apache.hadoop.util.RunJar.run(RunJar.java:221)
    at org.apache.hadoop.util.RunJar.main(RunJar.java:136)

Now I do not understand how to resolve this. I have tried every link on the Internet related to it, but to no avail.
I am currently using Ubuntu 16.04 and Hadoop 2.7.3.
My Java version is:

openjdk version "1.8.0_111"
OpenJDK Runtime Environment (build 1.8.0_111-8u111-b14-2ubuntu0.16.04.2-b14)
OpenJDK 64-Bit Server VM (build 25.111-b14, mixed mode)

I hope to hear a solution for this exception.

Jaffer Wilson

2 Answers


Well, the file doesn't exist...

hdfs://localhost:9000/user/aims/wordcount

One of the hdfs-site or core-site XML files sets the HDFS path, and if you haven't edited anything, there is no /user directory at the root of your box; there is a /home/aims directory.

According to your command, you are trying to read input from the HDFS path /usr/hadoop/input, but the error says /user/aims/wordcount, which means the input directory your MapReduce code actually ends up using is wrong.
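
A quick way to confirm this from the shell (a sketch; the paths are the ones from your command, and the relative-path behavior is standard HDFS): a path with no leading slash is resolved against the running user's HDFS home directory, here /user/aims.

bin/hadoop fs -ls /usr/hadoop/input   # the absolute input path given on the command line
bin/hadoop fs -ls wordcount           # relative: resolves to hdfs://localhost:9000/user/aims/wordcount

If the second listing fails with "No such file or directory", that is exactly the path the job is dying on.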

OneCricketeer
  • Thank you for your reply, my friend. Yeah, I haven't made any changes to the files. Actually, I installed Hadoop using the installer available on the Bitnami website, and I thought the installer made all the configuration needed to run commands properly. Please let me know whether there is a solution for it; if there is, please edit the answer for me. I will be delighted to hear from you. Thanks – Jaffer Wilson Nov 15 '16 at 06:30
  • If you did use Ambari, then you should not have entered "localhost" in the first node-selection section; use the full hostname. All ports should be open in the VM, etc. (a network issue). If all else fails, download the Cloudera quickstart VM or the Hortonworks sandbox – OneCricketeer Nov 15 '16 at 09:21
  • No, I am not using Ambari. I am just a new user and installed Hadoop using the installer from Bitnami, and I am trying to run the program from the terminal. Hope this helps. – Jaffer Wilson Nov 15 '16 at 10:37
  • Oh, never used Bitnami... Ambari would help you create a cluster, though... Anyways, my answer still stands. Your MapReduce code is executing on the wrong HDFS path. Since I can't see that code, I don't know... you specified `/usr/hadoop/input`, but the error clearly thinks your input directory is `/user/aims/wordcount`, or just `wordcount/`, since you ran that as the aims user – OneCricketeer Nov 15 '16 at 12:26
  • Thank you for your reply. It helped me, and considering your inputs I found the solution. Thank you, and keep contributing to my questions if any arise in the future. I like the way you answer. :) – Jaffer Wilson Nov 16 '16 at 05:16

I found the answer.

aims@aims:~/hadoop/hadoop$ bin/hadoop jar '/home/aims/Desktop/WordCount.jar' wordcount /usr/hadoop/input /usr/hadoop/output

The command above uses the wrong syntax for executing the jar on Hadoop. I was a bit suspicious about the extra wordcount argument in the command, and the exception was indeed arising from exactly that: it was being treated as the (relative) input path. So I removed it and passed only the HDFS directories, and the job ran.
The following is the proper way to execute it:

aims@aims:~/hadoop/hadoop$ bin/hadoop jar '/home/aims/Desktop/WordCount.jar' /myuser/inputdata /myuser/output

So that worked and I got my output in the output folder.
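
For completeness, here is a sketch of the full round trip as I would run it (the local file name is just an example, the /myuser paths are the ones above, and part-00000 assumes the old mapred API with a single reducer):

bin/hadoop fs -mkdir -p /myuser/inputdata
bin/hadoop fs -put /home/aims/sample.txt /myuser/inputdata   # sample.txt is a hypothetical input file
bin/hadoop jar '/home/aims/Desktop/WordCount.jar' /myuser/inputdata /myuser/output
bin/hadoop fs -cat /myuser/output/part-00000

Note that the output directory must not already exist, or the job will refuse to start.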

Jaffer Wilson