I'm using Hadoop 2.6, and I have a cluster of virtual machines where I installed HDFS. I'm trying to remotely read a file from HDFS through some Java code running on my local machine, in the basic way, with a BufferedReader:
import java.io.BufferedReader;
import java.io.File;
import java.io.FileNotFoundException;
import java.io.IOException;
import java.io.InputStreamReader;
import java.net.URI;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

FileSystem fs = null;
String hadoopLocalPath = "/path/to/my/hadoop/local/folder/etc/hadoop";

// Load the cluster configuration from local copies of the config files
Configuration hConf = new Configuration();
hConf.addResource(new Path(hadoopLocalPath + File.separator + "core-site.xml"));
hConf.addResource(new Path(hadoopLocalPath + File.separator + "hdfs-site.xml"));

try {
    fs = FileSystem.get(URI.create("hdfs://10.0.0.1:54310/"), hConf);
} catch (IOException e1) {
    e1.printStackTrace();
    System.exit(-1);
}

Path startPath = new Path("/user/myuser/path/to/my/file.txt");
FileStatus[] fileStatus;
try {
    fileStatus = fs.listStatus(startPath);
    Path[] paths = FileUtil.stat2Paths(fileStatus);
    for (Path path : paths) {
        BufferedReader br = new BufferedReader(new InputStreamReader(fs.open(path)));
        String line;
        while ((line = br.readLine()) != null) {
            System.out.println(line);
        }
        br.close();
    }
} catch (FileNotFoundException e) {
    e.printStackTrace();
} catch (IOException e) {
    e.printStackTrace();
}
The program can access HDFS correctly (no exceptions are raised). If I list the files and directories via code, it reads them without problems.
Now, the issue is that if I try to read a file (as in the code shown), it hangs in the while loop until it eventually raises a BlockMissingException:
org.apache.hadoop.hdfs.BlockMissingException: Could not obtain block: BP-2005327120-10.1.1.55-1467731650291:blk_1073741836_1015 file=/user/myuser/path/to/my/file.txt
at org.apache.hadoop.hdfs.DFSInputStream.chooseDataNode(DFSInputStream.java:888)
at org.apache.hadoop.hdfs.DFSInputStream.blockSeekTo(DFSInputStream.java:568)
at org.apache.hadoop.hdfs.DFSInputStream.readWithStrategy(DFSInputStream.java:800)
at org.apache.hadoop.hdfs.DFSInputStream.read(DFSInputStream.java:847)
at java.io.DataInputStream.read(DataInputStream.java:149)
at sun.nio.cs.StreamDecoder.readBytes(StreamDecoder.java:284)
at sun.nio.cs.StreamDecoder.implRead(StreamDecoder.java:326)
at sun.nio.cs.StreamDecoder.read(StreamDecoder.java:178)
at java.io.InputStreamReader.read(InputStreamReader.java:184)
at java.io.BufferedReader.fill(BufferedReader.java:161)
at java.io.BufferedReader.readLine(BufferedReader.java:324)
at java.io.BufferedReader.readLine(BufferedReader.java:389)
at uk.ou.kmi.med.datoolkit.tests.access.HDFSAccessTest.main(HDFSAccessTest.java:55)
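If I understand the HDFS read path correctly, listing only talks to the NameNode, while reading a file requires direct connections from the client to the DataNodes holding the blocks. To see which DataNode addresses my client is actually told to contact, I can dump the block locations (a minimal sketch reusing fs and startPath from the code above; it needs an extra import of org.apache.hadoop.fs.BlockLocation):

try {
    FileStatus status = fs.getFileStatus(startPath);
    BlockLocation[] blocks = fs.getFileBlockLocations(status, 0, status.getLen());
    for (BlockLocation block : blocks) {
        // getNames() returns the host:port pairs of the DataNode replicas
        for (String datanode : block.getNames()) {
            System.out.println("replica at " + datanode);
        }
    }
} catch (IOException e) {
    e.printStackTrace();
}

This metadata call succeeds even when the actual read fails, which matches the fact that listing works; the interesting part is whether the printed host:port pairs are reachable from my local machine.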
What I already know:
- I tried the same code directly on the machine running the namenode, and it works perfectly
- I have already checked the NameNode's log, and added my local machine's user to the group managing HDFS (as suggested by this thread and other related threads)
- There should not be any issue with fully-qualified domain names, as suggested by this thread, since I'm using static IPs. On the other hand, "Your cluster runs in a VM and its virtualized network access to the client is blocked" could be an option. However, if that were the case, I would expect not to be able to perform any action on HDFS at all (see next point). The first sketch after this list is how I would test this hypothesis
- The cluster runs on a network behind a firewall, and I have correctly opened and forwarded port 54310 (I can access HDFS for other purposes, such as creating files and directories and listing their contents). I wonder whether other ports need to be opened for reading files; the second sketch after this list is how I would test that
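For the VM-networking hypothesis, one thing I plan to try is making the client connect to the DataNodes by hostname instead of the (possibly VM-internal) IP addresses the NameNode reports. A minimal sketch, assuming dfs.client.use.datanode.hostname (available in Hadoop 2.x) is the right property for this and that the DataNode hostnames resolve from my local machine:

// Assumption: set before FileSystem.get(...) so the DFS client dials the
// DataNodes by hostname rather than by the IPs the NameNode returns
hConf.set("dfs.client.use.datanode.hostname", "true");
fs = FileSystem.get(URI.create("hdfs://10.0.0.1:54310/"), hConf);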
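For the firewall hypothesis, a plain TCP connectivity test against a DataNode's data-transfer port should tell me whether block reads can get through at all (50010 is the Hadoop 2.x default for dfs.datanode.address; the DataNode IP below is a placeholder for one of my VMs):

import java.io.IOException;
import java.net.InetSocketAddress;
import java.net.Socket;

// Can the local machine open a TCP connection to the port
// the DataNodes serve block data on?
try (Socket socket = new Socket()) {
    socket.connect(new InetSocketAddress("10.0.0.2", 50010), 5000);
    System.out.println("DataNode data-transfer port reachable");
} catch (IOException e) {
    System.out.println("DataNode data-transfer port NOT reachable: " + e.getMessage());
}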