I'm trying to read the contents of a file from HDFS. My code is below:
    package gen;

    import java.io.BufferedReader;
    import java.io.IOException;
    import java.io.InputStreamReader;

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.FileSystem;
    import org.apache.hadoop.fs.Path;

    public class ReadFromHDFS {
        public static void main(String[] args) throws Exception {
            if (args.length < 1) {
                System.out.println("Usage: ReadFromHDFS <hdfs-file-path-to-read-from>");
                System.out.println("Example: ReadFromHDFS 'hdfs:/localhost:9000/myFirstSelfWriteFile'");
                System.exit(-1);
            }
            try {
                Path path = new Path(args[0]);
                FileSystem fileSystem = FileSystem.get(new Configuration());
                BufferedReader bufferedReader = new BufferedReader(new InputStreamReader(fileSystem.open(path)));
                String line = bufferedReader.readLine();
                while (line != null) {
                    System.out.println(line);
                    line = bufferedReader.readLine();
                }
            } catch (IOException e) {
                e.printStackTrace();
            }
        }
    }
However, I can't figure out how to give this program the path to my HDFS directory. I have tried:

    java -cp <hadoop jar:myjar> gen.ReadFromHDFS <path>

For <path> I tried the directory itself (what I see when I run hadoop fs -ls), the file inside the directory, and each of those with hdfs:/localhost or hdfs:/ added in front. None of them work. Can anyone tell me exactly how I should pass an HDFS path to this program? For example, when I give the path directly (with no prefix), it says that the file does not exist.
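To make the difference between these prefixes concrete, here is a small sketch that uses only java.net.URI, not Hadoop's Path class, so it illustrates the general URI rule and may not match Hadoop's own parsing in every detail. The class name UriPrefixDemo and the describe helper are made up for illustration, and the double-slash form hdfs://localhost:9000/... is an assumed variant that is not among the ones I tried above.

```java
import java.net.URI;

// Hypothetical helper, not part of the program above: shows how
// java.net.URI splits the kinds of path strings passed on the command line.
public class UriPrefixDemo {

    // Returns the scheme, authority and path components of the given string.
    static String describe(String s) {
        URI uri = URI.create(s);
        return "scheme=" + uri.getScheme()
                + " authority=" + uri.getAuthority()
                + " path=" + uri.getPath();
    }

    public static void main(String[] args) {
        String[] candidates = {
            "/myFirstSelfWriteFile",                      // no prefix
            "hdfs:/localhost:9000/myFirstSelfWriteFile",  // single slash, as tried
            "hdfs://localhost:9000/myFirstSelfWriteFile"  // double slash (assumed variant)
        };
        for (String c : candidates) {
            System.out.println(c + " -> " + describe(c));
        }
    }
}
```

With a single slash after hdfs:, java.net.URI reports no authority and treats localhost:9000 as the first segment of the path. Only the double-slash form yields localhost:9000 as the authority, and a path with no prefix has no scheme at all.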
Edit: None of the solutions so far work for me. I always get this exception:

    java.io.FileNotFoundException: File <filename> does not exist.
        at org.apache.hadoop.fs.getFileSystem.getFileStatus(RawLocalFileSystem.java:361)

It seems to be trying to find the file locally.