I'm fairly new to Hadoop HDFS and quite rusty with Java, and I need some help. I'm trying to read a file from HDFS and calculate the MD5 hash of that file. My general Hadoop configuration is shown below.
private FSDataInputStream hdfsDIS;
private FileInputStream FinputStream;
private FileSystem hdfs;
private Configuration myConfig;

// load the cluster configuration; addResource(String) treats its argument as a
// classpath resource, so file-system paths are wrapped in a Path
myConfig = new Configuration();
myConfig.addResource(new Path("/HADOOP_HOME/conf/core-site.xml"));
myConfig.addResource(new Path("/HADOOP_HOME/conf/hdfs-site.xml"));

hdfs = FileSystem.get(new URI("hdfs://NodeName:54310"), myConfig);
hdfsDIS = hdfs.open(hdfsFilePath);
The call hdfs.open(hdfsFilePath) returns an FSDataInputStream. The problem is that I can only get an FSDataInputStream out of HDFS, but I would like to get a FileInputStream out of it.
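From the Hadoop javadoc, FSDataInputStream appears to extend java.io.DataInputStream, so as far as I can tell the only thing I can rely on is the plain InputStream API. This is just my reading of the class hierarchy, not something from my actual code:

// org.apache.hadoop.fs.FSDataInputStream -> java.io.DataInputStream
//   -> java.io.FilterInputStream -> java.io.InputStream
InputStream in = hdfs.open(hdfsFilePath);          // compiles: FSDataInputStream is an InputStream
// FileInputStream fin = hdfs.open(hdfsFilePath);  // does not compile: it is not a FileInputStream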
The code below does the actual hashing; it is adapted from something I found on StackOverflow (I can't seem to find the link to it now).
FileInputStream FinputStream = hdfsDIS; // <--- This is where the problem is
MessageDigest md;
try {
    md = MessageDigest.getInstance("MD5");
    FileChannel channel = FinputStream.getChannel();
    ByteBuffer buff = ByteBuffer.allocate(2048);
    while (channel.read(buff) != -1) {
        buff.flip();
        md.update(buff);
        buff.clear();
    }
    byte[] hashValue = md.digest();
    return toHex(hashValue);
}
catch (NoSuchAlgorithmException e) {
    return null;
}
catch (IOException e) {
    return null;
}
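In case it matters, toHex is just a standard bytes-to-hex helper, something along these lines (not necessarily my exact version):

private static String toHex(byte[] bytes) {
    StringBuilder sb = new StringBuilder(bytes.length * 2);
    for (byte b : bytes) {
        sb.append(String.format("%02x", b));  // two lowercase hex digits per byte
    }
    return sb.toString();
}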
The reason I need a FileInputStream is that the hashing code uses a FileChannel, which supposedly makes reading the data from the file more efficient.
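While digging around I did notice java.nio.channels.Channels.newChannel(InputStream), which seems to wrap any InputStream (so presumably the FSDataInputStream too) in a ReadableByteChannel. I have no idea whether that keeps the efficiency benefit of a real FileChannel; a rough, untested sketch of what I mean (it needs imports for Channels and ReadableByteChannel on top of the ones already used above):

// Channels.newChannel gives a ReadableByteChannel backed by the HDFS stream
ReadableByteChannel channel = Channels.newChannel(hdfsDIS);
MessageDigest md = MessageDigest.getInstance("MD5");
ByteBuffer buff = ByteBuffer.allocate(2048);
while (channel.read(buff) != -1) {
    buff.flip();
    md.update(buff);
    buff.clear();
}
byte[] hashValue = md.digest();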
Could someone show me how I could convert the FSDataInputStream into a FileInputStream?