I want to use the Web-scale Parallel Inference Engine (WebPIE) reasoner on top of Hadoop. I have already set up a Hadoop cluster with two Ubuntu virtual machines, and it is functioning well. However, when I try to use WebPIE to reason over RDF files, the process fails because the input needs to be in SequenceFile format. The WebPIE tutorial mentions nothing about the SequenceFile format being a prerequisite for reasoning in Hadoop. To produce the SequenceFile format, I wrote the following code:
import java.io.File;
import java.io.FileInputStream;
import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

public static void main(String[] args) {
    FileInputStream fis = null;
    SequenceFile.Writer swriter = null;
    try {
        Configuration conf = new Configuration();
        File outputDirectory = new File("output");
        File inputDirectory = new File("input");
        File[] files = inputDirectory.listFiles();
        for (File inputFile : files) {
            // Input: read the whole RDF file into memory as the record value
            fis = new FileInputStream(inputFile);
            byte[] content = new byte[(int) inputFile.length()];
            fis.read(content);
            Text key = new Text(inputFile.getName());
            BytesWritable value = new BytesWritable(content);
            // Output: create one block-compressed SequenceFile per input file
            Path outputPath = new Path(outputDirectory.getAbsolutePath() + "/" + inputFile.getName());
            FileSystem hdfs = outputPath.getFileSystem(conf);
            FSDataOutputStream dos = hdfs.create(outputPath);
            swriter = SequenceFile.createWriter(conf, dos, Text.class,
                    BytesWritable.class, SequenceFile.CompressionType.BLOCK, new DefaultCodec());
            swriter.append(key, value);
        }
        // Streams are closed only once, after the loop has finished
        fis.close();
        swriter.close();
    } catch (IOException e) {
        System.out.println(e.getMessage());
    }
}
This code produces correctly formatted SequenceFiles for some RDF files, but it does not work reliably: it sometimes produces corrupted files. Is there a way to avoid this conversion step from the beginning, and if there isn't, how can I improve this code so that it works correctly with any RDF file as input?
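My current guess is that closing the streams only after the loop leaves every SequenceFile except the last one unflushed, and that a single fis.read(content) call is not guaranteed to fill the buffer. Below is a minimal sketch of the per-file resource handling I have in mind; it is not a verified fix. Assumptions on my part: the SequenceFile.createWriter(FileSystem, Configuration, Path, ...) overload, java.nio.file.Files.readAllBytes for a complete read, and the class name RdfToSequenceFile, which is just a placeholder:

import java.io.File;
import java.io.IOException;
import java.nio.file.Files;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.BytesWritable;
import org.apache.hadoop.io.SequenceFile;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.io.compress.DefaultCodec;

public class RdfToSequenceFile {
    public static void main(String[] args) throws IOException {
        Configuration conf = new Configuration();
        File inputDirectory = new File("input");
        File outputDirectory = new File("output");

        for (File inputFile : inputDirectory.listFiles()) {
            // Read the entire file; Files.readAllBytes never returns a partial
            // buffer, unlike a single FileInputStream.read(byte[]) call.
            byte[] content = Files.readAllBytes(inputFile.toPath());

            Path outputPath = new Path(outputDirectory.getAbsolutePath() + "/" + inputFile.getName());
            FileSystem fs = outputPath.getFileSystem(conf);

            SequenceFile.Writer writer = SequenceFile.createWriter(fs, conf, outputPath,
                    Text.class, BytesWritable.class,
                    SequenceFile.CompressionType.BLOCK, new DefaultCodec());
            try {
                writer.append(new Text(inputFile.getName()), new BytesWritable(content));
            } finally {
                // Closing inside the loop flushes and finalizes each SequenceFile,
                // so no writer is left open when the next iteration reassigns it.
                writer.close();
            }
        }
    }
}

Would per-file closing like this be enough to explain the corruption, or is there still something wrong with how I am building the SequenceFiles?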