I have written some binary image data to a Hadoop SequenceFile and would like to write it out as a PNG outside of Hadoop, if possible, using Java.
[Edited] Overview of the data flow: Input files → generate BufferedImages from the input → convert the BufferedImages into byte arrays → store them as a SequenceFile in HDFS → take the SequenceFile outside of HDFS and convert it into a PNG (the step I am trying to do now).
However, I am not sure how to locate where the data starts inside the SequenceFile. From what I have seen of the SequenceFile documentation, I can use the sync marker to locate the end of the SequenceFile header, and then use the record length and key length fields to find the beginning of the value.
However, I am unsure of how to find where the sync marker is. How would I find where the header's metadata stops and where the sync marker begins and ends? Would it be possible for me to calculate the value of the sync marker and look for it that way? Also, how can I find out the number of bytes the record length and key length take up?
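To make the question concrete, here is a minimal sketch of how I currently understand the header layout, assuming the version 6 file format: the magic bytes SEQ, a version byte, the key and value class names, two compression flags, an optional codec class name, the metadata pairs, and then the 16-byte sync marker. The names SeqHeaderSketch, readVInt, readText and syncMarkerOffset are my own and not part of Hadoop, and the layout is my reading of the documentation rather than something I have confirmed.

```java
import java.nio.ByteBuffer;
import java.nio.charset.StandardCharsets;

// Hypothetical helper, not Hadoop API: walks a SequenceFile header by hand.
final class SeqHeaderSketch {
    /** Assumed size of the sync marker in bytes. */
    static final int SYNC_SIZE = 16;

    /** Decodes Hadoop's variable-length int, which prefixes every string in the header. */
    static int readVInt(ByteBuffer buf) {
        byte first = buf.get();
        if (first >= -112) {
            return first; // small values are stored in a single byte
        }
        boolean negative = first < -120;
        int extra = negative ? -120 - first : -112 - first; // bytes that follow
        long value = 0;
        for (int i = 0; i < extra; i++) {
            value = (value << 8) | (buf.get() & 0xFF);
        }
        return (int) (negative ? ~value : value);
    }

    /** Reads a length-prefixed UTF-8 string (class names and metadata entries). */
    static String readText(ByteBuffer buf) {
        byte[] utf8 = new byte[readVInt(buf)];
        buf.get(utf8);
        return new String(utf8, StandardCharsets.UTF_8);
    }

    /** Returns the offset at which the sync marker starts, i.e. where the rest of the header ends. */
    static int syncMarkerOffset(byte[] fileStart) {
        ByteBuffer buf = ByteBuffer.wrap(fileStart); // big-endian, like DataOutput
        if (buf.get() != 'S' || buf.get() != 'E' || buf.get() != 'Q') {
            throw new IllegalArgumentException("missing SEQ magic bytes");
        }
        int version = buf.get();
        if (version != 6) {
            throw new IllegalArgumentException("sketch only assumes version 6, got " + version);
        }
        readText(buf);                      // key class name
        readText(buf);                      // value class name
        boolean compressed = buf.get() != 0;
        buf.get();                          // block-compression flag, not needed here
        if (compressed) {
            readText(buf);                  // codec class name, present only if compressed
        }
        int metadataPairs = buf.getInt();   // 4-byte count of key/value metadata pairs
        for (int i = 0; i < metadataPairs; i++) {
            readText(buf);
            readText(buf);
        }
        return buf.position();
    }
}
```

If this reading is right, the sync marker occupies the 16 bytes starting at the returned offset and the first record begins immediately after it, so the marker would only need to be read from the header, not calculated.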
If there are alternative ways of finding the SequenceFile value, please let me know. If it helps, here is a little bit of code that I used to write to the SequenceFile.
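baos = new ByteArrayOutputStream();
ImageIO.write(img, "png", baos); // img is a BufferedImage; this encodes it as a PNG
byte[] imBytes = baos.toByteArray(); // the complete PNG file as a byte array
// The option factories are static methods on SequenceFile.Writer
writer = SequenceFile.createWriter(conf,
    SequenceFile.Writer.file(new Path(imgPath)),
    SequenceFile.Writer.keyClass(Text.class),
    SequenceFile.Writer.valueClass(BytesWritable.class));
writer.append(new Text(imgPath), new BytesWritable(imBytes)); // key: image path, value: PNG bytes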
Essentially, I took a BufferedImage generated by the program, encoded it as a PNG into a byte array, and then wrote that array to the SequenceFile.
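Given that write path, my understanding is that each uncompressed record is stored as a 4-byte record length, a 4-byte key length, the serialized key, and then the serialized value, and that a BytesWritable value is itself a 4-byte length followed by the raw bytes. Under those assumptions, a minimal sketch of pulling the PNG bytes out of one record might look like this. SeqRecordSketch and its methods are hypothetical names of mine, not Hadoop API, and a record length of -1 is treated as the escape that precedes a repeated sync marker.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

// Hypothetical helper, not Hadoop API: reads one uncompressed record by hand.
final class SeqRecordSketch {
    static final int SYNC_SIZE = 16;   // assumed sync marker size
    static final int SYNC_ESCAPE = -1; // assumed "a sync marker follows" flag
    static final byte[] PNG_SIGNATURE =
        {(byte) 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A};

    /**
     * Reads the record at buf.position() and returns the payload of its
     * BytesWritable value, leaving buf positioned at the next record.
     */
    static byte[] readBytesValue(ByteBuffer buf) {
        int recordLength = buf.getInt();           // key bytes + value bytes
        if (recordLength == SYNC_ESCAPE) {
            buf.position(buf.position() + SYNC_SIZE); // skip the repeated sync marker
            recordLength = buf.getInt();
        }
        int keyLength = buf.getInt();
        buf.position(buf.position() + keyLength);  // skip the serialized key
        int payloadLength = buf.getInt();          // BytesWritable length prefix
        if (payloadLength != recordLength - keyLength - 4) {
            throw new IllegalStateException("value is not laid out like a BytesWritable");
        }
        byte[] payload = new byte[payloadLength];
        buf.get(payload);
        return payload;
    }

    /** Checks for the 8-byte signature that every PNG file starts with. */
    static boolean looksLikePng(byte[] data) {
        return data.length >= PNG_SIGNATURE.length
            && Arrays.equals(Arrays.copyOf(data, PNG_SIGNATURE.length), PNG_SIGNATURE);
    }
}
```

If the returned array passes looksLikePng, it should already be a complete PNG file, so it could be written out with something like Files.write(Paths.get("out.png"), payload), where out.png is just a placeholder name.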
[Edit] I've looked through the SequenceFile source code and there is a function called getSync(). I think it is private, though, so I'm not sure how I'd use it.