I am using PDFBox to extract the images from my pdf (which contains only jpg's).
Since I will save those images inside my database, I would like to directly convert each image to an inputstream object first without placing the file temporary on my file sysem. I am facing difficulties with this however. I think it has to do because of the use of image.getPDFStream().createInputStream()
as I did in the following example:
while (imageIter.hasNext()) {
String key = (String) imageIter.next();
PDXObjectImage image = (PDXObjectImage) images.get(key);
FileOutputStream output = new FileOutputStream(new File(
"C:\\Users\\Anton\\Documents\\lol\\test.jpg"));
InputStream is = image.getPDStream().createInputStream(); //this gives me a corrupt file
byte[] buffer = new byte[1024];
while (is.read(buffer) > 0) {
output.write(buffer);
}
}
However this works:
while (iter.hasNext()) {
PDPage page = (PDPage) iter.next();
PDResources resources = page.getResources();
Map<String, PDXObject> images = resources.getXObjects();
if (images != null) {
Iterator<?> imageIter = images.keySet().iterator();
while (imageIter.hasNext()) {
String key = (String) imageIter.next();
PDXObjectImage image = (PDXObjectImage) images.get(key);
image.write2file(new File("C:\\Users\\Anton\\Documents\\lol\\test.jpg")); //this works however
}
}
}
Any idea how I can convert each PDXObjectImage (or any other object I can get) to an inputstream?