I'm trying to convert a binary file that contains multiple images into a PDF document using Java. Using itextpdf is the only approach that has given me an output file in the correct format, but the output contains only one image (the first one); the other images inside the binary file are lost.

I've already tried using itextpdf to add the images to a document, as well as other approaches such as this one: https://www.mkyong.com/java/how-to-convert-array-of-bytes-into-file/ or
create pdf from binary data in java

As I understand it, the issue in my case is that I read the whole binary file into a single byte[] and then add that array to a Vector.

I've written a function that takes the Vector as an argument and creates a PDF with the images inside. It inserts only the first image into the PDF, because within the byte array it cannot tell where the first image ends and the second one begins (for example, JPEG image files begin with FF D8 and end with FF D9):

How to identify contents of a byte[] is a jpeg?

File imgFront = new File("C:/Users/binaryFile");
byte[] fileContent;       

Vector<byte[]> records = new Vector<byte[]>();

try {

    fileContent = Files.readAllBytes(imgFront.toPath());
    records.add(fileContent);  // add the result on Vector<byte[]>

} catch (IOException e1) {
    System.out.println( e1 );
}

...

 public static String ImageToPDF(Vector<byte[]> imageVector, String pathFile) {
        String FileoutputName = pathFile + ".pdf";
        Document document = new Document(); // must not be null when passed to PdfWriter.getInstance

        try {
            FileOutputStream fos = new FileOutputStream(FileoutputName );
            PdfWriter writer = PdfWriter.getInstance(document, fos);

            writer.open();
            document.open();  

     //loop here the ImageVector in order to get one by one the images, 
     //but I get only the first one 

            for (byte[] img : imageVector) {
                Image image = Image.getInstance(img);

                image.scaleToFit(500, 500); //size

                document.add(image);
            }
            document.close();
            writer.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return FileoutputName ;

    }

I expect the PDF to contain all the images, not only one.

Tamir Abutbul
nike
  • Please follow the Java coding guidelines to avoid confusing us. Method names, argument names, field names and variable names should all start with a small letter. When we see a name starting with a capital letter, we think it's a class name. That makes it harder to understand your code. – DodgyCodeException Jan 14 '19 at 17:29
  • How was that binary file created? – DodgyCodeException Jan 14 '19 at 17:33
  • Yes, sure, thank you for mentioning that. I'm not sure how this file was created, because they sent me a folder with 100 binary files, some of which contain one image and others more than one. I've tried to check the byte output in order to work out what kind of image they are using (png, jpg, tiff, etc.). I found out that the file starts with 77, 77, 42, meaning it is a TIFF image (http://www.sparkhound.com/blog/detect-image-file-types-through-byte-arrays), and I'm trying to make a workaround starting from that point. – nike Jan 15 '19 at 13:40

1 Answer

I've made a workaround using the itextpdf library.

First I read the binary file into a byte array, then cast the leading bytes to integers and compare them against known image signatures to determine the image type: http://www.sparkhound.com/blog/detect-image-file-types-through-byte-arrays

I found out from the output that my type was TIFF: var tiff2 = new byte[] { 77, 77, 42 }; // TIFF
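The signature check described above could be sketched in plain Java roughly as follows. This is only an illustration, not the code from the linked blog post: the class name `ImageTypeSniffer` and the method `detect` are hypothetical, and the sketch compares against the full four-byte TIFF header (`4D 4D 00 2A` for big-endian, `49 49 2A 00` for little-endian), which includes a zero byte that the 77, 77, 42 shorthand leaves out.

```java
public class ImageTypeSniffer {

    /** Returns "TIFF", "JPEG", "PNG" or "UNKNOWN" based on the leading bytes. */
    public static String detect(byte[] data) {
        if (startsWith(data, 0x4D, 0x4D, 0x00, 0x2A)            // "MM" + 42: big-endian TIFF
                || startsWith(data, 0x49, 0x49, 0x2A, 0x00)) {  // "II" + 42: little-endian TIFF
            return "TIFF";
        }
        if (startsWith(data, 0xFF, 0xD8, 0xFF)) {
            return "JPEG";
        }
        if (startsWith(data, 0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A)) {
            return "PNG";
        }
        return "UNKNOWN";
    }

    // Compares the first bytes of data (as unsigned values) against prefix.
    private static boolean startsWith(byte[] data, int... prefix) {
        if (data == null || data.length < prefix.length) {
            return false;
        }
        for (int i = 0; i < prefix.length; i++) {
            if ((data[i] & 0xFF) != prefix[i]) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        byte[] tiffHeader = {77, 77, 0, 42, 0, 0, 0, 8};
        System.out.println(detect(tiffHeader)); // prints "TIFF"
    }
}
```

The `& 0xFF` mask matters because Java bytes are signed, so a value such as `0xFF` would otherwise compare as -1.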

I changed the method parameter from Vector imageVector to byte[] bytes, and now pass the byte array fileContent directly:

byte[] fileContent = Files.readAllBytes(imgFront.toPath());

ImageToPDF(fileContent, "C:/Users/Desktop/pdfWithImages");

Now I get the number of pages in the binary file using: int numberOfPages = TiffImage.getNumberOfPages(ra); // From itextpdf

    import java.io.FileOutputStream;

    import com.itextpdf.text.Document;
    import com.itextpdf.text.Image;
    import com.itextpdf.text.pdf.PdfWriter;
    import com.itextpdf.text.pdf.RandomAccessFileOrArray;
    import com.itextpdf.text.pdf.codec.TiffImage;

    public static String ImageToPDF(byte[] bytes, String pathFile) {
        String fileName = pathFile + ".pdf";
        Document document = new Document();

        try (FileOutputStream fos = new FileOutputStream(fileName)) {
            // Attach a writer to the document; document.open()/close() drive it
            PdfWriter.getInstance(document, fos);
            document.open();

            // Wrap the array of bytes we have read from the binary file
            RandomAccessFileOrArray ra = new RandomAccessFileOrArray(bytes);

            // Get the number of pages (images) the binary file has inside
            int numberOfPages = TiffImage.getNumberOfPages(ra);

            // TIFF pages are 1-based: loop through them and add each one to the document
            for (int page = 1; page <= numberOfPages; page++) {
                Image image = TiffImage.getTiffImage(new RandomAccessFileOrArray(bytes), page);
                image.scaleAbsolute(500, 500); // forces 500x500, ignoring the aspect ratio
                document.add(image);
            }

            document.close();
        } catch (Exception e) {
            e.printStackTrace();
        }
        return fileName;
    }

This works for my case because all of the source binary files I've checked so far are TIFF images. To handle other image types, more conditions would need to be added, because this solution is specific to TIFF.

nike
  • TIFF files can contain multiple images, so when you convert a TIFF to PDF you should be able to see the multiple images. For other file types, e.g. JPG, it's normally one image per file. Unless the supplier of the binary file has done something such as concatenate multiple JPG images to a single file, in which case you'll have a job trying to search for the 2nd/3rd/4th JPG image header in the byte stream. – DodgyCodeException Jan 16 '19 at 11:37
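
For the concatenated-JPG case mentioned in the comment above, a rough sketch of the header search might look like the following. It is not part of the original answer: the class name `JpegSplitter` and the method `split` are hypothetical, and it assumes that every `FF D8 FF` sequence marks the start of a new top-level image. That assumption does not always hold, because a JPEG with an embedded EXIF thumbnail contains the same marker inside the file, so a real implementation would need to parse the JPEG segments.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.List;

public class JpegSplitter {

    /** Splits data at every FF D8 FF marker; bytes before the first marker are dropped. */
    public static List<byte[]> split(byte[] data) {
        // Collect the offset of every candidate start-of-image marker.
        List<Integer> starts = new ArrayList<>();
        for (int i = 0; i + 2 < data.length; i++) {
            if ((data[i] & 0xFF) == 0xFF
                    && (data[i + 1] & 0xFF) == 0xD8
                    && (data[i + 2] & 0xFF) == 0xFF) {
                starts.add(i);
            }
        }

        // Each image runs from its marker up to the next marker (or the end of the data).
        List<byte[]> images = new ArrayList<>();
        for (int k = 0; k < starts.size(); k++) {
            int from = starts.get(k);
            int to = (k + 1 < starts.size()) ? starts.get(k + 1) : data.length;
            images.add(Arrays.copyOfRange(data, from, to));
        }
        return images;
    }

    public static void main(String[] args) {
        // Two tiny fake "JPEGs" back to back (header + payload + FF D9 end marker).
        byte[] two = {
            (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 1, 2, (byte) 0xFF, (byte) 0xD9,
            (byte) 0xFF, (byte) 0xD8, (byte) 0xFF, (byte) 0xE0, 3, (byte) 0xFF, (byte) 0xD9
        };
        System.out.println(split(two).size()); // prints 2
    }
}
```

Each returned byte[] could then be handed to Image.getInstance one at a time, which is what the loop over the Vector in the question was expecting.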