40

I need to convert a PDF to a byte array and vice versa.

Can anyone help me?

This is how I am converting to a byte array:

public static byte[] convertDocToByteArray(String sourcePath) {

    byte[] byteArray=null;
    try {
        InputStream inputStream = new FileInputStream(sourcePath);


        String inputStreamToString = inputStream.toString();
        byteArray = inputStreamToString.getBytes();

        inputStream.close();
    } catch (FileNotFoundException e) {
        System.out.println("File Not found"+e);
    } catch (IOException e) {
        System.out.println("IO Ex"+e);
    }
    return byteArray;
}

If I use the following code to convert it back to a document, a PDF file is created, but opening it fails with 'Bad Format. Not a pdf'.

public static void convertByteArrayToDoc(byte[] b) {          

    OutputStream out;
    try {       
        out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
        out.close();
        System.out.println("write success");
    }catch (Exception e) {
        System.out.println(e);
    }
}
Lahiru Ashan

14 Answers

45

Java 7 introduced Files.readAllBytes(), which can read a PDF into a byte[] like so:

import java.nio.file.Path;
import java.nio.file.Paths;
import java.nio.file.Files;

Path pdfPath = Paths.get("/path/to/file.pdf");
byte[] pdf = Files.readAllBytes(pdfPath);

EDIT:

Thanks Farooque for pointing out: this will work for reading any kind of file, not just PDFs. All files are ultimately just a bunch of bytes, and as such can be read into a byte[].
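
The reverse direction, which the question also asks about, can be handled by the same Files class. A minimal sketch, assuming Java 7 or later; the class name PdfBytes and its method names are made up for illustration and are not part of the answer above:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

// Hypothetical helper class; the names are illustrative only.
final class PdfBytes {

    // Reads the entire file into memory as raw bytes (no text decoding involved).
    static byte[] read(String sourcePath) throws IOException {
        return Files.readAllBytes(Paths.get(sourcePath));
    }

    // Writes the bytes back unchanged; creates the file or overwrites an existing one.
    static void write(byte[] pdf, String targetPath) throws IOException {
        Files.write(Paths.get(targetPath), pdf);
    }
}
```

Calling PdfBytes.write(PdfBytes.read("in.pdf"), "copy.pdf") should then produce a byte-for-byte copy; both file names are placeholders.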

Chris Clark
  • Thanks for the import edit @Farooque! What do you mean by "In general it can read a any given file into a byte[]"? – Chris Clark Aug 08 '16 at 20:42
  • 1
    I tested pdf, jpg, gif, png and txt files, and all of them work perfectly. Since it supports all file types, the note "In general it can read a any given file into a byte[]" will be helpful to anyone who needs to read other kinds of files – Farooque Aug 09 '16 at 05:03
34

You basically need a helper method to read a stream into memory. This works pretty well:

public static byte[] readFully(InputStream stream) throws IOException
{
    byte[] buffer = new byte[8192];
    ByteArrayOutputStream baos = new ByteArrayOutputStream();

    int bytesRead;
    while ((bytesRead = stream.read(buffer)) != -1)
    {
        baos.write(buffer, 0, bytesRead);
    }
    return baos.toByteArray();
}

Then you'd call it with:

public static byte[] loadFile(String sourcePath) throws IOException
{
    InputStream inputStream = null;
    try 
    {
        inputStream = new FileInputStream(sourcePath);
        return readFully(inputStream);
    } 
    finally
    {
        if (inputStream != null)
        {
            inputStream.close();
        }
    }
}

Don't mix up text and binary data - it only leads to tears.
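
On Java 7 or later, the try/finally in loadFile can be shortened with try-with-resources. A sketch under that assumption, repeating readFully so the snippet compiles on its own; the class name StreamLoader is invented for illustration:

```java
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical wrapper class so the two methods compile together.
final class StreamLoader {

    // Same block-copy loop as readFully above.
    static byte[] readFully(InputStream stream) throws IOException {
        byte[] buffer = new byte[8192];
        ByteArrayOutputStream baos = new ByteArrayOutputStream();
        int bytesRead;
        while ((bytesRead = stream.read(buffer)) != -1) {
            baos.write(buffer, 0, bytesRead);
        }
        return baos.toByteArray();
    }

    // try-with-resources closes the stream automatically, even if readFully throws.
    static byte[] loadFile(String sourcePath) throws IOException {
        try (InputStream inputStream = new FileInputStream(sourcePath)) {
            return readFully(inputStream);
        }
    }
}
```

On Java 9 and later, InputStream.readAllBytes() does the job of readFully directly.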

Jon Skeet
  • 1
    I guess there needs to be an extra bracket in readFully while statement .. like while ((bytesRead = stream.read(buffer)) != -1) – Vamshi Feb 27 '12 at 06:47
  • @JonSkeet - the size you initialise to is 8192 - how large of a PDF file would this work with? I know this is like asking *"How long is a piece of String"*, but maybe a generic guideline if you know? My PDFs will be up to 20 pages long at a guess. – achAmháin Mar 13 '18 at 11:27
  • 1
    @notyou: That's just a buffer size that isn't enormous, but is large enough to avoid "system call for each byte". It's a reasonable default, basically. – Jon Skeet Mar 13 '18 at 12:48
12

The problem is that you are calling toString() on the InputStream object itself. This returns a String representation of the InputStream object, not the contents of the PDF document.
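
To see what the question's code actually put into the array, it helps to look at what toString() returns. A small sketch; the class and method names are invented for illustration, and the text after the @ is a hash code that varies from run to run:

```java
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical demo class.
final class ToStringDemo {

    // Returns what the question's code converted to bytes: the default
    // Object.toString() text (class name + "@" + hex hash code), not the file contents.
    static String describe(String sourcePath) throws IOException {
        try (InputStream inputStream = new FileInputStream(sourcePath)) {
            return inputStream.toString();
        }
    }
}
```

For any readable file, describe(path) yields java.io.FileInputStream@ followed by a few hex digits, so calling getBytes() on it produces only a few dozen bytes no matter how large the PDF is.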

Because PDF is a binary format, read it strictly as bytes. Writing that same unmodified byte array back out then yields a valid PDF.

For example, to read a file as bytes:

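File file = new File(sourcePath);
// file.length() returns a long, so it must be cast to size the array
byte[] bytes = new byte[(int) file.length()];
try (DataInputStream inputStream = new DataInputStream(new FileInputStream(file))) {
    // readFully keeps reading until the array is full; a single read() may return early
    inputStream.readFully(bytes);
}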
Sufian
Mark
6

You can do it by using Apache Commons IO without worrying about internal details.

Use org.apache.commons.io.FileUtils.readFileToByteArray(File file), which returns the file contents as a byte[].

See the Commons IO Javadoc for details.

YvesR
Narendra
4

This worked for me. I haven't used any third-party libraries, just the ones that ship with Java.

import java.io.*;
import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;

public class PDFUtility {

    public static void main(String[] args) throws IOException {
        // Read the PDF into a byte array, then write those bytes out as a new PDF.
        PDFUtility pdfUtility = new PDFUtility();
        byte[] byteStreamPDF = pdfUtility.convertPDFtoByteStream();
        // try-with-resources closes the stream even if write() throws
        try (FileOutputStream fileOutputStream = new FileOutputStream("C:\\Users\\aseem\\Desktop\\BlaFolder\\BlaFolder2\\aseempdf.pdf")) {
            fileOutputStream.write(byteStreamPDF);
        }
        System.out.println("File written successfully");
    }

    /**
     * Reads the PDF into a byte array.
     *
     * @return the raw bytes of the PDF file
     * @throws IOException if the file cannot be read
     */
    protected byte[] convertPDFtoByteStream() throws IOException {
        Path path = Paths.get("C:\\Users\\aseem\\aaa.pdf");
        return Files.readAllBytes(path);
    }

}
Aseem Savio
2
public static void main(String[] args) throws IOException {
        File file = new File("java.pdf");

        // Read the whole file into memory, one 1 KB block at a time.
        ByteArrayOutputStream bos = new ByteArrayOutputStream();
        byte[] buf = new byte[1024];
        // try-with-resources closes the input stream when the loop finishes or fails
        try (FileInputStream fis = new FileInputStream(file)) {
            for (int readNum; (readNum = fis.read(buf)) != -1;) {
                // Write only the readNum bytes actually read, starting at offset 0 of buf.
                bos.write(buf, 0, readNum);
                System.out.println("read " + readNum + " bytes,");
            }
        }
        byte[] bytes = bos.toByteArray();

        // Write the byte array back out as a new PDF.
        File someFile = new File("java2.pdf");
        try (FileOutputStream fos = new FileOutputStream(someFile)) {
            fos.write(bytes);
        }
    }
Bacteria
Samy Nagy
1

Calling toString() on an InputStream doesn't do what you think it does. Even if it did, a PDF contains binary data, so you wouldn't want to convert it to a string first.

What you need to do is read from the stream, write the results into a ByteArrayOutputStream, then convert the ByteArrayOutputStream into an actual byte array by calling toByteArray():

InputStream inputStream = new FileInputStream(sourcePath);
ByteArrayOutputStream outputStream = new ByteArrayOutputStream();

int data;
while( (data = inputStream.read()) >= 0 ) {
    outputStream.write(data);
}

inputStream.close();
return outputStream.toByteArray();
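
If the one-byte-at-a-time loop is kept for simplicity, wrapping the stream in a BufferedInputStream is one way to avoid hitting the file system for every byte. A sketch only, assuming Java 7 or later; the class and method names are invented for illustration:

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayOutputStream;
import java.io.FileInputStream;
import java.io.IOException;
import java.io.InputStream;

// Hypothetical helper class.
final class BufferedByteReader {

    static byte[] readAll(String sourcePath) throws IOException {
        ByteArrayOutputStream outputStream = new ByteArrayOutputStream();
        // BufferedInputStream fetches the file in blocks and serves read() from memory.
        try (InputStream inputStream = new BufferedInputStream(new FileInputStream(sourcePath))) {
            int data;
            // read() returns the next byte as 0..255, or -1 at end of stream
            while ((data = inputStream.read()) >= 0) {
                outputStream.write(data);
            }
        }
        return outputStream.toByteArray();
    }
}
```

Copying a block at a time is typically still faster, since it avoids a method call per byte.
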
Sufian
Eric Petroelje
  • Reading a single byte at a time isn't terribly efficient. Better to copy a block at a time. – Jon Skeet Jul 15 '09 at 12:44
  • @Jon - true, but I was trying to keep it simple. Also, doesn't FileInputStream do buffering internally anyway that would mitigate that? – Eric Petroelje Jul 15 '09 at 12:45
1

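Aren't you creating the PDF file but never actually writing the byte array to it? That is why the PDF cannot be opened.

out = new FileOutputStream("D:/ABC_XYZ/1.pdf");
out.write(b, 0, b.length);   // actually write the bytes before closing
// out.Position = 0;  (a .NET stream property; Java's FileOutputStream has no equivalent and needs none)
out.close();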

This is in addition to correctly reading in the PDF to byte array.
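
Put together, a corrected version of the question's convertByteArrayToDoc might look like the sketch below. It assumes Java 7 or later; the targetPath parameter and the class name are additions for illustration and are not in the original code:

```java
import java.io.FileOutputStream;
import java.io.IOException;
import java.io.OutputStream;

// Hypothetical holder class for the corrected method.
final class ByteArrayToDoc {

    // Writes the byte array to targetPath, creating or overwriting the file.
    static void convertByteArrayToDoc(byte[] b, String targetPath) throws IOException {
        // try-with-resources flushes and closes the stream even if write() throws
        try (OutputStream out = new FileOutputStream(targetPath)) {
            out.write(b); // the step missing from the question's code
        }
    }
}
```

The question's call would then become convertByteArrayToDoc(bytes, "D:/ABC_XYZ/1.pdf").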

David
  • out.position=0 ?? I didn't get it –  Jul 15 '09 at 12:50
  • this may not have been useful as you are saving it to file but I ran into issues where I was putting the byte array into a MemoryStream object and downloading it to the client. I had to set the Position back to 0 for this to work. – David Jul 15 '09 at 13:12
1

To convert a PDF to a byte array:

public byte[] pdfToByte(String filePath) {

         File file = new File(filePath);
         byte[] data = null;

         // try-with-resources closes the stream even if reading fails
         try (DataInputStream in = new DataInputStream(new FileInputStream(file))) {
            data = new byte[(int)file.length()];
            // readFully keeps reading until the array is full; a single read() may return early
            in.readFully(data);
        } catch (FileNotFoundException e) {
            // LOGGER is assumed to be a logger field declared elsewhere in the class
            LOGGER.info("File not found" + e);
        } catch (IOException e) {
            LOGGER.info("IO exception" + e);
        }

        return data;

    }
Riddhi Gohil
0

This works for me:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){
    byte[] buffer = new byte[1024];
    int bytesRead;
    while((bytesRead = pdfin.read(buffer))!=-1){
        pdfout.write(buffer,0,bytesRead);
    }
}

But Jon's answer doesn't work for me if used in the following way:

try(InputStream pdfin = new FileInputStream("input.pdf");OutputStream pdfout = new FileOutputStream("output.pdf")){

    int k = readFully(pdfin).length;
    System.out.println(k);
}

It outputs zero as the length. Why is that?

Sridhar
0

None of these worked for us, possibly because our input stream came from a REST call rather than from a local PDF file. What worked was using RestAssured to read the PDF as an input stream, parsing it with Tika's auto-detecting parser, and then calling toString() on the content handler to get the extracted text.

import com.jayway.restassured.RestAssured;
import com.jayway.restassured.response.Response;
import com.jayway.restassured.response.ResponseBody;

import org.apache.tika.exception.TikaException;
import org.apache.tika.metadata.Metadata;
import org.apache.tika.parser.AutoDetectParser;
import org.apache.tika.parser.ParseContext;
import org.apache.tika.sax.BodyContentHandler;
import org.apache.tika.parser.Parser;
import org.xml.sax.ContentHandler;
import org.xml.sax.SAXException;

            InputStream stream = response.asInputStream();
            Parser parser = new AutoDetectParser(); // Should auto-detect!
            ContentHandler handler = new BodyContentHandler();
            Metadata metadata = new Metadata();
            ParseContext context = new ParseContext();

            try {
                parser.parse(stream, handler, metadata, context);
            } finally {
                stream.close();
            }
            for (int i = 0; i < metadata.names().length; i++) {
                String item = metadata.names()[i];
                System.out.println(item + " -- " + metadata.get(item));
            }

            System.out.println("!!Printing pdf content: \n" +handler.toString());
            System.out.println("content type: " + metadata.get(Metadata.CONTENT_TYPE));
Sufian
HRVHackers
0

I have implemented similar behaviour in my application without problems. Below is my version of the code, which works.

    byte[] getFileInBytes(String filename) {
        File file = new File(filename);
        int length = (int) file.length();
        byte[] bytes = new byte[length];
        // try-with-resources closes the stream when reading is done
        try (BufferedInputStream reader = new BufferedInputStream(new FileInputStream(file))) {
            // read() may return fewer bytes than requested, so loop until the array is full
            int offset = 0;
            int numRead;
            while (offset < length
                    && (numRead = reader.read(bytes, offset, length - offset)) != -1) {
                offset += numRead;
            }
        } catch (FileNotFoundException e) {
            e.printStackTrace();
        } catch (IOException e) {
            e.printStackTrace();
        }

        return bytes;
    }
Akash Roy
0
public String encodeFileToBase64Binary(String fileName)
        throws IOException {
    System.out.println("encodeFileToBase64Binary: " + fileName);
    File file = new File(fileName);
    byte[] bytes = loadFile(file);
    // Base64 here is presumably org.apache.commons.codec.binary.Base64 (Apache Commons Codec)
    byte[] encoded = Base64.encodeBase64(bytes);
    String encodedString = new String(encoded);
    System.out.println("FILE B64: " + encodedString);

    return encodedString;
}

public static byte[] loadFile(File file) throws IOException {
    long length = file.length();
    if (length > Integer.MAX_VALUE) {
        // A Java array cannot hold more than Integer.MAX_VALUE elements
        throw new IOException("File is too large: " + file.getName());
    }
    byte[] bytes = new byte[(int)length];

    // try-with-resources closes the stream even when an exception is thrown
    try (InputStream is = new FileInputStream(file)) {
        int offset = 0;
        int numRead = 0;
        // read() may return fewer bytes than requested, so loop until the array is full
        while (offset < bytes.length
                && (numRead=is.read(bytes, offset, bytes.length-offset)) >= 0) {
            offset += numRead;
        }

        if (offset < bytes.length) {
            throw new IOException("Could not completely read file "+file.getName());
        }
    }
    return bytes;
}
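
If Base64 text really is what is needed, Java 8 and later ship java.util.Base64, so no extra library is required. A sketch under that assumption; the class and method names are invented for illustration, and Files.readAllBytes stands in for loadFile:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;
import java.util.Base64;

// Hypothetical helper class.
final class Base64Files {

    // Reads the file as raw bytes and encodes them as a Base64 string.
    static String encodeFileToBase64(String fileName) throws IOException {
        byte[] bytes = Files.readAllBytes(Paths.get(fileName));
        return Base64.getEncoder().encodeToString(bytes);
    }

    // Reverses the encoding, giving back the original bytes.
    static byte[] decodeBase64(String encoded) {
        return Base64.getDecoder().decode(encoded);
    }
}
```

The decoded bytes can be written straight back to a file to recover the original PDF.
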
  • I don't think the questioner needs a base64 conversion. He used `toString` only because he didn't know how to read the file into bytes. – vipcxj Nov 10 '21 at 02:28
-2

PDFs may contain binary data, and chances are it's getting mangled when you call toString(). It seems to me that you want this:

        FileInputStream inputStream = new FileInputStream(sourcePath);

        int numberBytes = inputStream.available();
        byte[] bytearray = new byte[numberBytes];

        inputStream.read(bytearray);
plinth
  • That's a horrible way of reading data - please don't assume that available() will contain all of the data in a stream. – Jon Skeet Jul 15 '09 at 12:39
  • 1
    @Jon - seconded. available() will (usually) return the number of bytes that can be read immediately without blocking. It has little to do with how much data is actually in the file. – Eric Petroelje Jul 15 '09 at 12:42