
In Java, when reading from a network InputStream, why is it not recommended to do this:

 PdfReader localPdfReader = new PdfReader(item.getInputStream());

where item is a FileItem uploaded by a client? Instead, this is recommended:

ByteArrayOutputStream output = new ByteArrayOutputStream();
IOUtils.copy(item.getInputStream(), output);

What is the difference between the two?
EDIT
I have learned that it is because the InputStream here is a network stream, so once a byte is read, it cannot be read again. That is why you have to copy it to memory (with IOUtils) before using it. My question is: while copying to memory, you also have to read the stream byte by byte, right? Then why doesn't the InputStream get closed there?

posdef
Ashwin

3 Answers


Java streams use the decorator pattern: functionality is added to a stream by wrapping (decorating) it with another stream. My favourite is wrapping a FileOutputStream in a GZIPOutputStream to add compression.
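A minimal sketch of that decorator idea using only JDK classes (in Java the relevant classes are GZIPOutputStream and FileOutputStream, plus GZIPInputStream and FileInputStream for reading back). The class and method names are made up for illustration, and a temporary file stands in for a real one:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.nio.file.*;
import java.util.zip.GZIPInputStream;
import java.util.zip.GZIPOutputStream;

public class GzipDecorator {

    // GZIPOutputStream decorates FileOutputStream: same OutputStream API, plus compression.
    static void writeCompressed(Path file, String text) throws IOException {
        try (OutputStream out = new GZIPOutputStream(new FileOutputStream(file.toFile()))) {
            out.write(text.getBytes(StandardCharsets.UTF_8));
        } // closing the decorator finishes the gzip data and closes the file stream too
    }

    // GZIPInputStream decorates FileInputStream: same InputStream API, plus decompression.
    static String readCompressed(Path file) throws IOException {
        try (InputStream in = new GZIPInputStream(new FileInputStream(file.toFile()))) {
            return new String(in.readAllBytes(), StandardCharsets.UTF_8);
        }
    }

    public static void main(String[] args) throws IOException {
        Path file = Files.createTempFile("decorator-demo", ".gz");
        writeCompressed(file, "hello, decorated streams");
        System.out.println(readCompressed(file)); // hello, decorated streams
        Files.delete(file);
    }
}
```

The calling code only ever sees an OutputStream or InputStream; which behaviour it gets depends solely on how the streams were stacked.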

aviad

I guess the difference is between Reader and InputStream. In your example, a PDF document is binary data, which should be transferred byte by byte, not character by character. Check this link in the same forum for more on Reader and InputStream. Even though it mentions wrapping a Stream in a Reader, as mentioned earlier, this should be discouraged for binary data.

EDIT: 1

Let's check how the read() methods of Reader and InputStream work:

Reader.read() returns an int in the range 0 to 65535 (a single 16-bit Unicode character), or -1 at the end of the stream.

InputStream.read() returns the next byte of data as an int in the range 0 to 255, or -1 at the end of the stream.

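Now imagine using a Reader to read binary data (a sequence of 8-bit values). The Reader decodes the bytes with a character encoding, so one character may consume one or more bytes, and byte sequences that are not valid in that encoding are replaced. The result is corrupted binary data.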
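A small, self-contained sketch of the point above, assuming UTF-8 as the Reader's charset. The sample bytes imitate the start of a PDF file (a text header followed by high, non-ASCII bytes) and are made up for illustration, as are the class and method names. Copying through an InputStream keeps every byte; decoding through a Reader and re-encoding does not:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class ReaderVsInputStream {

    // PDF-like sample: "%PDF-1.4\n%" followed by four high bytes (invalid as UTF-8).
    static final byte[] BINARY = {
        '%', 'P', 'D', 'F', '-', '1', '.', '4', '\n', '%',
        (byte) 0xE2, (byte) 0xE3, (byte) 0xCF, (byte) 0xD3
    };

    // Byte-oriented copy: read() yields 0..255 (or -1 at end), so every value survives.
    static byte[] viaInputStream(byte[] data) throws IOException {
        InputStream in = new ByteArrayInputStream(data);
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != -1) {
            out.write(b);
        }
        return out.toByteArray();
    }

    // Character-oriented copy: bytes are decoded as UTF-8, and malformed sequences
    // become U+FFFD, so re-encoding cannot give back the original bytes.
    static byte[] viaReader(byte[] data) throws IOException {
        Reader reader = new InputStreamReader(new ByteArrayInputStream(data), StandardCharsets.UTF_8);
        StringBuilder text = new StringBuilder();
        int c;
        while ((c = reader.read()) != -1) { // 0..65535, or -1 at end
            text.append((char) c);
        }
        return text.toString().getBytes(StandardCharsets.UTF_8);
    }

    public static void main(String[] args) throws IOException {
        System.out.println(new ByteArrayInputStream(BINARY, 10, 1).read()); // 226, not -30
        System.out.println(Arrays.equals(BINARY, viaInputStream(BINARY))); // true
        System.out.println(Arrays.equals(BINARY, viaReader(BINARY)));      // false
    }
}
```

The first line also shows that InputStream.read() reports the byte 0xE2 as 226, not as a signed Java byte.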

I have not seen the code for PdfReader, so I am not sure whether it uses java.io.Reader. This explanation is purely about java.io.Reader/InputStream. I would appreciate it if you shared a link or post which says that using PdfReader in the manner you mentioned is not good.

EDIT: 2

Remember:

  1. From a network, you can read the stream's bytes only once.
  2. If you need those bytes for multiple tasks, it is better to store them in an array and use the same array multiple times.
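A sketch of the "read once, reuse the array" idea using only the JDK (Java 9+ for readAllBytes and readNBytes). A ByteArrayInputStream stands in for the network stream, and the two tasks (looksLikePdf, countBytes) are hypothetical placeholders for things like parsing the PDF and saving it to a database:

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class ReadOnceReuse {

    // Hypothetical task 1: checks that the data starts with the PDF magic "%PDF-".
    static boolean looksLikePdf(InputStream in) throws IOException {
        byte[] header = new byte[5];
        int n = in.readNBytes(header, 0, header.length);
        return n == 5 && new String(header, StandardCharsets.US_ASCII).equals("%PDF-");
    }

    // Hypothetical task 2: counts the bytes, standing in for e.g. storing them in a DB.
    static long countBytes(InputStream in) throws IOException {
        long count = 0;
        while (in.read() != -1) {
            count++;
        }
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Stand-in for the one-shot network stream.
        InputStream network = new ByteArrayInputStream("%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII));

        // Read the stream exactly once and keep the bytes.
        byte[] bytes = network.readAllBytes();

        // Each task gets its own fresh stream over the same array.
        System.out.println(looksLikePdf(new ByteArrayInputStream(bytes))); // true
        System.out.println(countBytes(new ByteArrayInputStream(bytes)));   // 12
    }
}
```

Wrapping the array in a new ByteArrayInputStream per task lets code that expects an InputStream run as many times as needed.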

If you use

PdfReader localPdfReader = new PdfReader(item.getInputStream());

then PdfReader internally reads the bytes from the stream and uses them to validate the document. It does not keep them for any further use.

If you use IOUtils, it copies the bytes from the network into a byte array, which can later be used by PdfReader as well as by a JDBC call that stores them in the database.
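This also bears on the question in the EDIT: copying does read every byte and does use up the source stream, but nothing is lost because each byte is kept in memory. The stream is not closed by this; it is simply exhausted, and further read() calls return -1. The copy method below is a simplified stand-in for IOUtils.copy, not the Commons IO source, and a ByteArrayInputStream plays the role of the uploaded stream. In real code the resulting array could be passed to iText's PdfReader(byte[]) constructor and to a JDBC PreparedStatement.setBytes call.

```java
import java.io.*;
import java.nio.charset.StandardCharsets;

public class CopyToMemory {

    // Simplified stand-in for IOUtils.copy: move bytes across in chunks until end of stream.
    static long copy(InputStream in, OutputStream out) throws IOException {
        byte[] buffer = new byte[4096];
        long total = 0;
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n); // each byte is read from the source once and kept in memory
            total += n;
        }
        return total;
    }

    public static void main(String[] args) throws IOException {
        InputStream upload = new ByteArrayInputStream("%PDF-1.4 ...".getBytes(StandardCharsets.US_ASCII));
        ByteArrayOutputStream output = new ByteArrayOutputStream();

        System.out.println(copy(upload, output)); // 12
        System.out.println(upload.read());        // -1: the source stream is used up
        byte[] bytes = output.toByteArray();      // but the in-memory copy can be reused freely
        System.out.println(bytes.length);         // 12
    }
}
```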

Community
Santosh
  • Please see this link, starting from my first comment on the first answer: http://stackoverflow.com/questions/10320062/how-to-store-a-pdf-file-in-postgresql-database-using-servlets – Ashwin May 09 '12 at 11:14
  • I wish you had posted this link earlier. What @BalusC has said is perfectly right. Which part didn't you understand? – Santosh May 09 '12 at 11:27
  • I did not understand the part starting from the 5th comment on the 1st answer. He says that from a network stream only one byte can be read. – Ashwin May 09 '12 at 11:51
  • That's true. Check out the javadoc of the InputStream.read() method: it reads only one byte at a time. – Santosh May 09 '12 at 12:17
  • What is the problem with that? Why shouldn't this be done: PDFReader pd=new PDFReader(inputstream)? Why should IOUtils be used instead? Please explain in detail. My understanding of Java I/O does not seem to be good. – Ashwin May 10 '12 at 05:55
  • In the link, BalusC has said: "Once a byte is read, it cannot be read anymore". Then how can you even copy it using IOUtils? To copy it, you also have to read it first. – Ashwin May 10 '12 at 09:30
  • The first time you read it, you read it using IOUtils, which will copy it as well. – Santosh May 10 '12 at 10:16
  • "The first time you read it, you read it using IOUtils, which will copy it as well": when you read it, you read it byte by byte, right? But according to BalusC, "Once a byte is read, it cannot be read anymore". – Ashwin May 10 '12 at 10:19
  • Yes. You read one byte and store it in an array, then read the next byte and store it in the same array, and so on. This way you read each byte only once, but keep it in the byte array. – Santosh May 10 '12 at 10:27
  • One more question: does PdfReader need the bytes more than once to create a PDF file? – Ashwin May 10 '12 at 10:55
  • Nope. It needs the bytes only once; you pass them via the standard constructor, which takes a byte array. – Santosh May 10 '12 at 12:00
  • Thanks, got it :) You can upvote my question if you haven't already :) – Ashwin May 11 '12 at 04:52
  • @Santosh I don't know which `PDFReader` class you're referring to but the ones I found all had constructors that take `InputStreams.` So I don't know what this question is really about. – user207421 May 12 '12 at 00:23
  • @EJP It's the widely used iText library. Check [this link](http://grepcode.com/file/repo1.maven.org/maven2/com.itextpdf/itextpdf/5.1.0/com/itextpdf/text/pdf/PdfReader.java) – Santosh May 12 '12 at 01:52
  • Exactly. So, just as I said, it has a constructor that takes an `InputStream`. So there is no need for the `ByteArrayInputStream` convolution. – user207421 May 12 '12 at 10:02
  • @EJP Well, true. But if you look at the background of this question, BalusC has [mentioned](http://stackoverflow.com/questions/10320062/how-to-store-a-pdf-file-in-postgresql-database-using-servlets) this "convolution", and it seems to have confused Ashwin. Check the link mentioned in the 2nd comment here and then read THIS question; that will give you an idea of where I am coming from. – Santosh May 12 '12 at 16:33

Instead this is recommended

Recommended by whom? Why? I've never seen such a statement in any Oracle or Sun documentation since 1997. OTOH there is a lot of misinformation out there from other sources.

It might be recommended in certain circumstances, i.e. where you have to read the data more than once. These circumstances are very rare.

user207421
  • You are right. It is recommended when you have to read the data more than once. Got it :) – Ashwin May 12 '12 at 12:21