6

I'm trying to write a class that reads HTTP requests and responses and parses them. Since the headers are ordinary text it seemed easiest to read them using a BufferedReader and the readLine method. This obviously won't do for the data body as it may be binary, so I want to switch over to read raw bytes after the headers have been read.

Right now, I'm doing something like this:

InputStream input=socket.getInputStream();
BufferedReader reader=new BufferedReader(new InputStreamReader(input));
BufferedInputStream binstream=new BufferedInputStream(input);

The problem is that the BufferedReader is reading ahead and gobbling up all the binary data from the stream before I have a chance to get at it with the binstream.

Is there a way to prevent it from reading beyond the newline for each call to readLine? Or is there a better way to read single lines of ASCII text followed by raw binary data?

Paŭlo Ebermann
Erin
  • According to Oracle's documentation, readLine shouldn't read beyond the newline: http://download.oracle.com/javase/6/docs/api/java/io/BufferedReader.html#readLine%28%29 – Argote Feb 15 '11 at 00:05
  • 3
    @Argote: The BufferedReader itself doesn't return anything more than a line, but since it is buffered, it first fills its own buffer before searching for line breaks, so the data has already been read from the underlying stream. – Paŭlo Ebermann Feb 15 '11 at 00:44
  • @Paŭlo Ebermann Ah, I see, that makes sense. – Argote Feb 15 '11 at 00:45
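
The read-ahead described in the comments above can be observed with a small experiment. This is only a sketch: the class name ReadAheadDemo and the sample message are made up for illustration, and a ByteArrayInputStream stands in for the socket stream. How much is read ahead depends on the buffer sizes inside the JDK, so the code reports the count rather than assuming it.

```java
import java.io.BufferedReader;
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;

class ReadAheadDemo {
    /**
     * Reads one line through a BufferedReader and returns how many bytes
     * are still available in the underlying raw stream afterwards.
     */
    static int bytesLeftAfterFirstLine(byte[] message) throws IOException {
        InputStream input = new ByteArrayInputStream(message);
        BufferedReader reader =
                new BufferedReader(new InputStreamReader(input, StandardCharsets.US_ASCII));
        reader.readLine(); // returns only the first line...
        return input.available(); // ...but more than that line may already be consumed
    }

    public static void main(String[] args) throws IOException {
        // One header line, a blank line, then a 4-byte "body".
        byte[] message = "Header: value\r\n\r\nBODY".getBytes(StandardCharsets.US_ASCII);
        // Only 15 of the 21 bytes belong to the first line, so 6 would remain
        // if the reader stopped at the newline.
        System.out.println("bytes left in raw stream: " + bytesLeftAfterFirstLine(message));
    }
}
```

If the reported number is smaller than 6, the reader has pulled the body into its own buffer, which is exactly why a second stream wrapped around the same socket never sees it.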

3 Answers

5

If you don't want to use a ready-made HTTP client/server implementation, as Konstantin proposed, DataInputStream has a readLine method. It is deprecated because it doesn't convert bytes to characters properly (it essentially casts each byte directly to a char), but for pure ASCII header lines it should be fine.

(You should put a BufferedInputStream under your DataInputStream, since readLine reads each byte individually.)
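
A minimal sketch of this approach follows, assuming the body length is already known (for example from a Content-Length header). The class and method names (HeaderBodyReader, readHeaderLines, readBody) and the sample message are hypothetical, and a ByteArrayInputStream stands in for socket.getInputStream(). The key point is that the header lines and the body are read through the same DataInputStream, so nothing gets lost in a second buffer.

```java
import java.io.BufferedInputStream;
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

class HeaderBodyReader {
    /** Reads lines up to, but not including, the blank line that ends the header block. */
    @SuppressWarnings("deprecation") // readLine is deprecated; only acceptable here for ASCII text
    static List<String> readHeaderLines(DataInputStream in) throws IOException {
        List<String> lines = new ArrayList<>();
        String line;
        while ((line = in.readLine()) != null && !line.isEmpty()) {
            lines.add(line);
        }
        return lines;
    }

    /** Reads exactly length raw bytes from the same stream the headers came from. */
    static byte[] readBody(DataInputStream in, int length) throws IOException {
        byte[] body = new byte[length];
        in.readFully(body);
        return body;
    }

    public static void main(String[] args) throws IOException {
        // Sample message: ASCII headers followed by four arbitrary binary bytes.
        ByteArrayOutputStream message = new ByteArrayOutputStream();
        message.write("HTTP/1.1 200 OK\r\nContent-Length: 4\r\n\r\n"
                .getBytes(StandardCharsets.US_ASCII));
        message.write(new byte[] {0x00, (byte) 0xFF, 0x0A, 0x7F});

        // With a real socket, wrap socket.getInputStream() instead.
        DataInputStream in = new DataInputStream(
                new BufferedInputStream(new ByteArrayInputStream(message.toByteArray())));

        List<String> headers = readHeaderLines(in);
        byte[] body = readBody(in, 4); // length hard-coded for the sketch
        System.out.println(headers.size() + " header lines, " + body.length + " body bytes");
    }
}
```

A real parser would take the length from the parsed headers and would also have to handle bodies without a Content-Length, which this sketch ignores.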

Paŭlo Ebermann
4

There is already a class in Java for handling HTTP requests and responses. You should use that instead of trying to parse the response on your own. Parsing an HTTP response is more difficult than you might think, as there are different encoding methods to deal with (chunked transfer encoding and compressed content, for example), so the response payload isn't really raw binary data. The HttpURLConnection class will parse the headers for you and give you an InputStream for the payload.

http://download.oracle.com/javase/1.4.2/docs/api/java/net/HttpURLConnection.html
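
As a rough sketch of what that looks like in practice: the class name UrlConnectionFetch and its helper methods are hypothetical, and the URL is taken from the command line rather than from anything in the question. Only standard java.net and java.io calls are used.

```java
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.net.HttpURLConnection;
import java.net.URI;
import java.net.URL;
import java.util.List;
import java.util.Map;

class UrlConnectionFetch {
    /** Copies everything from the stream into a byte array. */
    static byte[] readAll(InputStream in) throws IOException {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        byte[] buffer = new byte[8192];
        int n;
        while ((n = in.read(buffer)) != -1) {
            out.write(buffer, 0, n);
        }
        return out.toByteArray();
    }

    /** Lets HttpURLConnection parse the headers, then returns the payload bytes. */
    static byte[] fetchBody(URL url) throws IOException {
        HttpURLConnection conn = (HttpURLConnection) url.openConnection();
        try {
            int status = conn.getResponseCode(); // sends the request, parses the status line
            Map<String, List<String>> headers = conn.getHeaderFields(); // already parsed
            System.out.println(status + " with " + headers.size() + " header fields");
            // Note: getInputStream() throws an IOException for error responses (4xx/5xx).
            try (InputStream body = conn.getInputStream()) {
                return readAll(body);
            }
        } finally {
            conn.disconnect();
        }
    }

    public static void main(String[] args) throws IOException {
        // Usage (the URL is supplied by the caller): java UrlConnectionFetch.java http://example.com/
        byte[] body = fetchBody(URI.create(args[0]).toURL());
        System.out.println(body.length + " body bytes");
    }
}
```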

Konstantin Komissarchik
  • I'm writing my own because, in part of the application, I need to disregard the http.proxyHost setting that is being used in another part. – Erin Feb 15 '11 at 01:16
  • I'd fork an existing implementation rather than starting from scratch, if you cannot find a configuration parameter to do what you need. You shouldn't have any licensing issues doing this with Apache Commons HttpClient as mentioned in another answer. – Konstantin Komissarchik Feb 15 '11 at 01:21
  • Actually, I just noticed that there is a way to force URLConnections to use no proxy. I guess that'll work. – Erin Feb 15 '11 at 02:23
  • 4
    This doesn't answer the question - How to `Read from InputStream in multiple formats?` – AlikElzin-kilaka May 17 '14 at 18:28
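
For the proxy point raised in the comments above: the comment does not say which mechanism was meant, but one standard way to bypass the http.proxyHost setting for a single connection is to pass Proxy.NO_PROXY to URL.openConnection. A minimal sketch, with a hypothetical class name and a placeholder URL:

```java
import java.io.IOException;
import java.net.HttpURLConnection;
import java.net.Proxy;
import java.net.URI;
import java.net.URL;

class DirectConnection {
    /** Opens a connection that ignores the system-wide proxy settings. */
    static HttpURLConnection openDirect(URL url) throws IOException {
        // Proxy.NO_PROXY requests a direct connection for this URLConnection only;
        // other parts of the application keep using http.proxyHost.
        return (HttpURLConnection) url.openConnection(Proxy.NO_PROXY);
    }

    public static void main(String[] args) throws IOException {
        // Placeholder URL; nothing is sent until the connection is actually used.
        HttpURLConnection conn = openDirect(URI.create("http://localhost:8080/").toURL());
        System.out.println("opened " + conn.getURL());
    }
}
```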
2

commons-httpclient might save you a heap of work here.

bmargulies