HTTP seems like a crazy simple protocol, but it is not; you should use an HTTP client library such as the built-in java.net.http client.
The problem is that the concept of 'give me my data, then close it down' is HTTP/1.0, and that's a few decades out of date. HTTP/2 and HTTP/3 are binary protocols, and HTTP/1.1 keeps the connection open by default (persistent connections). In general, 'read lines', and even 'use Reader' (as in, read characters instead of bytes), is the wrong way to go about it, as HTTP is not a textual protocol. I know. It looks like one. It's not.
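For comparison, here is a minimal sketch of what using the built-in java.net.http client (Java 11+) looks like. The class name HttpClientSketch, the 10-second timeouts and the example.com default URL are my own illustrative choices, not anything the API requires. Note that the code never touches lines or Readers: status line, headers, chunking, keep-alive, TLS and HTTP/2 negotiation are all handled by the client.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

public class HttpClientSketch {
    // Builds a plain GET request; the 10-second timeout is an arbitrary example value.
    static HttpRequest buildGet(URI uri) {
        return HttpRequest.newBuilder(uri)
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    public static void main(String[] args) throws IOException, InterruptedException {
        // One client can be reused for many requests; it pools connections for you.
        HttpClient client = HttpClient.newBuilder()
                .connectTimeout(Duration.ofSeconds(10))
                .build();
        URI uri = URI.create(args.length > 0 ? args[0] : "https://example.com/");

        // ofString() decodes the body using the charset from the Content-Type header,
        // falling back to UTF-8 if there is none (use ofByteArray() for binary data).
        HttpResponse<String> response = client.send(buildGet(uri), HttpResponse.BodyHandlers.ofString());

        System.out.println("Status:   " + response.statusCode());
        System.out.println("Protocol: " + response.version());
        System.out.println("Type:     " + response.headers().firstValue("content-type").orElse("(none)"));
        System.out.println("Body:     " + response.body().length() + " chars");
    }
}
```

Run it with a URL as the first argument; the printed protocol shows whether HTTP/1.1 or HTTP/2 was negotiated, which is exactly the kind of detail you never want to handle by hand.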
Here is a highly oversimplified overview of how e.g. a browser reads HTTP/1.1 responses:
- Use raw byte processing, because HTTP body content is (or can be) raw binary data; wrapping the whole thing in e.g. an InputStreamReader or BufferedReader is therefore a non-starter.
- Keep reading until a 0x0A byte (LF, the ASCII newline character), or until X bytes have been read and your buffer for this is full, where X is not extraordinarily large. You wouldn't want a badly behaving server, or a mix-up where you've actually connected to some other (non-HTTP) service, to cause a memory issue! HTTP/1.1 lines end in CRLF, so strip the 0x0D that precedes the LF, then parse this first line as an HTTP/1.1 status line (e.g. HTTP/1.1 200 OK).
- Keep doing this loop to pick up all headers. Use the same 'my buffer has limits' trick to avoid memory issues.
- Then check the response code to figure out whether a body will be forthcoming. It's HTTP/1.1, so you can't just go: "Well, if the connection is closed, I guess no body is forthcoming". Whether one is coming depends on the response code (1xx, 204 and 304 responses never have one) and on the request method (a response to a HEAD request never has one either).
- The header block always ends with an empty line (the 'double newline', i.e. CRLF CRLF), whether or not a body follows; if there is a body, it starts right after that.
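- If the content is transferred with chunked encoding (Transfer-Encoding: chunked, which is common), the body arrives as a series of chunks, each preceded by a line holding its size in hexadecimal, and the last chunk has size 0. Start blitting data into a buffer, but check that you actually read the entire chunk. Reading chunked encoding is its own game, really.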
- Alternatively, if chunked encoding isn't used and the connection is to stay open, HTTP/1.1 DEMANDS that Content-Length is present. Use this header to know precisely how many bytes to read.
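- A newline can never serve as a meaningful 'end of data' marker in HTTP/1.1, and neither can 'close connection' on a persistent connection, so don't rely on either. (The one exception is a response with neither chunked encoding nor Content-Length: its body runs until the server closes the connection, which also means that connection can't be reused.)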
- Then either pass the content+headers+status code verbatim to the requesting code, or dress it up a bit. For example, if the Content-Type header is present and has the value text/html; charset=UTF-8, you can consider turning the body bytes into a string via UTF-8 (new String(byteArray, StandardCharsets.UTF_8)).
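To give a feel for how much work the steps above involve, here is a deliberately minimal sketch of reading a single HTTP/1.1 response from a raw InputStream. Everything in it is hypothetical: the class name Http11ResponseReader, the size limits, and the requestWasHead flag are my own choices. It also skips plenty: close-delimited bodies, a 1xx interim response followed by the real one, repeated headers, and most Transfer-Encoding subtleties. Treat it as an illustration of the framing rules, not as a replacement for a real client.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

public class Http11ResponseReader {
    static final int MAX_LINE = 8 * 1024;            // assumed cap per status/header/chunk-size line
    static final int MAX_HEADER_LINES = 100;         // assumed cap on number of header lines
    static final long MAX_BODY = 16L * 1024 * 1024;  // assumed cap on total body size

    static final class Response {
        final int status;
        final Map<String, String> headers; // lower-cased names; a repeated header overwrites (simplification)
        final byte[] body;
        Response(int status, Map<String, String> headers, byte[] body) {
            this.status = status;
            this.headers = headers;
            this.body = body;
        }
    }

    // Reads raw bytes up to an LF (0x0A) with a hard size limit, then drops the CR of the CRLF.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) throw new EOFException("connection closed mid-line");
            if (buf.size() >= MAX_LINE) throw new IOException("line longer than " + MAX_LINE + " bytes");
            buf.write(b);
        }
        byte[] line = buf.toByteArray();
        int len = (line.length > 0 && line[line.length - 1] == '\r') ? line.length - 1 : line.length;
        return new String(line, 0, len, StandardCharsets.ISO_8859_1); // one byte per char, never fails
    }

    // Reads exactly n bytes; running out early is an error, not "end of body".
    static byte[] readExactly(InputStream in, long n) throws IOException {
        if (n < 0 || n > MAX_BODY) throw new IOException("invalid or oversized length: " + n);
        byte[] data = in.readNBytes((int) n);
        if (data.length != n) throw new EOFException("expected " + n + " bytes, got " + data.length);
        return data;
    }

    // Chunked body: repeated [hex size line][data][CRLF], ended by a size-0 chunk and an empty line.
    static byte[] readChunked(InputStream in) throws IOException {
        ByteArrayOutputStream body = new ByteArrayOutputStream();
        while (true) {
            String sizeLine = readLine(in);
            int semi = sizeLine.indexOf(';'); // chunk extensions are ignored
            long size = Long.parseLong((semi >= 0 ? sizeLine.substring(0, semi) : sizeLine).trim(), 16);
            if (size == 0) break;
            if (body.size() + size > MAX_BODY) throw new IOException("body too large");
            body.writeBytes(readExactly(in, size));
            if (!readLine(in).isEmpty()) throw new IOException("missing CRLF after chunk data");
        }
        while (!readLine(in).isEmpty()) {
            // trailer fields (rare) are skipped; the final empty line ends the message
        }
        return body.toByteArray();
    }

    // requestWasHead matters: a response to HEAD never has a body, whatever its headers say.
    static Response read(InputStream in, boolean requestWasHead) throws IOException {
        String statusLine = readLine(in); // e.g. "HTTP/1.1 200 OK"
        String[] parts = statusLine.split(" ", 3);
        if (parts.length < 2 || !parts[0].startsWith("HTTP/1.")) {
            throw new IOException("not an HTTP/1.x status line: " + statusLine);
        }
        int status = Integer.parseInt(parts[1]);

        Map<String, String> headers = new LinkedHashMap<>();
        int count = 0;
        String line;
        while (!(line = readLine(in)).isEmpty()) { // an empty line always ends the header block
            if (++count > MAX_HEADER_LINES) throw new IOException("too many header lines");
            int colon = line.indexOf(':');
            if (colon <= 0) throw new IOException("malformed header line: " + line);
            headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT), line.substring(colon + 1).trim());
        }

        if (requestWasHead || status / 100 == 1 || status == 204 || status == 304) {
            return new Response(status, headers, new byte[0]);
        }
        String te = headers.get("transfer-encoding");
        if (te != null && te.toLowerCase(Locale.ROOT).contains("chunked")) {
            return new Response(status, headers, readChunked(in));
        }
        String cl = headers.get("content-length");
        if (cl != null) return new Response(status, headers, readExactly(in, Long.parseLong(cl)));
        // Remaining HTTP/1.1 case: the body runs until the server closes the connection.
        throw new IOException("close-delimited body: not handled by this sketch");
    }
}
```

Note that the body is never read as 'lines' and the stream is never wrapped in a Reader: body bytes go straight into byte arrays, and with Content-Length the reader stops at exactly that many bytes, leaving the next response on the same connection untouched.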
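The 'dress it up a bit' step from the last bullet could look roughly like this. The class and method names are hypothetical, and falling back to UTF-8 when Content-Type carries no usable charset parameter is an assumption on my part; browsers apply more elaborate sniffing rules for HTML.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;
import java.util.Locale;

public class ContentTypeCharset {
    // Extracts the charset parameter from a Content-Type value such as "text/html; charset=UTF-8".
    // Returns the caller-supplied fallback if there is no charset parameter or it is unknown.
    static Charset charsetOf(String contentType, Charset fallback) {
        if (contentType == null) return fallback;
        for (String param : contentType.split(";")) {
            String p = param.trim();
            if (p.toLowerCase(Locale.ROOT).startsWith("charset=")) {
                String name = p.substring("charset=".length()).trim();
                // The parameter value may be a quoted string: charset="utf-8"
                if (name.length() >= 2 && name.startsWith("\"") && name.endsWith("\"")) {
                    name = name.substring(1, name.length() - 1);
                }
                try {
                    return Charset.forName(name);
                } catch (IllegalArgumentException e) { // illegal or unsupported charset name
                    return fallback;
                }
            }
        }
        return fallback;
    }

    // Decodes raw body bytes to text; UTF-8 as the default is an assumption, not an HTTP rule.
    static String bodyAsText(byte[] body, String contentType) {
        return new String(body, charsetOf(contentType, StandardCharsets.UTF_8));
    }
}
```

Keeping the body as bytes until this point is what makes the step possible at all: the charset is only known after the headers have been parsed, so decoding earlier (as a Reader would) means guessing.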
Note that I've skipped right over some bizarre behaviour that servers exhibit because, in ye olden days, some dumb browser did weird things and that became the status quo (range requests, for example, are quite bizarre). And there are of course HTTP/2 and HTTP/3, which are completely different protocols.
Also, of course, plain HTTP servers are rare these days; HTTPS is where it's at, which means all of the above runs inside a TLS connection, and that's quite different too.