
I'm trying to read the HTTP request that Google Chrome sends in order to get its data. For that I use readLine() from BufferedReader, but I think it gets stuck after the last line because the stream stays open and readLine() keeps waiting for more input. Here is the code that I use in the while loop:

String line;
ArrayList<String> request = new ArrayList<String>();
while ((line = inFromClient.readLine()) != null) {
    request.add(line);
}

If I forcibly break out of the loop it works. Basically, I'm trying to read all the lines efficiently, but without the inconsistencies of ready().

Bruno Cotrim
  • 1
    What do you mean it gets stuck on the last line? It gets stuck on `}`? – Spectric Oct 12 '20 at 01:30
  • The first line of an HTTP request is `GET /mp1.html HTTP/1.1`; after that comes the body, and then after loading all the data it gets stuck at the end of the read, e.g. at `Accept-Language: pt-PT,pt;q=0.9,en-US;q=0.8,en;q=0.7`. I could use isEmpty to stop the loop, but the problem is that the body and data are separated by an empty line – Bruno Cotrim Oct 12 '20 at 01:34
  • 2
    I don't know how Google Chrome sends its data to your program, but I would think it probably leaves the stream open and reuses it for all http requests. If I look at the documentation for [`BufferedReader#readLine()`](https://docs.oracle.com/javase/7/docs/api/java/io/BufferedReader.html#readLine()), it says "[returns] null if the end of the stream has been reached." I suspect your loop isn't really "stuck," it's just waiting for more data. Just because you read all available data doesn't mean you've reached the end of the stream. – Charlie Armstrong Oct 12 '20 at 01:42
  • Interesting, maybe that's why it works with a custom client: I close the stream. If that is what happens with Chrome, how is it possible to read all the lines up to the last one if the stream never closes? – Bruno Cotrim Oct 12 '20 at 01:48
  • 1
    This might help: https://stackoverflow.com/questions/5987970/socket-bufferedreader-hangs-at-readline – Hamza Belmellouki Oct 12 '20 at 01:51
  • 1
    You can check [`BufferedReader#ready()`](https://docs.oracle.com/en/java/javase/11/docs/api/java.base/java/io/BufferedReader.html#ready()) instead of the null check. – Charlie Armstrong Oct 12 '20 at 01:56
  • Thanks a lot, that post really helped with ready(). I'm just finding some inconsistencies with that implementation; I will try to work around them and see where I can go from there – Bruno Cotrim Oct 12 '20 at 01:56
  • 1
    @HamzaBelmellouki - that suggests using `ready()` as the solution ... which is a bad idea. – Stephen C Oct 12 '20 at 02:01
  • 1
    Using `ready()` is not reliable. It is possible that `ready()` will return `false` when there is still data to come, but it is not "here" yet because of a network hiccup. – Stephen C Oct 12 '20 at 02:03
  • 1
    Reading lines until there are no more is simply incorrect. The real solution is to implement enough of the HTTP protocol so that you can detect when you should stop reading lines ... as per the protocol. Or better still use an existing library that implements the server-side protocol. – Stephen C Oct 12 '20 at 02:17
  • 1
    @StephenC, I agree. However, it can prevent blocking the thread running that code if it's waiting for data from a network. If it returns false, maybe he can do something else, then he can check for the data's readiness again. – Hamza Belmellouki Oct 12 '20 at 02:18
  • This is for the purpose of understanding HTTP, but I'm trying to push it a bit further by making the code work on every request. I'm not using libraries because the assignment is to implement HTTP by hand to get a better understanding of how it works. I tried to find a pattern for when to stop reading lines in HTTP requests, but I couldn't find a solid one. I fixed the code and tried to make ready() a bit more reliable by re-verifying the existence of data in the buffer – Bruno Cotrim Oct 12 '20 at 02:24

1 Answer


HTTP seems like a crazy simple protocol but it is not; you should use an HTTP client library such as the built-in java.net.http client.
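As a rough illustration of the built-in client mentioned above, here is a minimal sketch assuming Java 11 or later. The class name `HttpClientSketch`, the helper names `buildGet` and `fetch`, and the 10-second timeout are made up for this example; any URL you pass in is your own.

```java
import java.io.IOException;
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;
import java.time.Duration;

// Hypothetical helper class, not part of the question or of any library.
public class HttpClientSketch {

    // Builds a plain GET request; the 10 second timeout is an arbitrary choice.
    static HttpRequest buildGet(String url) {
        return HttpRequest.newBuilder(URI.create(url))
                .timeout(Duration.ofSeconds(10))
                .GET()
                .build();
    }

    // Sends the request and returns the status code plus the body as text.
    // The client takes care of framing (Content-Length, chunked encoding, keep-alive).
    static String fetch(String url) throws IOException, InterruptedException {
        HttpClient client = HttpClient.newHttpClient();
        HttpResponse<String> response =
                client.send(buildGet(url), HttpResponse.BodyHandlers.ofString());
        return response.statusCode() + "\n" + response.body();
    }
}
```

Note that this is the client side of the conversation; it does not help if the exercise is to write the server by hand.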

The problem is that the concept of 'give me my data, then close it down' is HTTP/1.0, and that's a few decades out of date. HTTP/2 and HTTP/3 are binary protocols, and HTTP/1.1 tends to leave the connection open. In general, 'read lines', and even 'use Reader' (as in, read characters instead of bytes) is the wrong way to go about it, as HTTP is not a textual protocol. I know. It looks like one. It's not.

Here is a highly oversimplified overview of how e.g. a browser reads HTTP/1.1 responses:

  1. Use raw byte processing because HTTP body content is raw (or can be), therefore wrapping the whole thing into e.g. an InputStreamReader or BufferedReader is a non-starter.
  2. Keep reading until a 0x0A byte (in ASCII, the newline symbol), or until X bytes have been read and your buffer for this is full, where X is not extraordinarily large. Wouldn't want a badly behaving server or a misunderstanding where you connect to a different (non-HTTP) service to cause a memory issue! Parse this first line as an HTTP/1.1 status line.
  3. Keep doing this loop to pick up all headers. Use the same 'my buffer has limits' trick to avoid memory issues.
  4. Then check the response code in order to figure out if a body will be forthcoming. It's HTTP/1.1, so you can't just go: "Well, if the connection is closed, I guess no body is forthcoming". Whether one will be coming or not depends primarily on the response code.
  5. Assuming a body exists, read the double-newline that separates headers from the body.
  6. If the content is transferred as chunked encoding (common), start blitting data into a buffer, but check if you read the entire chunk. Reading chunked encoding is its own game, really.
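  7. Alternatively, if chunked encoding isn't used, HTTP/1.1 expects a Content-Length header to be present. Use this header to know precisely how many bytes to read. (A request with neither has no body; a response with neither is delimited by the server closing the connection, which is a legacy fallback.)
  8. 'A newline' can never serve as a meaningful marker of 'end of data' in HTTP/1.1, and 'close connection' only does so in that legacy fallback, so don't rely on either.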
  9. Then either pass the content+headers+returncode verbatim to the requesting code, or dress it up a bit. For example, if the Content-Type header is present and has value text/html; charset=UTF-8 you can consider taking the body data and turning it into a string via UTF-8 (new String(byteArray, StandardCharsets.UTF_8);).
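The framing part of those steps can be sketched roughly as follows. This is a minimal illustration under stated assumptions, not a complete HTTP implementation: the class name `HttpMessageSketch`, its method names, and the 8192-byte line cap are invented for the example; a message with neither chunked encoding nor Content-Length is treated as having no body (the rule for requests); and there is no cap on body size, which real code would need.

```java
import java.io.ByteArrayOutputStream;
import java.io.EOFException;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper class for illustration only.
public class HttpMessageSketch {
    static final int MAX_LINE = 8192; // arbitrary cap on a single line (assumption)

    // Reads raw bytes up to LF, drops a trailing CR, decodes as ISO-8859-1.
    static String readLine(InputStream in) throws IOException {
        ByteArrayOutputStream buf = new ByteArrayOutputStream();
        int b;
        while ((b = in.read()) != '\n') {
            if (b == -1) throw new EOFException("stream ended mid-line");
            if (buf.size() >= MAX_LINE) throw new IOException("line too long");
            buf.write(b);
        }
        String line = new String(buf.toByteArray(), StandardCharsets.ISO_8859_1);
        return line.endsWith("\r") ? line.substring(0, line.length() - 1) : line;
    }

    // Reads header lines until the empty line; header names are lower-cased.
    static Map<String, String> readHeaders(InputStream in) throws IOException {
        Map<String, String> headers = new LinkedHashMap<>();
        String line;
        while (!(line = readLine(in)).isEmpty()) {
            int colon = line.indexOf(':');
            if (colon < 0) throw new IOException("malformed header: " + line);
            headers.put(line.substring(0, colon).trim().toLowerCase(Locale.ROOT),
                        line.substring(colon + 1).trim());
        }
        return headers;
    }

    // Reads exactly n bytes, looping because read() may return fewer.
    static byte[] readExactly(InputStream in, int n) throws IOException {
        byte[] data = new byte[n];
        int off = 0;
        while (off < n) {
            int r = in.read(data, off, n - off);
            if (r == -1) throw new EOFException("stream ended mid-body");
            off += r;
        }
        return data;
    }

    // Reads the body using chunked framing or Content-Length, never "until EOF".
    static byte[] readBody(InputStream in, Map<String, String> headers) throws IOException {
        if ("chunked".equalsIgnoreCase(headers.get("transfer-encoding"))) {
            ByteArrayOutputStream body = new ByteArrayOutputStream();
            while (true) {
                String sizeLine = readLine(in);
                int semi = sizeLine.indexOf(';'); // ignore chunk extensions
                if (semi >= 0) sizeLine = sizeLine.substring(0, semi);
                int size = Integer.parseInt(sizeLine.trim(), 16);
                if (size == 0) break;             // last chunk
                body.write(readExactly(in, size));
                readLine(in);                     // CRLF that follows the chunk data
            }
            while (!readLine(in).isEmpty()) { }   // skip optional trailer lines
            return body.toByteArray();
        }
        String len = headers.get("content-length");
        return len == null ? new byte[0] : readExactly(in, Integer.parseInt(len));
    }
}
```

With a socket you would probably wrap the stream once, e.g. `new BufferedInputStream(socket.getInputStream())`, call `readLine` for the first line, then `readHeaders`, then `readBody`, and leave the stream open for the next message on the same connection.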

Note that I've passed right over some bizarre behaviour that servers do because in ye olden days some dumb browser did weird things and it's now the status quo (for example, range requests are quite bizarre) and there's of course HTTP/2 and HTTP/3 which are completely different protocols.

Also, of course, HTTP servers are rare these days; HTTPS is where it's at, and that's quite different too.

rzwitserloot
  • I have a simple question. How does the browser decide the encoding of the headers? Or do they all assume it's ASCII? – haoyu wang Oct 12 '20 at 03:26
  • 1
    The spec more or less implies all bytes in the entire thing, at least until you get to the body, are ISO-8859-1, and in practice, ASCII. – rzwitserloot Oct 12 '20 at 03:37