I have a client application that reads the full body of an HTTP response into a buffer and performs some processing on it:

body, _ = ioutil.ReadAll(containerObject.Resp.Body)

The problem is that this application runs on an embedded device, so responses that are too large fill up the device RAM, causing Ubuntu to kill the process.

To avoid this, I check the Content-Length header and bypass processing if the document is too large. However, some servers (I'm looking at you, Microsoft) send very large HTML responses without setting Content-Length, and these crash the device.
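For reference, Go's `http.Response.ContentLength` is -1 when the length is unknown (for example with chunked encoding), so the header check has three outcomes rather than two. A minimal sketch of that check; `classify`, `sizeVerdict`, and the 1 MiB limit are hypothetical names and values, not taken from the actual application:

```go
package main

import (
	"fmt"
	"net/http"
)

// sizeVerdict is a hypothetical type describing what the header tells us.
type sizeVerdict int

const (
	sizeFits     sizeVerdict = iota // length known and within the limit
	sizeTooLarge                    // length known and over the limit
	sizeUnknown                     // no usable Content-Length
)

// classify interprets a ContentLength value as reported by net/http,
// where a negative value means the length is unknown.
func classify(contentLength, max int64) sizeVerdict {
	switch {
	case contentLength < 0:
		return sizeUnknown
	case contentLength > max:
		return sizeTooLarge
	default:
		return sizeFits
	}
}

func main() {
	const maxBody = 1 << 20 // assumed limit of 1 MiB

	// Stand-in for a response that arrived without a Content-Length header.
	resp := &http.Response{ContentLength: -1}

	fmt.Println(classify(resp.ContentLength, maxBody) == sizeUnknown) // prints "true"
}
```

Only the `sizeUnknown` case needs the read-up-to-a-limit approach; the other two can be decided from the header alone.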

The only way I can see of getting around this is to read the response body up to a certain length. If it reaches this limit, then a new reader could be created which first streams the in-memory buffer, then continues reading from the original Resp.Body. Ideally, I would assign this new reader to the containerObject.Resp.Body so that callers would not know the difference.

I'm new to Go and am not sure how to go about coding this. Any suggestions or alternative solutions would be greatly appreciated.

Edit 1: The caller expects a Resp.Body object, so the solution needs to be compatible with that interface.

Edit 2: I cannot parse small chunks of the document. Either the entire document is processed or it is passed unchanged to the caller, without loading it into memory.

Steve Cohen
  • Not setting `Content-Length` is pretty standard, as most large responses will use chunked encoding. – JimB Sep 14 '17 at 16:40
  • Why try to conditionally buffer part of the response, and not just use the `io.Reader` interface throughout? – JimB Sep 14 '17 at 16:43
  • To further @JimB recommendation I would start looking at this: https://stackoverflow.com/questions/31857891/how-to-read-a-file-character-by-character-in-go This might also be of use: https://tip.golang.org/pkg/encoding/json/#Decoder.Token – Cristian Cavalli Sep 14 '17 at 16:57
  • Once you read anything from Resp.Body, you have to buffer it in memory (if you want to do anything with it, that is) because you're reading from a network stream. This means that a large response gets read completely into memory and crashes the device. Now the other piece of this is that the caller of this function expects Resp.Body to be intact - exactly as if it were read directly from the http object. So I need to send it something that acts just like that object, but doesn't try to completely read large documents into RAM. – Steve Cohen Sep 14 '17 at 17:02
  • Thanks for the suggestion, Cristian. Unfortunately I cannot process small batches of the document at a time. It is an all-or-nothing affair. If I can't see the entire document, then I want to pass it on untouched. I will edit the question to reflect this. – Steve Cohen Sep 14 '17 at 17:04
  • @SteveCohen maybe something like: https://stackoverflow.com/questions/38874664/limiting-amount-of-data-read-in-the-response-to-a-http-get-request You'd have to make a decision once you read that data in whether or not it was valid json given by the bounds of your reader. – Cristian Cavalli Sep 14 '17 at 17:09
  • Wow, that is awesome. A LimitedReader will definitely solve part of the problem. Now I just need to figure out how to create a version of ReadCloser that first passes the contents obtained from the LimitedReader, then transparently switches over to the original Resp.Body (readCloser). – Steve Cohen Sep 14 '17 at 17:20

1 Answer

If you need to read part of the response body and then reconstruct it in place for other callers, you can combine io.MultiReader with ioutil.NopCloser:

resp, err := http.Get("http://google.com")
if err != nil {
    return err
}
defer resp.Body.Close()

part, err := ioutil.ReadAll(io.LimitReader(resp.Body, maxReadSize))
if err != nil {
    return err
}

// do something with part

// recombine the buffered part of the body with the rest of the stream
resp.Body = ioutil.NopCloser(io.MultiReader(bytes.NewReader(part), resp.Body))

// do something with the full Response.Body as an io.Reader

If you can't defer resp.Body.Close() because you intend to return the response before it's read in its entirety, you will need to augment the replacement body so that the Close() method applies to the original body. Rather than using the ioutil.NopCloser as the io.ReadCloser, create your own that refers to the correct method calls.

type readCloser struct {
    io.Closer
    io.Reader
}

resp.Body = readCloser{
    Closer: resp.Body,
    Reader: io.MultiReader(bytes.NewReader(part), resp.Body),
}
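
Putting the pieces together for the all-or-nothing case in the question, here is a minimal sketch. `peekBody`, `closeSpy`, and `demo` are hypothetical names used for illustration, and `io.ReadAll` assumes Go 1.16 or later (`ioutil.ReadAll` behaves the same on older versions). Reading one byte past the limit is an added trick to tell "exactly at the limit" apart from "more data follows":

```go
package main

import (
	"bytes"
	"fmt"
	"io"
	"net/http"
	"strings"
)

// readCloser pairs the replacement Reader with the original body's Closer,
// so Close on the new body closes the original body.
type readCloser struct {
	io.Reader
	io.Closer
}

// peekBody (hypothetical helper) buffers at most limit+1 bytes of resp.Body.
// It returns the buffered bytes and whether they are the complete body.
// Either way, resp.Body is replaced so later readers see the full stream.
func peekBody(resp *http.Response, limit int64) ([]byte, bool, error) {
	orig := resp.Body
	part, err := io.ReadAll(io.LimitReader(orig, limit+1))
	if err != nil {
		return nil, false, err
	}
	complete := int64(len(part)) <= limit
	resp.Body = readCloser{
		Reader: io.MultiReader(bytes.NewReader(part), orig),
		Closer: orig,
	}
	return part, complete, nil
}

// closeSpy is a stand-in body that records whether Close was called.
type closeSpy struct {
	io.Reader
	closed bool
}

func (c *closeSpy) Close() error { c.closed = true; return nil }

// demo runs peekBody on a fake response, then acts as the downstream caller:
// it reads the replacement body to the end and closes it.
func demo(body string, limit int64) (part string, complete bool, all string, closed bool) {
	spy := &closeSpy{Reader: strings.NewReader(body)}
	resp := &http.Response{Body: spy}
	p, complete, err := peekBody(resp, limit)
	if err != nil {
		panic(err)
	}
	rest, err := io.ReadAll(resp.Body)
	if err != nil {
		panic(err)
	}
	resp.Body.Close()
	return string(p), complete, string(rest), spy.closed
}

func main() {
	_, complete, all, closed := demo("hello world", 5)
	fmt.Println(complete, all, closed) // prints "false hello world true"
}
```

When `complete` is true, `part` holds the whole document and can be processed; when it is false, only `limit+1` bytes were buffered and the caller still receives the untouched stream through `resp.Body`.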
JimB
  • This looks to be exactly what I need. I will implement and get back to you shortly. – Steve Cohen Sep 14 '17 at 17:38
  • One question on this. If you replace resp.Body with a new object, the "defer resp.Body.Close()" will then close the new object while leaving the original body open, resulting in memory leaks. Do you know if this will have the same problem? Or will Close be called on the original resp.Body? – Steve Cohen Sep 14 '17 at 17:40
  • The `defer resp.Body.Close()` here is being evaluated on the original `resp.Body`; replacing it later doesn't affect it. The only thing to note is that you can't return the `resp` to whatever consumes the body, because the body will be closed at that point. I still don't understand what good it does to read the body first if the caller is going to read it again. You're not stopping the next caller from reading too much and running out of memory either. Why not just limit the response size in the first place? – JimB Sep 14 '17 at 17:46
  • Oddly, replacing the resp.Body.Close does not close the original resp.Body as expected. Not sure why but there is a post on this at the Golang forum, and I was able to confirm this with my own code (which no longer displayed the leak once the original resp.Body was closed prior to reassignment). – Steve Cohen Sep 14 '17 at 17:49
  • As for the second half of your comment, the device itself processes the http response and then passes it over the network to a client who writes it directly to disk. I am pretty certain that the main time this happens is when Microsoft transmits binary updates and tags them as html pages. – Steve Cohen Sep 14 '17 at 17:51
  • I can guarantee that replacing the body doesn't affect the closing of the original body. If it did at one point (which I can't see how), it was a bug. – JimB Sep 14 '17 at 17:59
  • I just looked over the code and see that I had mis-remembered something. I cannot use defer close on the resp.body because the caller consumes it. So I was replacing the original resp.body with a new one and never closed the old one. That was the actual source of my memory leak. – Steve Cohen Sep 14 '17 at 18:21
  • @SteveCohen: yes, so you need to make a simple `io.ReadCloser` to pass through the close to the body, rather than using `ioutil.NopCloser` – JimB Sep 14 '17 at 18:31
  • Actually, it looks like it's working exactly as you originally suggested. The caller is closing the new Resp.Body and the original Resp.Body is closing. At least, I cannot detect any memory leaks when loading very large pages. – Steve Cohen Sep 14 '17 at 18:38
  • @SteveCohen: not closing the body doesn't leak the body content, it prevents the http connection from being reused. You have to make sure the original Body is being closed. – JimB Sep 14 '17 at 18:40
  • You're right! I've been fixing memory leaks so much that I've been equating the two in my mind but they are not the same. Do you know how to monitor httpconnections so that I can determine if they are being closed? – Steve Cohen Sep 14 '17 at 19:23
  • @SteveCohen: There's not really a good way to monitor, because you really can't tell how many are supposed to be open. You could watch the number of goroutines, and see if that grows unbounded, but that won't catch slower leaks where the connections eventually time out and close. This is why it's important to structure the code in such a way so that you can defer Close in close proximity to the assignment, and easily verify that it will be called. – JimB Sep 14 '17 at 19:58
  • Ok, I see. I can't rearchitect the software, but I'm quite certain that Resp.Body is being closed properly by the caller for each http response. This rules out deferring the close, which means that you are correct in that I will need to implement a custom ReadCloser. – Steve Cohen Sep 14 '17 at 20:25
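
On the question of checking whether connections are being reused: this is not something suggested in the thread, but the standard library's `net/http/httptrace` package reports, per request, whether the transport handed out a reused connection. A minimal sketch against a local `httptest` server; `secondRequestReused` is a hypothetical helper, and `io.Discard` assumes Go 1.16 or later:

```go
package main

import (
	"context"
	"fmt"
	"io"
	"net/http"
	"net/http/httptest"
	"net/http/httptrace"
)

// secondRequestReused (hypothetical helper) makes two requests to a local
// test server and reports whether the second one reused the first
// request's connection. If closeFirst is false, the first body is left
// unread and open until the function returns.
func secondRequestReused(closeFirst bool) (bool, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		io.WriteString(w, "ok")
	}))
	defer srv.Close()
	client := srv.Client()

	get := func() (*http.Response, bool, error) {
		reused := false
		trace := &httptrace.ClientTrace{
			// GotConn fires once a connection has been obtained for the request.
			GotConn: func(info httptrace.GotConnInfo) { reused = info.Reused },
		}
		ctx := httptrace.WithClientTrace(context.Background(), trace)
		req, err := http.NewRequestWithContext(ctx, http.MethodGet, srv.URL, nil)
		if err != nil {
			return nil, false, err
		}
		resp, err := client.Do(req)
		return resp, reused, err
	}

	first, _, err := get()
	if err != nil {
		return false, err
	}
	if closeFirst {
		io.Copy(io.Discard, first.Body) // drain so the connection can be reused
		first.Body.Close()
	} else {
		defer first.Body.Close() // simulate a body that is not closed in time
	}

	second, reused, err := get()
	if err != nil {
		return false, err
	}
	defer second.Body.Close()
	return reused, nil
}

func main() {
	for _, closeFirst := range []bool{true, false} {
		reused, err := secondRequestReused(closeFirst)
		if err != nil {
			fmt.Println("error:", err)
			return
		}
		fmt.Printf("first body closed=%v, second request reused connection=%v\n", closeFirst, reused)
	}
}
```

In a real application the same `ClientTrace` could be attached to outgoing requests and the `Reused` flag logged; a steady stream of non-reused connections to the same host would hint that bodies are not being read to the end and closed.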