
I am downloading PDFs using the Python requests library by doing:

import requests
from tempfile import NamedTemporaryFile

# NamedTemporaryFile is opened in binary mode ('w+b') by default
f = NamedTemporaryFile()

response = requests.get(pdf_url)
assert response.status_code == 200  # optionally `assert response.ok`
f.write(response.content)

Every so often response.content appears to be truncated: when I do f.tell(), I see that there are fewer bytes than expected. The PDF is also broken: it does not open in a PDF reader.

When I then immediately redo the same request with the same URL, the full file is downloaded, f.tell() shows the expected value, and the PDF opens in a PDF reader.

Is this a commonly known problem?

Note: I seem to have a memory leak - but this problem happens when I am using 700 MB and still have 1300 MB free.
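
For context, a minimal sketch of a completeness check, assuming the server sends a `Content-Length` header (`pdf_url` is the same variable as above):

import requests

response = requests.get(pdf_url)
response.raise_for_status()

# Compare the advertised length with what actually arrived
expected = response.headers.get("Content-Length")
if expected is not None and int(expected) != len(response.content):
    print("Truncated: expected %s bytes, got %d" % (expected, len(response.content)))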

  • 1
    Did you open `f` in *binary* mode? You are missing a mode parameter here altogether (no `'w'` even). – Martijn Pieters Jun 09 '14 at 07:32
  • sorry Martjin, I realised Im using temfile library. Updated. – Rich Tier Jun 09 '14 at 07:35
  • And how are you determining that you have less data than you are getting fewer bytes than expected? – Martijn Pieters Jun 09 '14 at 07:36
  • when I see the downloaded file cannot be opened in pdf reader I download the file again. I compare the filesize of both. See the new one is longer, and it opens in pdf reader. – Rich Tier Jun 09 '14 at 07:44
  • 3
    Compare the `response.headers['Content-length']` result with the file size. Most likely *it is the server* that is sending you incomplete data. In any case, for larger (binary data) responses, it'll be more efficient to use streaming. See [How to download image using requests](http://stackoverflow.com/a/13137873) – Martijn Pieters Jun 09 '14 at 08:01
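
A minimal sketch of the streaming approach suggested in the last comment, assuming the same `pdf_url` as in the question; it writes the response in chunks and verifies the byte count against `Content-Length` when the server provides one:

import requests
from tempfile import NamedTemporaryFile

f = NamedTemporaryFile()  # binary mode ('w+b') by default

# stream=True avoids holding the whole body in memory at once
response = requests.get(pdf_url, stream=True)
response.raise_for_status()

written = 0
for chunk in response.iter_content(chunk_size=8192):
    f.write(chunk)
    written += len(chunk)

# If the server advertised a Content-Length, verify we received that many bytes
expected = response.headers.get("Content-Length")
if expected is not None and written != int(expected):
    raise IOError("incomplete download: got %d of %s bytes" % (written, expected))

Retrying the request once when this check fails would cover the intermittent truncation described in the question.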

0 Answers