I'm uploading a large file (about 2 GB) to an API that accepts POST requests, using Python's requests module. The upload loads the whole file into memory first, which increases memory usage significantly. I believe there must be a way to stream the file to the API without holding it all in memory. Any suggestions?

P.S.
This old way worked for me, but consumed too much memory.

file = {'file': open(path, 'rb')}
requests.post(url, files = file)

The streaming approach below avoids the memory spike, but the server responds with status code 400.

requests.post(url,data=open(path, 'rb'))
Baytars

2 Answers


Any suggestions?

Use a streaming upload. As the Requests docs put it:

Requests supports streaming uploads, which allow you to send large streams or files without reading them into memory. To stream and upload, simply provide a file-like object for your body:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)
Daweo
  • Now I know where the memory issue lies: I put the file object in a dict, `file = {'file': open(path, 'rb')}`, and then posted it with `requests.post(url,files = file)`. If I pass the file object directly as the post data, as you wrote, I run into no memory issue. Thank you! – Baytars Jul 08 '22 at 12:47
  • I'm sorry but if I don't upload the file in the `file = {'file': open(path, 'rb')}` way, the server will respond with code 400. I have updated my question to reflect this feedback. – Baytars Jul 08 '22 at 13:20

When you pass the files argument, requests performs a multipart form upload, i.e. it is like submitting a form in which the file is passed as a named field (file in your example).

I suspect the problem you saw arises because passing a file object as the data argument, as suggested in the docs (https://requests.readthedocs.io/en/latest/user/advanced/#streaming-uploads), does perform a streaming upload, but the file content is used as the entire HTTP POST body.

So I think the server at the other end is expecting a form with a file field, but we're just sending the binary content of the file by itself.
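To make the difference concrete, here is a small sketch that prepares both kinds of request without sending them and inspects what requests would put on the wire. The `prepare_post` helper, the `example.invalid` URL, and the in-memory payload are placeholders of mine, not part of the question.

```python
import io
import requests

def prepare_post(**kwargs):
    """Build a POST request without sending it, so the body can be inspected."""
    # example.invalid is a placeholder URL; nothing goes over the network.
    return requests.Request("POST", "http://example.invalid/upload", **kwargs).prepare()

payload = b"binary file content"

# files=... wraps the content in a multipart form with a field named "file".
as_form = prepare_post(files={"file": ("demo.bin", io.BytesIO(payload))})
# data=<file object> uses the content itself as the entire request body.
as_raw = prepare_post(data=io.BytesIO(payload))

print(as_form.headers["Content-Type"])     # multipart/form-data; boundary=...
print(as_raw.headers.get("Content-Type"))  # None: no form envelope at all
print(type(as_form.body).__name__)         # bytes: the whole form was built in memory
print(type(as_raw.body).__name__)          # BytesIO: the stream is passed through unread
```

The `bytes` body in the first case is where the memory usage comes from, and the missing multipart Content-Type in the second case is a plausible reason for a server that expects a form to answer with 400.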

What we need is some way to wrap the content of the file with the right "envelope" as we send it to the server, so that it can recognise the data we are sending.
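As a rough illustration of what that "envelope" looks like, the sketch below builds a single-file multipart/form-data body piece by piece with a generator, using only the standard library. `iter_multipart` is a hypothetical helper written for this illustration, not an API of requests, and it does not escape special characters in the field name or filename.

```python
import io
import uuid

def iter_multipart(field, filename, fileobj, boundary, chunk_size=64 * 1024):
    """Yield a multipart/form-data body piece by piece (hypothetical helper)."""
    # Opening part: boundary line plus the headers describing the form field.
    yield (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n"
        "\r\n"
    ).encode("utf-8")
    # File content, read in fixed-size chunks so it is never held in memory whole.
    while True:
        chunk = fileobj.read(chunk_size)
        if not chunk:
            break
        yield chunk
    # Closing boundary that ends the form.
    yield f"\r\n--{boundary}--\r\n".encode("utf-8")

boundary = uuid.uuid4().hex
# Small in-memory stand-in for a large file, with a tiny chunk size for the demo.
parts = list(iter_multipart("file", "demo.bin", io.BytesIO(b"abc" * 10), boundary, chunk_size=8))
body = b"".join(parts)
print(body.decode("utf-8"))
```

Passing such a generator as `data=` together with a `Content-Type: multipart/form-data; boundary=...` header makes requests send it with chunked transfer encoding, because a generator has no known length. Some servers may not accept chunked uploads, so treat this only as a picture of the wire format.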

See this issue where others have noted the same problem: https://github.com/psf/requests/issues/1584

I think the best suggestion from there is to use this additional lib, which provides streaming multipart form file upload: https://github.com/requests/toolbelt#multipartform-data-encoder

For example:

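from requests_toolbelt import MultipartEncoder
import requests

# Keep the file open for the whole request: the encoder reads it lazily.
with open(path, 'rb') as f:
    encoder = MultipartEncoder(
        # (filename, file object, MIME type); adjust the MIME type to your file
        fields={'file': ('myfilename.xyz', f, 'text/plain')}
    )
    response = requests.post(
        url, data=encoder, headers={'Content-Type': encoder.content_type}
    )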
Anentropic
  • Yes, I eventually found the same lib and tested it out. It worked like a charm! This seemingly simple question took me on a long and complex journey. Thank you! – Baytars Jul 08 '22 at 14:49