
My goal is to do a PUT of part of a file using requests and stream the file (i.e., not load it into memory and then do the PUT).

This page explains how you would do that for an entire file:

Requests supports streaming uploads, which allow you to send large streams or files without reading them into memory. To stream and upload, simply provide a file-like object for your body:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)
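
Presumably the same pattern works for a PUT as well, since requests.put accepts the same data argument:

with open('massive-body', 'rb') as f:
    requests.put('http://some.url/streamed', data=f)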

However, in my case I want to send only one chunk of the file. Is there a way to accomplish this?

In concept, something like:

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f.read(chunksize))
Greg
  • Hmm, you could probably write a generator pretending to be a file-like object that reads a chunk behind the scenes. It may be tricky, though, as I'm not sure what calls requests makes on a file, but it seems possible if no one comes up with a better solution – user3012759 Apr 21 '15 at 15:16
  • @user3012759: I tried searching for what Requests requires in a file-like object, to no avail. However, note that it accepts a simple generator for [Chunk-Encoded Requests](http://docs.python-requests.org/en/latest/user/advanced/#chunk-encoded-requests) – PM 2Ring Apr 21 '15 at 15:25
  • I guess you could experiment using a basic file-like class that has `read` and `close` methods, and if that doesn't work keep adding methods to your class until Requests stops complaining. :) – PM 2Ring Apr 21 '15 at 15:27
  • @PM2Ring chunk-encoded should work as well IMHO; with the right headers set it should be really easy to craft a generator to send part of a file in chunks – user3012759 Apr 21 '15 at 15:40
  • It isn't clear to me what you mean by "My goal is to do a PUT of part of a file using requests and stream the file". Do you also not want to read that *chunk* into memory? If that's what you want, I can help you come up with a solution. For your information, if reading the chunk into memory is okay, then your second snippet will work just fine. – Ian Stapleton Cordasco Apr 22 '15 at 16:13
  • @sigmavirus24: that's right, I don't want to load all of `f.read(chunksize)` into memory. – Greg Apr 22 '15 at 18:04
  • Would a version of Joe's code (modified as per my comment) that can be used to do chunk-encoded requests be acceptable? Or would you prefer to use a custom file-like class so that you can take advantage of Requests' streaming support? – PM 2Ring Apr 23 '15 at 11:03

2 Answers


Based on Greg's answers to my questions, I think the following will work best:

First you'll need something to wrap your open file so that it limits how much data can be read:

class FileLimiter(object):
    def __init__(self, file_obj, read_limit):
        self.read_limit = read_limit
        self.amount_seen = 0
        self.file_obj = file_obj

        # So that requests doesn't try to chunk the upload but will instead stream it:
        self.len = read_limit

    def read(self, amount=-1):
        if self.amount_seen >= self.read_limit:
            return b''
        remaining_amount = self.read_limit - self.amount_seen
        # A negative amount means "read everything", so fall back to the
        # remaining window rather than computing min(-1, remaining_amount):
        data = self.file_obj.read(
            remaining_amount if amount < 0 else min(amount, remaining_amount)
        )
        self.amount_seen += len(data)
        return data

This should work reasonably well as a wrapper object. You would then use it like so:

with open('my_large_file', 'rb') as file_obj:
    file_obj.seek(my_offset)  # position at the start of the chunk to send
    upload = FileLimiter(file_obj, my_chunk_limit)
    r = requests.post(url, data=upload, headers={'Content-Type': 'application/octet-stream'})

The headers are obviously optional, but when streaming data to a server it's a good idea to be a considerate user and tell the server the type of the content you're sending.
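
If you want to sanity-check the wrapper without a real server, here is a minimal sketch using io.BytesIO as a stand-in for the open file (the offset and limit values are just illustrative):

import io

data = io.BytesIO(b'0123456789abcdef')
data.seek(4)                   # position at the start of the window
upload = FileLimiter(data, 6)  # expose 6 bytes starting at offset 4

print(upload.read())  # b'456789' - stops at the read limit, not at EOF
print(upload.read())  # b'' - further reads report exhaustion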

Ian Stapleton Cordasco
  • Thanks, this is just what I needed! One minor bug fix to `read` for when `amount` is -1: `data = self.file_obj.read(remaining_amount if amount < 0 else min(amount, remaining_amount))` – ryan Jan 22 '16 at 05:52
  • How can we send the filename to the server with this method? – Masoud Rahimi Nov 05 '18 at 10:02
  • What if the `chunksize` itself is big and this is used in multithreading? I think calling read() will cause a memory issue? – Simplecode Jul 09 '21 at 12:50

I'm just throwing two other answers together, so bear with me if it doesn't work out of the box; I have no means of testing this:

Lazy Method for Reading Big File in Python?

http://docs.python-requests.org/en/latest/user/advanced/#chunk-encoded-requests

def read_in_chunks(file_object, blocksize=1024, chunks=-1):
    """Lazy function (generator) to read a file piece by piece.
    Default block size: 1k. With the default chunks=-1 it reads until
    EOF; otherwise it yields at most `chunks` blocks."""
    while chunks:
        data = file_object.read(blocksize)
        if not data:
            break
        yield data
        chunks -= 1


with open('massive-body', 'rb') as f:
    requests.post('http://some.url/chunked', data=read_in_chunks(f))
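
As discussed in the comments below, to send only part of the file you could seek to the starting offset first and cap the number of blocks. A rough sketch (untested; my_offset and chunksize are placeholders, and it assumes blocksize divides chunksize evenly):

with open('massive-body', 'rb') as f:
    f.seek(my_offset)  # start partway into the file
    # send chunksize bytes as a series of blocksize-sized pieces
    requests.post('http://some.url/chunked',
                  data=read_in_chunks(f, blocksize=4096,
                                      chunks=chunksize // 4096))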
Joe
  • Note that Greg does **not** want to upload the whole file. Maybe you should change your `chunk_size` to something else, e.g. `blocksize`, since Greg is using `chunksize` to mean the total size of the data to be transferred. FWIW, your code can be easily modified to break out of the loop when `chunksize` bytes have been sent; the only trick is that the last block may be short if `chunksize % blocksize` isn't zero. – PM 2Ring Apr 23 '15 at 10:59
  • I updated the example so you can specify a maximum number of chunks – Joe Apr 23 '15 at 11:05
  • Ok. That works. The caller has to ensure that `blocksize` is a divisor of `chunksize`, but hopefully that's not a major issue. – PM 2Ring Apr 23 '15 at 11:10
  • This looks promising! So if I also wanted to start at a certain offset in the large file, I could just throw a seek in there before the while statement? – Greg Apr 23 '15 at 12:23
  • So if I'm ok with the last block being short, I don't have to worry about ensuring that blocksize is a divisor of chunksize? – Greg Apr 23 '15 at 12:32
  • file_object.read() states that it will read "at most" blocksize bytes and will not complain if there aren't enough bytes, so you should be in the clear: https://docs.python.org/2.4/lib/bltin-file-objects.html – Joe Apr 23 '15 at 16:02
  • It did not work for me, I got a strange exception from requests ('HTTPSConnectionPool' object has no attribute 'getresponse'). Since the FileLimiter solution works for me I did not investigate any further. – Michael Dussere Dec 15 '17 at 13:03