
I am uploading a large file using the Python requests package, and I can't find any way to get feedback on the progress of the upload. I have seen a number of progress meters for downloading a file, but these will not work for a file upload.

The ideal solution would be some sort of callback method such as:

def progress(percent):
    print(percent)

r = requests.post(URL, files={'f': hugeFileHandle}, callback=progress)

Thanks in advance for your help :)

Robin Begbie
  • You'd have to implement the progress in `hugeFileHandle`. I'm not sure why requests doesn't provide a clean way of doing this. – Blender Dec 17 '12 at 07:22
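A minimal sketch of what that comment suggests, for a raw (non-multipart) body passed via `data=` rather than `files=`; the wrapper class, callback, and endpoint are placeholders of mine, and `__len__` is needed so requests can set a Content-Length (note this tracks how much of the file has been read, which only approximates the bytes actually sent):

import os
import requests

URL = "http://httpbin.org/post"  # placeholder endpoint

class ProgressFile(object):
    """File-like wrapper whose read() reports progress via a callback."""
    def __init__(self, path, callback):
        self._file = open(path, 'rb')
        self._size = os.path.getsize(path)
        self._read = 0
        self._callback = callback

    def read(self, size=-1):
        data = self._file.read(size)
        self._read += len(data)
        self._callback(self._read * 100.0 / self._size)
        return data

    def __len__(self):
        return self._size

    def close(self):
        self._file.close()

body = ProgressFile('huge.bin', lambda percent: print(percent))
try:
    r = requests.post(URL, data=body)
finally:
    body.close()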

9 Answers


requests doesn't support upload streaming, e.g., the following fails:

import os
import sys
import requests  # pip install requests

class upload_in_chunks(object):
    def __init__(self, filename, chunksize=1 << 13):
        self.filename = filename
        self.chunksize = chunksize
        self.totalsize = os.path.getsize(filename)
        self.readsofar = 0

    def __iter__(self):
        with open(self.filename, 'rb') as file:
            while True:
                data = file.read(self.chunksize)
                if not data:
                    sys.stderr.write("\n")
                    break
                self.readsofar += len(data)
                percent = self.readsofar * 1e2 / self.totalsize
                sys.stderr.write("\r{percent:3.0f}%".format(percent=percent))
                yield data

    def __len__(self):
        return self.totalsize

# XXX fails
r = requests.post("http://httpbin.org/post",
                  data=upload_in_chunks(__file__, chunksize=10))

By the way, if you don't need to report progress, you could use a memory-mapped file to upload a large file.
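For example, a minimal sketch with the standard mmap module (no progress reporting; the file name and URL are placeholders). mmap objects expose read() and support len(), so requests can stream them with a proper Content-Length instead of loading the whole file into memory first:

import mmap
import requests

with open("huge.bin", "rb") as f:
    # map the file read-only; the OS pages it in as requests reads it
    with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mapped:
        r = requests.post("http://httpbin.org/post", data=mapped)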

To work around it, you could create a file adapter similar to the one from urllib2 POST progress monitoring:

class IterableToFileAdapter(object):
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.length = len(iterable)

    def read(self, size=-1): # TBD: add buffer for `len(data) > size` case
        return next(self.iterator, b'')

    def __len__(self):
        return self.length

Example

it = upload_in_chunks(__file__, 10)
r = requests.post("http://httpbin.org/post", data=IterableToFileAdapter(it))

# pretty-print the response
import json
json.dump(r.json(), sys.stdout, indent=4, ensure_ascii=False)
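For completeness, here is a sketch (my addition, not part of the original answer) of the buffered variant that the TBD comment in read() refers to, so that read(size) never returns more than size bytes:

class BufferedIterableToFileAdapter(object):
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.length = len(iterable)
        self.buffer = b''

    def read(self, size=-1):
        if size < 0:  # drain everything that is left
            data, self.buffer = self.buffer + b''.join(self.iterator), b''
            return data
        while len(self.buffer) < size:  # top up the buffer chunk by chunk
            chunk = next(self.iterator, b'')
            if not chunk:
                break
            self.buffer += chunk
        data, self.buffer = self.buffer[:size], self.buffer[size:]
        return data

    def __len__(self):
        return self.length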
jfs
  • This worked for the most part, but I found that this just uploaded the contents of the file. Really, what I need is to use requests.post(url, files={'file': fileobj}), and doing this only gives the first chunk of the file using your method – Robin Begbie Dec 18 '12 at 01:44
  • @Robin: the above is a hack that can easily fail. You could try `poster` instead. It supports progress callbacks and streaming (with known content-length) of multipart/form-data. btw, remove the tick if the answer is not acceptable for your question. – jfs Dec 18 '12 at 18:29
  • @qarma: If you know a better answer; post it. – jfs Oct 29 '14 at 08:16
  • kennethreitz commented on 10 Jan 2013: Done. https://github.com/kennethreitz/requests/issues/952 – Dima Tisnek Oct 29 '14 at 13:51
  • @qarma: I believe you. Could you write a minimal example (with the progress reporting required by OP), test it with a file that is larger than available memory, and make sure that the behavior is reasonable: no swapping, the progress report is in real time? I can't delete the accepted answer. I can only provide a link to a better answer. – jfs Oct 29 '14 at 14:47
  • How is `__len__` in class `upload_in_chunks(object)` automatically called when `requests.post()` is executed? Is the method `__len__` overriding a method in the requests library, I could not find anything in there. If the method is removed, then requests does not upload the file for me. – probat May 08 '20 at 23:19
  • @probat: In general, the builtin `len()` function calls the corresponding `__len__` method. I don't know whether the answer is applicable to the current requests version. – jfs May 09 '20 at 14:42
  • @jfs, sorry I should have been more specific. Using the current version of the `requests` library, one can pass a generator object directly to the post data parameter. I presume at the time this question was originally asked, passing a generator object directly was not possible. Returning to the present again, the wrapper class `IterableToFileAdapter` is not needed anymore and you can directly pass the class `upload_in_chunks` to the data parameter. I see though that removing the method `def __len__(self):` from class `upload_in_chunks` causes an empty file to be uploaded using requests. – probat May 10 '20 at 23:45
  • @jfs, continuation of previous comment. If you leave the method that contains the `return self.totalsize` then the file is correctly uploaded using requests. I cannot figure out why this would be though... The [requests documentation](https://requests.readthedocs.io/en/master/user/advanced/#chunk-encoded-requests) says no length is necessary. It seems as if though behind the scenes in the requests source code it is calling `len()` on the class object `upload_in_chunks`, but I could not find anything in the source code. – probat May 10 '20 at 23:51
  • @jfs I need to send the data to the files parameter, not the data parameter. When I replace data with files, I'm getting this error: TypeError: a bytes-like object is required, not 'upload_in_chunks' – Naren Babu R May 17 '21 at 14:42

I recommend using the requests-toolbelt package, which makes monitoring upload bytes very easy:

from requests_toolbelt import MultipartEncoder, MultipartEncoderMonitor
import requests

def my_callback(monitor):
    # Your callback function
    print(monitor.bytes_read)

e = MultipartEncoder(
    fields={'field0': 'value', 'field1': 'value',
            'field2': ('filename', open('file.py', 'rb'), 'text/plain')}
    )
m = MultipartEncoderMonitor(e, my_callback)

r = requests.post('http://httpbin.org/post', data=m,
                  headers={'Content-Type': m.content_type})

And you may want to read this to show a progress bar.
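For instance, a minimal percentage readout in the callback could look like this (a sketch; the total encoded size is taken from the underlying encoder's len property, reusing the e from above):

import sys

def percent_callback(monitor):
    # bytes_read is how much of the encoded body has been consumed so far
    percent = monitor.bytes_read * 100 // monitor.encoder.len
    sys.stderr.write("\r{0}%".format(percent))

m = MultipartEncoderMonitor(e, percent_callback)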

wuliang8910
  • This would basically be what I need... BUT... is there a way to upload the contents of f.e. `file.py` in chunks now? – Georg Oct 13 '15 at 12:58
  • @Georg, according to the documentation for requests-toolbelt, this should innately support streaming. – swampfox357 May 24 '19 at 16:25

I got it working with the code from here: Simple file upload progressbar in PyQt. I changed it a bit to use BytesIO instead of StringIO.

from io import BytesIO

import requests

class CancelledError(Exception):
    def __init__(self, msg):
        self.msg = msg
        Exception.__init__(self, msg)

    def __str__(self):
        return self.msg

    __repr__ = __str__

class BufferReader(BytesIO):
    def __init__(self, buf=b'',
                 callback=None,
                 cb_args=(),
                 cb_kwargs=None):
        self._callback = callback
        self._cb_args = cb_args
        # copy to avoid mutating a shared default dict in read()
        self._cb_kwargs = dict(cb_kwargs or {})
        self._progress = 0
        self._len = len(buf)
        BytesIO.__init__(self, buf)

    def __len__(self):
        return self._len

    def read(self, n=-1):
        chunk = BytesIO.read(self, n)
        self._progress += int(len(chunk))
        self._cb_kwargs.update({
            'size'    : self._len,
            'progress': self._progress
        })
        if self._callback:
            try:
                self._callback(*self._cb_args, **self._cb_kwargs)
            except Exception:  # treat any exception from the callback as a cancel
                raise CancelledError('The upload was cancelled.')
        return chunk


def progress(size=None, progress=None):
    print("{0} / {1}".format(progress, size))


files = {"upfile": ("file.bin", open("file.bin", 'rb').read())}

(data, ctype) = requests.packages.urllib3.filepost.encode_multipart_formdata(files)

headers = {
    "Content-Type": ctype
}

body = BufferReader(data, progress)
requests.post(url, data=body, headers=headers)

The trick is to generate the data and headers from the files dict manually, using encode_multipart_formdata() from urllib3. The returned content type contains the multipart boundary, which is why it has to be passed along in the request headers.

derhoch

I know this is an old question, but I couldn't find an easy answer anywhere else, so hopefully this will help somebody else:

import requests
from tqdm import tqdm

with open(file_name, 'rb') as f:
    r = requests.post(url, data=tqdm(f.readlines()))
user1432738
  • This would not work well, readlines might not read a file completely when open in `rb` mode and might only upload the file partially – chiragjn Jan 19 '23 at 12:19
  • Yes, I think you are right. This solution seems to work sometimes, but not consistently. I now use something similar to Glen Thompson's answer. @chiragjn – user1432738 Feb 22 '23 at 05:48
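A chunk-based variant that avoids the readlines() pitfall might look like this (a sketch; the file name and URL are placeholders, and since a plain iterator has no length, requests falls back to chunked transfer encoding):

import os
from functools import partial

import requests
from tqdm import tqdm

file_name = "large_file.bin"     # placeholder
url = "http://httpbin.org/post"  # placeholder
chunk_size = 8192

with open(file_name, "rb") as f:
    # read fixed-size binary chunks until EOF instead of lines
    chunks = iter(partial(f.read, chunk_size), b"")
    total_chunks = -(-os.path.getsize(file_name) // chunk_size)  # ceiling division
    r = requests.post(url, data=tqdm(chunks, total=total_chunks, unit="chunk"))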

This solution uses requests_toolbelt and tqdm, both well-maintained and popular libraries.

from pathlib import Path
from tqdm import tqdm

import requests
from requests_toolbelt import MultipartEncoder, MultipartEncoderMonitor

def upload_file(upload_url, fields, filepath):

    path = Path(filepath)
    total_size = path.stat().st_size
    filename = path.name

    with tqdm(
        desc=filename,
        total=total_size,
        unit="B",
        unit_scale=True,
        unit_divisor=1024,
    ) as bar:
        with open(filepath, "rb") as f:
            fields["file"] = ("filename", f)
            e = MultipartEncoder(fields=fields)
            m = MultipartEncoderMonitor(
                e, lambda monitor: bar.update(monitor.bytes_read - bar.n)
            )
            headers = {"Content-Type": m.content_type}
            requests.post(upload_url, data=m, headers=headers)

Example usage

upload_url = 'https://uploadurl'
fields = {
  "field1": "value1",
  "field2": "value2"
}
filepath = '97a6fce8_owners_2018_Van Zandt.csv'

upload_file(upload_url, fields, filepath)


Glen Thompson

Usually you would build a streaming data source (a generator) that reads the file in chunks and reports its progress along the way (see kennethreitz/requests#663). This does not work with the requests file API, because requests doesn't support streaming uploads (see kennethreitz/requests#295): a file to upload needs to be complete in memory before it starts getting processed.

But requests can stream content from a generator, as J.F. Sebastian has shown above; that generator just needs to produce the complete data stream, including the multipart encoding and boundaries. This is where poster comes into play.

poster was originally written to be used with Python's urllib2 and supports streaming generation of multipart requests, providing progress indication as it goes along. poster's homepage provides examples of using it together with urllib2, but you really don't want to use urllib2. Check out this example code on how to do HTTP Basic Authentication with urllib2. Horrrrrrrrible.

So we really want to use poster together with requests to do file uploads with tracked progress. And here is how:

# load requests-module, a streamlined http-client lib
import requests

# load posters encode-function
from poster.encode import multipart_encode



# an adapter which makes the multipart generator issued by poster accessible to requests
# based upon code from http://stackoverflow.com/a/13911048/1659732
class IterableToFileAdapter(object):
    def __init__(self, iterable):
        self.iterator = iter(iterable)
        self.length = iterable.total

    def read(self, size=-1):
        return next(self.iterator, b'')

    def __len__(self):
        return self.length

# define a helper function simulating the interface of poster's multipart_encode()
# function, but wrapping its generator with the file-like adapter
def multipart_encode_for_requests(params, boundary=None, cb=None):
    datagen, headers = multipart_encode(params, boundary, cb)
    return IterableToFileAdapter(datagen), headers



# this is your progress callback
def progress(param, current, total):
    if not param:
        return

    # check out http://tcd.netinf.eu/doc/classnilib_1_1encode_1_1MultipartParam.html
    # for a complete list of the properties param provides to you
    print "{0} ({1}) - {2:d}/{3:d} - {4:.2f}%".format(param.name, param.filename, current, total, float(current)/float(total)*100)

# generate headers and a data generator in a requests-compatible format
# and provide our progress callback
datagen, headers = multipart_encode_for_requests({
    "input_file": open('recordings/really-large.mp4', "rb"),
    "another_input_file": open('recordings/even-larger.mp4', "rb"),

    "field": "value",
    "another_field": "another_value",
}, cb=progress)

# use the requests lib to issue a POST request with our data attached
r = requests.post(
    'https://httpbin.org/post',
    auth=('user', 'password'),
    data=datagen,
    headers=headers
)

# show response code and body
print(r)
print(r.text)

My upload server doesn't support chunked transfer encoding, so I came up with this solution. It is basically just a wrapper around Python's IOBase that allows tqdm.wrapattr to work seamlessly.

import io
import os
from collections.abc import Iterable
from typing import Union

import requests
from tqdm import tqdm
from tqdm.utils import CallbackIOWrapper

class UploadChunksIterator(Iterable):
    """
    This is an interface between python requests and tqdm.
    Make tqdm to be accessed just like IOBase for requests lib.
    """

    def __init__(
        self, file: Union[io.BufferedReader, CallbackIOWrapper], total_size: int, chunk_size: int = 16 * 1024
    ):  # 16 KiB chunks
        self.file = file
        self.chunk_size = chunk_size
        self.total_size = total_size

    def __iter__(self):
        return self

    def __next__(self):
        data = self.file.read(self.chunk_size)
        if not data:
            raise StopIteration
        return data

    # we don't retrieve len from io.BufferedReader because CallbackIOWrapper only has a read() method
    def __len__(self):
        return self.total_size

fp = "data/mydata.mp4"
s3url = "https://example.com"
_quiet = False

with open(fp, "rb") as f:
    total_size = os.fstat(f.fileno()).st_size
    if not _quiet:
        f = tqdm.wrapattr(f, "read", desc=fp, miniters=1, total=total_size, ascii=True)

    with f as f_iter:
        res = requests.put(
            url=s3url,
            data=UploadChunksIterator(f_iter, total_size=total_size),
        )
    res.raise_for_status()
王予智

Improving on @jfs' answer with a more informative progress bar.

import math
import os
import requests
import sys


class ProgressUpload:
    def __init__(self, filename, chunk_size=1250):
        self.filename = filename
        self.chunk_size = chunk_size
        self.file_size = os.path.getsize(filename)
        self.size_read = 0
        self.divisor = min(math.floor(math.log(self.file_size, 1000)) * 3, 9)  # cap unit at a GB
        self.unit = {0: 'B', 3: 'KB', 6: 'MB', 9: 'GB'}[self.divisor]
        self.divisor = 10 ** self.divisor


    def __iter__(self):
        progress_str = f'0 / {self.file_size / self.divisor:.2f} {self.unit} (0 %)'
        sys.stderr.write(f'\rUploading {self.filename}: {progress_str}')
        with open(self.filename, 'rb') as f:
            for chunk in iter(lambda: f.read(self.chunk_size), b''):
                self.size_read += len(chunk)
                yield chunk
                sys.stderr.write('\b' * len(progress_str))
                percentage = self.size_read / self.file_size * 100
                completed_str = f'{self.size_read / self.divisor:.2f}'
                to_complete_str = f'{self.file_size / self.divisor:.2f} {self.unit}'
                progress_str = f'{completed_str} / {to_complete_str} ({percentage:.2f} %)'
                sys.stderr.write(progress_str)
        sys.stderr.write('\n')

    def __len__(self):
        return self.file_size


# sample usage
requests.post(upload_url, data=ProgressUpload('file_path'))

The key is the `__len__` method. Without it, I was getting connection-closed errors. That's the only reason you can't just use tqdm + iter to get a simple progress bar.
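As far as I can tell, that is because requests sizes the request body with its internal super_len() helper from requests.utils, which consults __len__ (among other things) to decide whether it can send a Content-Length. A quick way to inspect what requests will see (super_len is an internal helper, so treat this as an assumption about the current implementation):

from requests.utils import super_len

body = ProgressUpload('file_path')
print(super_len(body))  # resolves via __len__, so a Content-Length header can be set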

Elijah

My Python code, which works great. Credit: twine.

import sys
import tqdm
import requests
import requests_toolbelt

class ProgressBar(tqdm.tqdm):
    def update_to(self, n: int) -> None:
        self.update(n - self.n)

with open("test.zip", "rb") as fp:

    data_to_send = []
    session = requests.session()

    data_to_send.append(
        ("files", ("test.zip", fp))
    )

    encoder = requests_toolbelt.MultipartEncoder(data_to_send)
    with ProgressBar(
        total=encoder.len,
        unit="B",
        unit_scale=True,
        unit_divisor=1024,
        miniters=1,
        file=sys.stdout,
    ) as bar:
        monitor = requests_toolbelt.MultipartEncoderMonitor(
            encoder, lambda monitor: bar.update_to(monitor.bytes_read)
        )

        r = session.post(
            'http://httpbin.org/post',
            data=monitor,
            headers={"Content-Type": monitor.content_type},
        )

print(r.text)
jak bin