
I'm uploading potentially large files to a web server. Currently I'm doing this:

import urllib2

f = open('somelargefile.zip','rb')
request = urllib2.Request(url,f.read())
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)

However, this reads the entire file's contents into memory before posting it. How can I have it stream the file to the server?

Daniel Von Fange

6 Answers


Reading through the mailing list thread linked to by systempuntoout, I found a clue towards the solution.

The mmap module allows you to open a file so that it acts like a string. Parts of the file are loaded into memory on demand.

Here's the code I'm using now:

import urllib2
import mmap

# Open the file as a memory mapped string. Looks like a string, but 
# actually accesses the file behind the scenes. 
f = open('somelargefile.zip','rb')
mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Do the request
request = urllib2.Request(url, mmapped_file_as_string)
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)

#close everything
mmapped_file_as_string.close()
f.close()
Daniel Von Fange

The documentation doesn't say you can do this, but the code in urllib2 (and httplib) accepts any object with a read() method as data. So using an open file seems to do the trick.

You'll need to set the Content-Length header yourself. If it's not set, urllib2 will call len() on the data, which file objects don't support.

import os.path
import urllib2

data = open(filename, 'rb')
headers = { 'Content-Length' : str(os.path.getsize(filename)) }
response = urllib2.urlopen(url, data, headers)

This is the relevant code that handles the data you supply. It's from the HTTPConnection class in httplib.py in Python 2.7:

def send(self, data):
    """Send `data' to the server."""
    if self.sock is None:
        if self.auto_open:
            self.connect()
        else:
            raise NotConnected()

    if self.debuglevel > 0:
        print "send:", repr(data)
    blocksize = 8192
    if hasattr(data,'read') and not isinstance(data, array):
        if self.debuglevel > 0: print "sendIng a read()able"
        datablock = data.read(blocksize)
        while datablock:
            self.sock.sendall(datablock)
            datablock = data.read(blocksize)
    else:
        self.sock.sendall(data)
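
Edit: as the comment below points out, urlopen() does not actually accept a headers argument, so the urlopen() call above won't work as written. Passing the headers through a Request object should work instead; a minimal, untested sketch:

import os.path
import urllib2

# Untested sketch: hand httplib the open file and set Content-Length
# up front so urllib2 never calls len() on the file object.
f = open(filename, 'rb')
headers = { 'Content-Length' : str(os.path.getsize(filename)) }
request = urllib2.Request(url, f, headers)
response = urllib2.urlopen(request)
f.close()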
Brian Beach
  • `urllib2.urlopen(url, data, headers)` doesn't take headers as a parameter, so the line `response = urllib2.urlopen(url, data, headers)` won't work. I have provided working code in my [answer](https://stackoverflow.com/a/51935108/9921853) below – Sergey Nudnov Dec 28 '19 at 14:14
  • Is this possible with the requests module? I have to send files in chunks (10 MB), but I don't want to read a whole 10 MB into memory; I want to read some bytes (8192) at a time and send them to requests until the 10 MB is complete. – Simplecode Jun 29 '21 at 06:21
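
Regarding the last comment: requests will send a generator body using chunked transfer encoding, so you can read a small, fixed number of bytes at a time without ever holding a whole 10 MB chunk in memory. A hedged sketch (url and filename are placeholders, and read_in_chunks is a made-up helper):

import requests

def read_in_chunks(path, block_size=8192):
    # Yield the file in small blocks; requests sends a generator
    # body with chunked transfer encoding, block by block.
    with open(path, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block

response = requests.post(url, data=read_in_chunks(filename))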

Have you tried Mechanize?

from mechanize import Browser
br = Browser()
br.open(url)
br.select_form(nr=0)  # select the upload form on the page
br.form.add_file(open('largefile.zip', 'rb'), 'application/zip', 'largefile.zip')
br.submit()

or, if you don't want to use multipart/form-data, check this old post.

It suggests two options:

  1. Use mmap, the memory-mapped file object
  2. Patch httplib.HTTPConnection.send (a rough sketch of this follows below)
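
For option 2, here is what such a patch might look like on Pythons older than 2.7, mirroring the 2.7 send() code quoted in the answer above (a hypothetical, untested sketch; the helper names are made up):

import httplib

# Untested sketch: replace HTTPConnection.send with a version that
# streams any object exposing a read() method in 8192-byte blocks.
_orig_send = httplib.HTTPConnection.send

def _streaming_send(self, data):
    if hasattr(data, 'read'):
        block = data.read(8192)
        while block:
            _orig_send(self, block)
            block = data.read(8192)
    else:
        _orig_send(self, data)

httplib.HTTPConnection.send = _streaming_send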
systempuntoout
  • I'm not wanting to send the files encoded as "multipart/form-data", which this would seem to do. I'm just looking for a raw POST. – Daniel Von Fange Mar 23 '10 at 18:47
  • On Python 2.7, option #2 has already been patched in, and the block size is 8192. I wonder why... hmmm, what's the norm/standard on this? – MistahX Jun 24 '11 at 00:00

Try pycurl. I don't have anything set up that will accept a large file outside of a multipart/form-data POST, but here's a simple example that reads the file as needed.

import os
import pycurl

class FileReader:
    def __init__(self, fp):
        self.fp = fp
    def read_callback(self, size):
        return self.fp.read(size)

c = pycurl.Curl()
c.setopt(pycurl.URL, url)
c.setopt(pycurl.UPLOAD, 1)
c.setopt(pycurl.READFUNCTION, FileReader(open(filename, 'rb')).read_callback)
filesize = os.path.getsize(filename)
c.setopt(pycurl.INFILESIZE, filesize)
c.perform()
c.close()
JimB
  • Thanks JimB. I'd have used this, except I have a few people on Windows using this, and I don't want them to have to install anything else. – Daniel Von Fange Mar 26 '10 at 13:32

Using the requests library, you can do

import requests

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

as mentioned here in their docs

EthanP

Below is a working example for both Python 2 and Python 3:

import os

try:
    from urllib2 import urlopen, Request          # Python 2
except ImportError:
    from urllib.request import urlopen, Request  # Python 3

headers = { 'Content-length': str(os.path.getsize(filepath)) }
with open(filepath, 'rb') as f:
    req = Request(url, data=f, headers=headers)
    result = urlopen(req).read().decode()

The requests module is great, but sometimes you cannot install any extra modules...

Sergey Nudnov