
I'm uploading potentially large files to a web server. Currently I'm doing this:

import urllib2

f = open('somelargefile.zip','rb')
request = urllib2.Request(url,f.read())
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)

However, this reads the entire file's contents into memory before posting it. How can I have it stream the file to the server?

Daniel Von Fange

6 Answers


Reading through the mailing list thread linked to by systempuntoout, I found a clue towards the solution.

The mmap module allows you to open a file so that it acts like a string. Parts of the file are loaded into memory on demand.

Here's the code I'm using now:

import urllib2
import mmap

# Open the file as a memory mapped string. Looks like a string, but 
# actually accesses the file behind the scenes. 
f = open('somelargefile.zip','rb')
mmapped_file_as_string = mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ)

# Do the request
request = urllib2.Request(url, mmapped_file_as_string)
request.add_header("Content-Type", "application/zip")
response = urllib2.urlopen(request)

#close everything
mmapped_file_as_string.close()
f.close()
Daniel Von Fange

The documentation doesn't say you can do this, but the code in urllib2 (and httplib) accepts any object with a read() method as data. So using an open file seems to do the trick.

You'll need to set the Content-Length header yourself. If it's not set, urllib2 will call len() on the data, which file objects don't support.

import os.path
import urllib2

data = open(filename, 'rb')
headers = { 'Content-Length' : str(os.path.getsize(filename)) }
response = urllib2.urlopen(url, data, headers)

This is the relevant code that handles the data you supply. It's from the HTTPConnection class in httplib.py in Python 2.7:

def send(self, data):
    """Send `data' to the server."""
    if self.sock is None:
        if self.auto_open:
            self.connect()
        else:
            raise NotConnected()

    if self.debuglevel > 0:
        print "send:", repr(data)
    blocksize = 8192
    if hasattr(data,'read') and not isinstance(data, array):
        if self.debuglevel > 0: print "sendIng a read()able"
        datablock = data.read(blocksize)
        while datablock:
            self.sock.sendall(datablock)
            datablock = data.read(blocksize)
    else:
        self.sock.sendall(data)
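
Edit: as the comment below points out, urlopen() does not actually accept a headers argument, so the urlopen() call above won't work as written. Passing the headers through a Request object should work instead; a minimal, untested sketch:

import os.path
import urllib2

# Untested sketch: hand httplib the open file and set Content-Length
# up front so urllib2 never calls len() on the file object.
f = open(filename, 'rb')
headers = { 'Content-Length' : str(os.path.getsize(filename)) }
request = urllib2.Request(url, f, headers)
response = urllib2.urlopen(request)
f.close()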
Brian Beach
  • `urllib2.urlopen(url, data, headers)` doesn't take headers as a parameter, so the line `response = urllib2.urlopen(url, data, headers)` won't work. I have provided working code in my [answer](https://stackoverflow.com/a/51935108/9921853) below – Sergey Nudnov Dec 28 '19 at 14:14
  • Is this possible with the requests module? I have to send files in chunks (10 MB), but I don't want to read a whole 10 MB into memory; I want to read some bytes (8192) at a time and send them to requests until the 10 MB is complete. – Simplecode Jun 29 '21 at 06:21
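
Regarding the last comment: requests will send a generator body using chunked transfer encoding, so you can read a small, fixed number of bytes at a time without ever holding a whole 10 MB chunk in memory. A hedged sketch (url and filename are placeholders, and read_in_chunks is a made-up helper):

import requests

def read_in_chunks(path, block_size=8192):
    # Yield the file in small blocks; requests sends a generator
    # body with chunked transfer encoding, block by block.
    with open(path, 'rb') as f:
        while True:
            block = f.read(block_size)
            if not block:
                break
            yield block

response = requests.post(url, data=read_in_chunks(filename))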

Have you tried Mechanize?

from mechanize import Browser
br = Browser()
br.open(url)
br.select_form(nr=0)  # select the upload form on the page
br.form.add_file(open('largefile.zip', 'rb'), 'application/zip', 'largefile.zip')
br.submit()

or, if you don't want to use multipart/form-data, check this old post.

It suggests two options:

  1. Use mmap, the memory-mapped file object
  2. Patch httplib.HTTPConnection.send (a rough sketch of this follows below)
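
For option 2, here is what such a patch might look like on Pythons older than 2.7, mirroring the 2.7 send() code quoted in the answer above (a hypothetical, untested sketch; the helper names are made up):

import httplib

# Untested sketch: replace HTTPConnection.send with a version that
# streams any object exposing a read() method in 8192-byte blocks.
_orig_send = httplib.HTTPConnection.send

def _streaming_send(self, data):
    if hasattr(data, 'read'):
        block = data.read(8192)
        while block:
            _orig_send(self, block)
            block = data.read(8192)
    else:
        _orig_send(self, data)

httplib.HTTPConnection.send = _streaming_send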
systempuntoout
  • I'm not wanting to send the files encoded as "multipart/form-data", which this would seem to do. I'm just looking for a raw POST. – Daniel Von Fange Mar 23 '10 at 18:47
  • On Python 2.7, option #2 has already been patched in, and the block size is 8192. I wonder why... hmmm, what's the norm/standard on this? – MistahX Jun 24 '11 at 00:00

Try pycurl. I don't have anything set up that will accept a large file outside of a multipart/form-data POST, but here's a simple example that reads the file as needed.

import os
import pycurl

class FileReader:
    def __init__(self, fp):
        self.fp = fp
    def read_callback(self, size):
        return self.fp.read(size)

c = pycurl.Curl()
c.setopt(pycurl.URL, url)
c.setopt(pycurl.UPLOAD, 1)
c.setopt(pycurl.READFUNCTION, FileReader(open(filename, 'rb')).read_callback)
filesize = os.path.getsize(filename)
c.setopt(pycurl.INFILESIZE, filesize)
c.perform()
c.close()
JimB
  • Thanks JimB. I'd have used this, except I have a few people on Windows using this, and I don't want them to have to install anything else. – Daniel Von Fange Mar 26 '10 at 13:32

Using the requests library, you can do

import requests

with open('massive-body', 'rb') as f:
    requests.post('http://some.url/streamed', data=f)

as mentioned here in their docs

EthanP

Below is a working example for both Python 2 and Python 3:

import os

try:
    from urllib2 import urlopen, Request          # Python 2
except ImportError:
    from urllib.request import urlopen, Request  # Python 3

headers = { 'Content-length': str(os.path.getsize(filepath)) }
with open(filepath, 'rb') as f:
    req = Request(url, data=f, headers=headers)
    result = urlopen(req).read().decode()

The requests module is great, but sometimes you cannot install any extra modules...

Sergey Nudnov