
I have a 300 MB file that I need to upload, and my current code just isn't cutting it.

#----------------------------------------------------------------------------------
    def _post_multipart(self, host, selector,
                        fields, files,
                        ssl=False,port=80,
                        proxy_url=None,proxy_port=None):
        """ performs a multi-post to AGOL, Portal, or AGS
            Inputs:
               host - string - root url (no http:// or https://)
                   ex: www.arcgis.com
               selector - string - everything after the host
                   ex: /PWJUSsdoJDp7SgLj/arcgis/rest/services/GridIndexFeatures/FeatureServer/0/1/addAttachment
               fields - dictionary - additional parameters like token and format information
               files - list of tuples - each tuple is (form field name, full path, filename)
               ssl - option to use SSL
               proxy_url - string - url to proxy server
               proxy_port - integer - port value if not on port 80

            Output:
               JSON response as dictionary
            Usage:
               import urlparse
               url = "http://sampleserver3.arcgisonline.com/ArcGIS/rest/services/SanFrancisco/311Incidents/FeatureServer/0/10261291"
               parsed_url = urlparse.urlparse(url)
               params = {"f":"json"}
               print _post_multipart(host=parsed_url.hostname,
                               selector=parsed_url.path,
                               files=files,
                               fields=params
                               )
        """
        content_type, body = self._encode_multipart_formdata(fields, files)

        headers = {
            'content-type': content_type,
            'content-length': str(len(body))
        }

        if proxy_url:
            if ssl:
                h = httplib.HTTPSConnection(proxy_url, proxy_port)

                h.request('POST', 'https://' + host + selector, body, headers)

            else:
                h = httplib.HTTPConnection(proxy_url, proxy_port)
                h.request('POST', 'http://' + host + selector, body, headers)
        else:
            if ssl:
                h = httplib.HTTPSConnection(host,port)
                h.request('POST', selector, body, headers)
            else:
                h = httplib.HTTPConnection(host,port)
                h.request('POST', selector, body, headers)

        resp_data = h.getresponse().read()
        try:
            result = json.loads(resp_data)
        except:
            return None

        if 'error' in result:
            if result['error']['message'] == 'Request not made over ssl':
                return self._post_multipart(host=host, selector=selector, fields=fields,
                                            files=files, ssl=True,port=port,
                                            proxy_url=proxy_url,proxy_port=proxy_port)
        return result

    def _encode_multipart_formdata(self, fields, files):
        boundary = mimetools.choose_boundary()
        buf = StringIO()
        for (key, value) in fields.iteritems():
            buf.write('--%s\r\n' % boundary)
            buf.write('Content-Disposition: form-data; name="%s"' % key)
            buf.write('\r\n\r\n' + self._tostr(value) + '\r\n')
        for (key, filepath, filename) in files:
            if os.path.isfile(filepath):
                buf.write('--%s\r\n' % boundary)
                buf.write('Content-Disposition: form-data; name="%s"; filename="%s"\r\n' % (key, filename))
                buf.write('Content-Type: %s\r\n' % (self._get_content_type3(filename)))
                file = open(filepath, "rb")
                try:
                    buf.write('\r\n' + file.read() + '\r\n')
                finally:
                    file.close()
        buf.write('--' + boundary + '--\r\n\r\n')
        buf = buf.getvalue()
        content_type = 'multipart/form-data; boundary=%s' % boundary
        return content_type, buf

I cannot use the requests module; I must use standard libraries such as urllib2 and urllib, on Python 2.7.x.

Is there a way to upload a 300 MB file to a site without loading the whole thing into memory?

UPDATE:

So I switched to requests, and now I get:

`MissingSchema: Invalid URL u'www.arcgis.com/sharing/rest/content/users//addItem?': No schema supplied. Perhaps you meant http://www.arcgis.com/sharing/rest/content/users//addItem??`

What does this mean?

I pass the fields to requests.post() like this:

#----------------------------------------------------------------------------------
def _post_big_files(self, host, selector,
                    fields, files,
                    ssl=False,port=80,
                    proxy_url=None,proxy_port=None):
    import sys
    sys.path.insert(1,os.path.dirname(__file__))
    from requests_toolbelt import MultipartEncoder
    import requests
    if proxy_url is not None:
        proxyDict = {
            "http": "%s:%s" % (proxy_url, proxy_port),
            "https": "%s:%s" % (proxy_url, proxy_port)
        }
    else:
        proxyDict = {}
    for k,v in fields.iteritems():
        print k,v
        fields[k] = json.dumps(v)
    for key, filepath, filename in files:
        fields[key] = ('filename', open(filepath, 'rb'), self._get_content_type3(filepath))
    m = MultipartEncoder(fields=fields)
    print host + selector
    r = requests.post(host + selector , data=m,
                      headers={'Content-Type': m.content_type})
    print r

I followed the examples in the documentation for both requests and the toolbelt. Any ideas why this is breaking?
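For reference, the error text itself points at the likely cause: the string passed to `requests.post()` is `host + selector`, and `host` carries no `http://` or `https://` prefix. A minimal sketch of one way to build the URL, assuming the same `host`, `selector`, and `ssl` arguments as above (`build_url` is a hypothetical helper name, not part of requests):

```python
def build_url(host, selector, ssl=False):
    """Return host + selector with a scheme in front.

    Hypothetical helper: if `host` already starts with a scheme it is
    left alone, otherwise 'https' or 'http' is chosen from `ssl`.
    """
    if host.startswith(('http://', 'https://')):
        return host + selector
    scheme = 'https' if ssl else 'http'
    return '%s://%s%s' % (scheme, host, selector)
```

With that, the call would become something like `requests.post(build_url(host, selector, ssl), data=m, headers={'Content-Type': m.content_type})`.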

Thank you,

code base 5000
  • Study how the [`requests-toolbelt` add-on](http://toolbelt.readthedocs.org/en/latest/user.html#uploading-data) does this with `requests` and replicate that? But why the arbitrary limitation on stdlib libraries only? – Martijn Pieters Nov 05 '14 at 13:19
  • In any case, sending large data *as a stream* with the standard library already requires that you use a [`mmap` hack](http://stackoverflow.com/questions/2502596/python-http-post-a-large-file-with-streaming), so doing this as a multipart post is only going to be harder. – Martijn Pieters Nov 05 '14 at 13:21
  • @MartijnPieters - Thank you for the suggestion, based on the code in the stack post, you provided, would mmap be used in the _encode_multipart_formdata()? – code base 5000 Nov 05 '14 at 13:30
  • You'll need a file containing the *whole POST body*, so including the boundaries. This is not really a viable option in my opinion, but the `urllib2` API only takes a string POST body otherwise. – Martijn Pieters Nov 05 '14 at 13:31
  • I'd install `requests` and `requests-toolbelt` in a virtualenv and use that instead. – Martijn Pieters Nov 05 '14 at 13:32
  • @MartijnPieters - Like I said, I can't use requests, when I pass the mmap_string object into the buf object, it appears to be uploading a 1 KB file instead of a 300 mb file. Any suggestions? – code base 5000 Nov 05 '14 at 13:44
  • **Why** can't you use `requests`? You don't need to have admin privileges to use it. – Martijn Pieters Nov 05 '14 at 13:45
  • Not approved software. – code base 5000 Nov 05 '14 at 13:49
  • To be honest, I'd go through the process of having it approved. It'll be *much less painful*. You'll either have to drive the `HTTPConnection` directly, or write out a complete POST body to a file first, or subclass and override about half of `urllib2` internals to get this to work in any sane kind of manner. This is not going to be easy. – Martijn Pieters Nov 05 '14 at 13:50
  • I fear this is going to be a *lot of work*. Save your company the engineering time and invest in the approval process instead. The library is licensed under the Apache2 license, one of the best OSS licenses for commercial use out there. And since this is all well-trodden terrain (HTTP protocol interactions) there are no patents to worry about either. – Martijn Pieters Nov 05 '14 at 13:53
  • It appears to load the file as a string, but then fails here: buf.write('--' + boundary + '--\r\n\r\n') – code base 5000 Nov 05 '14 at 14:22
  • @MartijnPieters - So I'm going to try requests, can I install these packages without actually using the setup.py or ez install functions? – code base 5000 Nov 05 '14 at 14:50
  • Create a virtualenv and install into there. Or just unpack the source and move the `requests` directory anywhere on your Python path, including the directory the main script lives in. See `python -m site` for the full path. `pip` can install into your user directory, for example. – Martijn Pieters Nov 05 '14 at 15:38
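One of the standard-library routes mentioned in the comments is to write the complete POST body, boundaries included, to a file first and then send that file. The following is a rough sketch of that idea, not a tested ArcGIS client. The helper names `write_multipart_body` and `post_body_file` are made up for illustration, the `application/octet-stream` content type is a placeholder for whatever `_get_content_type3` would return, and field names and values are assumed to be plain ASCII with no quoting needed. It relies on `httplib`'s `request()` accepting an open file object as the body and sending it in blocks.

```python
import os
import shutil
import uuid


def write_multipart_body(fields, files, out):
    """Write a multipart/form-data body to the binary file object `out`.

    fields: dict of form field name -> value
    files:  list of (form field name, full path, filename) tuples
    Returns the matching Content-Type header value.
    """
    boundary = uuid.uuid4().hex
    for name, value in fields.items():
        part = ('--%s\r\n'
                'Content-Disposition: form-data; name="%s"\r\n'
                '\r\n'
                '%s\r\n') % (boundary, name, value)
        out.write(part.encode('utf-8'))
    for name, path, filename in files:
        header = ('--%s\r\n'
                  'Content-Disposition: form-data; name="%s"; filename="%s"\r\n'
                  'Content-Type: application/octet-stream\r\n'  # placeholder type
                  '\r\n') % (boundary, name, filename)
        out.write(header.encode('utf-8'))
        with open(path, 'rb') as src:
            # Copy in 1 MB chunks so the file never sits in memory whole.
            shutil.copyfileobj(src, out, 1024 * 1024)
        out.write(b'\r\n')
    out.write(('--%s--\r\n' % boundary).encode('utf-8'))
    return 'multipart/form-data; boundary=%s' % boundary


def post_body_file(host, selector, content_type, body_path, port=80):
    """POST a prepared body file; the connection reads it block by block."""
    try:
        import httplib                  # Python 2.7
    except ImportError:
        import http.client as httplib   # Python 3
    conn = httplib.HTTPConnection(host, port)
    headers = {
        'Content-Type': content_type,
        'Content-Length': str(os.path.getsize(body_path)),
    }
    with open(body_path, 'rb') as body:
        conn.request('POST', selector, body, headers)
        return conn.getresponse().read()
```

Usage would be along the lines of opening a temporary file in `'wb'` mode, calling `write_multipart_body(fields, files, tmp)`, closing it, and then passing its path and the returned content type to `post_body_file`. The cost is a second copy of the upload on disk, which is the trade-off for keeping memory use flat.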
