1

I have a problem with PycURL Version 7.45.1 with python 3.10.4 on Windows 10.

I want to post a file from my computer onto an API. When I run my code (probably) the filepath makes problems because I have one of those pesky "Umlauts" in my name and thus also in my full filepath.

My filepath for an upload file could look like this: "C:\\Users\\NameWith_ö\\folder\\another_folder\\more_folders\\file.pdf"

I already found some answers that PycURL can only work with strings that include only ascii characters or bytestrings so I encode my filepath string to UTF-8 beforehand, which also seems to work because where I prior got the UnicodeEncodeError on the call of curl.setopt(curl.HTTPPOST, files), now with the encoded filepath I get the error message "error (26, ' ')".

Here is the code of the curl call:

def post(self, request_url, custom_headers : list = None, upload_fields : dict = None, upload_files : dict = None, further_directives : dict = None, proxy : dict = None):
        """
        Posts a HTTP POST request and returns a response object.

        :param request_url: The URL that request should point to.
        :type request_url: string
        :param custom_headers: An optional list of costum headers that should be sent with the request.
        :type custom_headers: list
        :param upload_fields: An optional dict of data that should be inputted into targeted upload fields {key = field name: value = input data}.
        :type upload_fields: dict
        :param upload_files: An optional dict of files that should be uploaded to the target site {key = field name: value = dict {key = 'filepath': value = path to file (required), key = 'new_filename': value = new name for uploaded file (optional), key = 'content_type': value = content type (optional, default is 'multipart/form-data')}}.
        :type upload_files: dict
        :param further_directives: An optional dict of further directives that should be included in the request (for list of all possible directives see PLACEHOLDER).
        :type further_directives: dict
        :param proxy: An optional dict of information used to route the request through a proxy {key = proxy port: value = int, key = proxy domain: value = string}
        :type proxy: dict

        :returns: Response object.
        :rtype: Curlified_Response
        """

        self.prepare_new_request()

        body_buffer = BytesIO()
        curl = pycurl.Curl()

        request_url, custom_headers, further_directives, proxy, upload_fields, upload_files = self.encode_strings(request_url, custom_headers, further_directives, proxy, upload_fields, upload_files)

        curl.setopt(curl.URL, request_url)
        if proxy:
            self.inject_kerberos_proxy(proxy['proxy port'], proxy['proxy domain'], curl)#
        if custom_headers:
            curl.setopt(curl.HTTPHEADER, custom_headers)#
        if further_directives:
            for item in further_directives.items():
                self.include_further_directive(item, curl)#
        if upload_fields:
            postfields = urlencode(upload_fields)#
            curl.setopt(curl.POSTFIELDS, postfields)
        if upload_files:
            files = []
            for item in upload_files.items():
                files.append(self.build_file_upload(item, curl))#
            curl.setopt(curl.HTTPPOST, files)
        curl.setopt(curl.HEADERFUNCTION, self.read_header_line)
        curl.setopt(curl.WRITEFUNCTION, body_buffer.write)
        curl.perform()
        self.read_body(body_buffer)
        curl.close()

And here is the method build_file_upload for reference:

def build_file_upload(self, item, curl):
        """
        Gets a tuple of information for a file upload, checks the correctnes of the content and returns the properly built input.
        """

        info_list = []
        for i in item[1].items():
            if i[0] == 'filepath':
                info_list.append(curl.FORM_FILE)
                info_list.append(i[1])
            elif i[0] == 'new filename':
                info_list.append(curl.FORM_FILENAME)
                info_list.append(i[1])
            elif i[0] == 'contenttype':
                info_list.append(curl.FORM_CONTENTTYPE)
                info_list.append(i[1])
            else:
                raise ArgumentsError("No known file upload key was provided! Please revise.")
        if not info_list:
            raise RequiredArgumentsMissing("One or more required parameters for the file to be uploaded are missing! Please revise.")
        if 'content_type' not in info_list:
            info_list.append(curl.FORM_CONTENTTYPE)
            info_list.append(b'multipart/form-data')
        return (item[0], tuple(info_list))

Lastly, here is the method encode_strings in which the given strings are encoded to bytestrings:

def encode_strings(self, request_url, custom_headers, further_directives, proxy, upload_fields = None, upload_files = None):
        """
        Turns all given strings into bytes to be processable by PycURL.
        """

        custom_headers = [] if custom_headers is None else custom_headers
        upload_fields = {} if upload_fields is None else upload_fields
        upload_files = {} if upload_files is None else upload_files
        further_directives = {} if further_directives is None else further_directives
        proxy = {} if proxy is None else proxy

        request_url = bytes(request_url, 'UTF-8')
        for i in range(len(custom_headers)):
            if isinstance(custom_headers[i], str):
                custom_headers[i] = bytes(custom_headers[i], 'UTF-8')
        for k, v in further_directives.items():
            if isinstance(v, str):
                further_directives[k] = bytes(further_directives[k], 'UTF-8')
        for k, v in proxy.items():
            if isinstance(v, str):
                proxy[k] = bytes(proxy[k], 'UTF-8')
        for k, v in upload_fields.items():
            if isinstance(v, str):
                upload_fields[k] = bytes(upload_fields[k], 'UTF-8')
        for k, v in upload_files.items():
            if isinstance(v, dict):
                for key, value in v.items():
                    if isinstance(value, str):
                        upload_files[k][key] = bytes(upload_files[k][key], 'UTF-8')
        
        if len(upload_fields) == 0 and len(upload_files) == 0:
            return request_url, custom_headers, further_directives, proxy
        else:
            return request_url, custom_headers, further_directives, proxy, upload_fields, upload_files

I hope anybody has an idea what the problem could be or even could resolve this issue, really loosing my marbles over this...

  • I don't know a lot about PycURL, but it says "PycURL will accept Unicode strings that contain ASCII code points only". An umlaut doesn't map to ASCII. In Python3, your string (filepath) starts as UTF-8. You convert to a bytestring *from* UTF8. Test if using URLencode helps generate a correct string: https://stackoverflow.com/questions/5607551/how-to-urlencode-a-querystring-in-python – Alan Jun 17 '22 at 11:20
  • Thank you @Alan for your suggestion, sadly this did not work. I ressort to using relative paths now but that's of course just a stopgap solution. – Felix Rösch Jun 18 '22 at 12:59

0 Answers0