
I'm trying to upload a file, using the requests library to submit a POST.

This works fine:

theFile = { 'LUuploadFile': ("linea.ipa", open(path_to_file, 'rb'), 'application/octet-stream') }
request = requests.post(url, files=theFile)

This throws an error:

theFile = { 'LUuploadFile': ("línea.ipa", open(path_to_file, 'rb'), 'application/octet-stream') }
request = requests.post(url, files=theFile)

The error is very odd:

(   <class 'requests.exceptions.ConnectionError'>,
    ConnectionError(MaxRetryError("HTTPSConnectionPool(host='fupload.apperian.com', port=443): 
        Max retries exceeded with url: /upload?transactionID=... 
    (Caused by <class 'socket.error'>: [Errno 32] Broken pipe)",),),
     <traceback object at 0x100a8e3f8>)

It's not the server: it accepts the filename if I use curl:

curl --form "LUuploadFile=@línea.ipa" http://...
egrunin
  • I'm guessing `requests` puts the í character as UTF-8 encoded directly into the socket, as part of the `Content-Disposition` header, which is not allowed. Have you tried percent-encoding the filename? – univerio Apr 03 '14 at 22:56
  • @univerio - actually, that's exactly what `curl` does. `requests` encodes it as `filename*=utf-8''li%CC%81nea.ipa` [(rfc5987)](http://tools.ietf.org/html/rfc5987), which the server may not support... – mata Apr 03 '14 at 23:34
  • The difference between the working and non-working lines is that the working one refers to `linea`, with a plain lower-case i, and the one that doesn't work has `línea`, with an accent mark on the i. The difference isn't very visible on my screen. – khagler Apr 04 '14 at 01:07
  • @mata - are you saying that `curl` uses the percent-encoding, but `requests` doesn't? – egrunin Apr 04 '14 at 07:20
  • @egrunin - no, it's the other way round, `curl` is the one that doesn't encode the filename and sends it as raw utf8... – mata Apr 04 '14 at 11:31
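To make the two wire forms from the comments above concrete, here is a minimal sketch using only the standard library. The helper name `rfc5987_filename` is hypothetical, and this is not the code `requests` itself runs; it only reproduces the `filename*=` value quoted by mata. Note that the quoted value `li%CC%81nea.ipa` corresponds to a decomposed name (`i` followed by the combining acute accent U+0301), which is an observation about the bytes, not something the thread states.

```python
import unicodedata
from urllib.parse import quote


def rfc5987_filename(name: str) -> str:
    """Return a filename* parameter value in RFC 5987 style (hypothetical helper)."""
    # Percent-encode the UTF-8 bytes of the name and prefix the charset.
    return "utf-8''" + quote(name, safe="")


# Decomposed form: 'i' followed by U+0301 COMBINING ACUTE ACCENT.
decomposed = "li\u0301nea.ipa"
# Precomposed form: single code point U+00ED.
precomposed = unicodedata.normalize("NFC", decomposed)

print(rfc5987_filename(decomposed))    # utf-8''li%CC%81nea.ipa
print(rfc5987_filename(precomposed))   # utf-8''l%C3%ADnea.ipa

# The "raw UTF-8" form that curl is described as sending is just the bytes:
print(precomposed.encode("utf-8"))     # b'l\xc3\xadnea.ipa'
```

A server that only understands the plain `filename=` parameter would not recognise the first two forms, which matches the behaviour described in the comments.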

1 Answer


This means that some component of this particular server doesn't parse the Content-Disposition header correctly (according to RFC 5987). I can't be more specific than that, since there are many "moving parts" to a web application server (for example, you might be using nginx + fastcgi + PHP) and any one (or all :)) of those might be broken. You might find this SO thread as well as this page useful; they approach the issue from the other side (downloading a file with a UTF-8 name), but it boils down to the same issue (parsing the "Content-Disposition" header).

For what it's worth, requests is doing the "correct" thing (according to the standard), but there isn't really much it can do if some component on the server doesn't follow the standard (or the problem might not even be on the server: for example, there might be a proxy you're passing through that is causing the issue).
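If the server side can't be fixed, one possible client-side workaround is to send an ASCII-only filename so that no special encoding of the header is needed. The sketch below is an assumption, not something the thread confirms works against this server; `ascii_fallback` and `build_files` are hypothetical helper names, and the in-memory `BytesIO` object stands in for the real file.

```python
import io
import unicodedata


def ascii_fallback(name: str) -> str:
    """Strip accents so the name contains only ASCII characters."""
    # NFKD splits 'í' into 'i' + combining accent; the accent is then dropped.
    decomposed = unicodedata.normalize("NFKD", name)
    return decomposed.encode("ascii", "ignore").decode("ascii")


def build_files(field: str, filename: str, fileobj):
    """Build the dict shape that requests accepts for its files= argument."""
    return {field: (ascii_fallback(filename), fileobj, "application/octet-stream")}


files = build_files("LUuploadFile", "línea.ipa", io.BytesIO(b"demo payload"))
print(files["LUuploadFile"][0])  # linea.ipa

# With requests installed, the dict would be passed as in the question:
# requests.post(url, files=files)
```

The trade-off is that the server stores the file under the unaccented name, and characters with no ASCII decomposition (for example `ø`) are dropped entirely.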

Grey Panther
  • That's helpful. Combined with the comment above (that `curl` is not escaping) it makes sense now. Too bad I don't work there anymore :) – egrunin Jul 21 '17 at 14:15