
I've recently designed an upload dialog backed by PyCURL which I'm using in a few of my applications.
I have run into an issue when setting pycurl's HTTPPOST option. I am setting it like so:
self.curl.setopt(self.curl.HTTPPOST, [(field, (self.curl.FORM_FILE, filename))])
If filename is a byte string (str), all is fine. If I pass it a unicode object, however, setopt raises a TypeError. Is there any way to give it a Cyrillic path? I tried encoding it as UTF-8, but that was unsuccessful. Thank you for your time.

Update:

I'm actually getting the filename from a wx control, so it's unicode before I even touch it. When I then encode it to UTF-8 (using filename = filename.encode('UTF-8')), the setopt call succeeds but perform fails:

* About to connect() to example.com port 80 (#0)
*   Trying 123.123.123.123... * connected
* Connected to example.com (123.123.123.123) port 80 (#0)
* failed creating formpost data
* Connection #0 to host example.com left intact
Traceback (most recent call last):
  File "c:\python27\lib\site-packages\transfer_dialogs-0.28-py2.7.egg\transfer_dialogs\transfer_dialogs.py", line 64, in perform_transfer
    self.curl.perform()
error: (26, 'failed creating formpost data')

Update 2:

As requested, a bit more data. filename contains the result of a GetValue() from the open dialog.
logging.debug("Filename: %r encoded filename: %r" % (filename, filename.encode('UTF-8')))
result:
Sat Feb 05, 2011 03:33:56 core.dialogs.upload_audio DEBUG: Filename: u'C:\\Users\\Q\\test\\\u0422\u0435\u0441\u0442\u043e\u0432\u0430\u044f \u043f\u0430\u043f\u043a\u0430\\test.mp3' encoded filename: 'C:\\Users\\Q\\test\\\xd0\xa2\xd0\xb5\xd1\x81\xd1\x82\xd0\xbe\xd0\xb2\xd0\xb0\xd1\x8f \xd0\xbf\xd0\xb0\xd0\xbf\xd0\xba\xd0\xb0\\test.mp3'

  • "I tried UTF-8 encoding it, but that was unsuccessful" -- explain HOW you tried, and what was the outcome: gibberish, exception (traceback would help), other). Also: what is a "Cyrillic path"?? A filename/path encoded in e.g. cp1251? What operating system are you using? – John Machin Feb 04 '11 at 23:41
  • I tried with filename = filename.encode('UTF-8'). The line which set the HTTPPOST option then raised: error: (26, 'failed creating formpost data') OS Windows. The path is something like: r"C:\Users\Q\test\Тестовая папка\test.mp3" – Christopher Toth Feb 05 '11 at 00:48
  • Instead of `something_like(filename)`, please use `repr()` to show unambiguously the unicode filename that you got from the wx control. – John Machin Feb 05 '11 at 07:18
  • @John As requested: logging.debug("Filename: %r encoded filename: %r" % (filename, filename.encode('UTF-8'))) result: Sat Feb 05, 2011 03:33:56 core.dialogs.upload_audio DEBUG: Filename: u'C:\\Users\\Q\\test\\\u0422\u0435\u0441\u0442\u043e\u0432\u0430\u044f \u043f\u0430\u043f\u043a\u0430\\test.mp3' encoded filename: 'C:\\Users\\Q\\test\\\xd0\xa2\xd0\xb5\xd1\x81\xd1\x82\xd0\xbe\xd0\xb2\xd0\xb0\xd1\x8f \xd0\xbf\xd0\xb0\xd0\xbf\xd0\xba\xd0\xb0\\test.mp3' – Christopher Toth Feb 05 '11 at 08:39
  • Thanks. No weird chars; only ASCII and Cyrillic. Things to try: write a small stand-alone script that attempts to open the filename 3 times (as unicode, encoded as UTF, encoded as cp1251). What is the "ANSI" encoding on your system? Note that there's a space in the filename. Can the app handle a filename with ASCII characters and a space? Are you sure that the name of the actual file on disk is as received from the wx control? – John Machin Feb 05 '11 at 10:21
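The stand-alone check suggested in the last comment could look roughly like the sketch below. `probe_open` is a hypothetical helper name, not something from the question. It is written for Python 3 so it can be run today; the asker was on Python 2.7, where passing a byte string to `open()` on Windows goes through the "ANSI" code page, whereas Python 3 on Windows treats byte paths as UTF-8, so the results will not be identical across versions.

```python
def probe_open(path, encodings=("utf-8", "cp1251")):
    """Try to open `path` as text and as bytes in each encoding.

    Returns a dict mapping a label to True (opened) or False (failed).
    Hypothetical diagnostic helper, not part of pycurl.
    """
    candidates = {"unicode": path}
    for name in encodings:
        try:
            candidates[name] = path.encode(name)
        except UnicodeEncodeError:
            # The path cannot be represented in this encoding at all.
            candidates[name] = None

    results = {}
    for label, candidate in candidates.items():
        if candidate is None:
            results[label] = False
            continue
        try:
            with open(candidate, "rb"):
                results[label] = True
        except (OSError, ValueError):
            results[label] = False
    return results
```

Whichever byte encoding opens the file successfully is the one most likely to work when the same bytes are handed to libcurl.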

2 Answers


Decompose this problem into 2 components:

  1. tell pycurl which file to open to read file data
  2. send filename in correct encoding to the server

These two steps may or may not use the same encoding.

For 1, use sys.getfilesystemencoding() to convert the unicode filename (which you correctly use throughout your Python code) to a byte string that pycurl/libcurl can open with fopen(). Use strace (Linux) or the equivalent tool on Windows or OS X to verify that pycurl is opening the correct file path.
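A minimal sketch of step 1, assuming Python 3 syntax; `encode_for_libcurl` is a hypothetical helper name. On the Python 2.7 setup from the question, the input would be a `unicode` object and the result a `str`.

```python
import sys


def encode_for_libcurl(path, encoding=None):
    """Encode a text path to bytes in the filesystem encoding.

    Hypothetical helper. `encoding` defaults to
    sys.getfilesystemencoding(); it is a parameter only so that
    other encodings can be tried by hand.
    """
    if isinstance(path, bytes):
        # Already encoded; pass it through untouched.
        return path
    if encoding is None:
        encoding = sys.getfilesystemencoding()
    # Raises UnicodeEncodeError if the path cannot be expressed
    # in the chosen encoding.
    return path.encode(encoding)
```

If this raises `UnicodeEncodeError`, the path cannot be represented in that encoding at all, so no byte string passed to `fopen()` in that encoding could name the file.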

If that fails completely, you can always feed the file data to libcurl from Python via pycurl.READFUNCTION.
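As an illustration of letting Python read the file instead of libcurl, here is a sketch that uses a different mechanism from the one named above: pycurl's `FORM_BUFFER` / `FORM_BUFFERPTR` form options rather than `READFUNCTION`, since those fit the multipart `HTTPPOST` form used in the question. `build_buffer_form` is a hypothetical helper, the pycurl module is passed in as a parameter, and sending the remote filename as UTF-8 is an assumption about what the server expects.

```python
import os


def build_buffer_form(pycurl_module, field, path):
    """Build an HTTPPOST value from file bytes read by Python.

    Hypothetical helper. `pycurl_module` is the imported pycurl
    module (or any object exposing FORM_BUFFER and FORM_BUFFERPTR).
    """
    # Python opens the file, so libcurl never sees the local path.
    with open(path, "rb") as handle:
        data = handle.read()
    # Assumption: the server wants the remote filename in UTF-8.
    remote_name = os.path.basename(path).encode("utf-8")
    return [(field, (pycurl_module.FORM_BUFFER, remote_name,
                     pycurl_module.FORM_BUFFERPTR, data))]


# Intended usage (not executed here):
#   import pycurl
#   curl = pycurl.Curl()
#   curl.setopt(pycurl.HTTPPOST, build_buffer_form(pycurl, field, filename))
```

This reads the whole file into memory, so it suits modest uploads; for large files a streaming approach such as `READFUNCTION` would be the better fit.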

For 2, learn how the filename is transmitted during a file upload (example). I don't have a good link; all I know is that it's not trivial, e.g. when it comes to very long file names.

Dima Tisnek

The filename should be encoded as UTF-8, and the host you upload to should support UTF-8 file names. If it only supports a different, non-Unicode encoding, try encoding the filename as KOI8-R or Windows-1251 (though this, of course, is neither nice nor standards-compliant).

EDIT, having seen the comments: it should probably have been ur"C:\Users\Q\test\Тестовая папка\test.mp3".encode("UTF-8"). That u prefix is important; without it, the Cyrillic letters are taken as bytes in your console encoding. I just tried it, and it worked (not the upload, just setopt).

9000