
I am running a Python web app on an Ubuntu server, while I develop locally on OS X.

I use a lot of unicode strings for the Hebrew language, including manipulating filenames of images, so they will be saved on the filesystem with Hebrew characters.

My Ubuntu server is fully configured for UTF-8: I have other images on the file system (outside of this app) with Hebrew names, in Hebrew-named directories, etc.

However, my app returns errors when trying to save an image with a Hebrew filename on Ubuntu (but not on OS X).

The error being:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 0-3: ordinal not in range(128)

After a lot of investigating, I narrowed it down to the last possible cause, as far as I can see:

# Inside my virtualenv, Mac OS X
>>> import os.path
>>> os.path.supports_unicode_filenames
True

# Inside my virtualenv, Ubuntu 12.04
>>> import os.path
>>> os.path.supports_unicode_filenames
False

And just for the curious, here are my Ubuntu locale settings:

locale
LANG=en_US.UTF-8
LANGUAGE=
LC_CTYPE="en_US.UTF-8"
LC_NUMERIC="en_US.UTF-8"
LC_TIME="en_US.UTF-8"
LC_COLLATE="en_US.UTF-8"
LC_MONETARY="en_US.UTF-8"
LC_MESSAGES="en_US.UTF-8"
LC_PAPER="en_US.UTF-8"
LC_NAME="en_US.UTF-8"
LC_ADDRESS="en_US.UTF-8"
LC_TELEPHONE="en_US.UTF-8"
LC_MEASUREMENT="en_US.UTF-8"
LC_IDENTIFICATION="en_US.UTF-8"
LC_ALL=

Update: adding the code, and an example string:

# a string, of the type I would get for instance.product.name, as used below.
u'\u05e7\u05e8\u05d5\u05d1-\u05e8\u05d7\u05d5\u05e7'


#utils.py
# I get an image object from django, and I run this function so django 
# can use the generated filepath for the image.
def get_upload_path(instance, filename):

    tmp = filename.split('.')
    extension = '.' + tmp[-1]

    if instance.__class__.__name__ == 'MyClass':

        seo_filename = unislugify(instance.product.name)
        # unislugify takes a string and strips spaces, etc.
        value = IMAGES_PRODUCT_DIR + seo_filename + extension

    else:

        value = IMAGES_GENERAL_DIR + unislugify(filename)

    return value

Example stacktrace:

UnicodeEncodeError: 'ascii' codec can't encode characters in position 60-66: ordinal not in range(128)

Stacktrace (most recent call last):

  File "django/core/handlers/base.py", line 111, in get_response
    response = callback(request, *callback_args, **callback_kwargs)

  File "django/contrib/admin/options.py", line 366, in wrapper
    return self.admin_site.admin_view(view)(*args, **kwargs)

  File "django/utils/decorators.py", line 91, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "django/views/decorators/cache.py", line 89, in _wrapped_view_func
    response = view_func(request, *args, **kwargs)

  File "django/contrib/admin/sites.py", line 196, in inner
    return view(request, *args, **kwargs)

  File "django/utils/decorators.py", line 25, in _wrapper
    return bound_func(*args, **kwargs)

  File "django/utils/decorators.py", line 91, in _wrapped_view
    response = view_func(request, *args, **kwargs)

  File "django/utils/decorators.py", line 21, in bound_func
    return func(self, *args2, **kwargs2)

  File "django/db/transaction.py", line 209, in inner
    return func(*args, **kwargs)

  File "django/contrib/admin/options.py", line 1055, in change_view
    self.save_related(request, form, formsets, True)

  File "django/contrib/admin/options.py", line 733, in save_related
    self.save_formset(request, form, formset, change=change)

  File "django/contrib/admin/options.py", line 721, in save_formset
    formset.save()

  File "django/forms/models.py", line 497, in save
    return self.save_existing_objects(commit) + self.save_new_objects(commit)

  File "django/forms/models.py", line 628, in save_new_objects
    self.new_objects.append(self.save_new(form, commit=commit))

  File "django/forms/models.py", line 731, in save_new
    obj.save()

  File "django/db/models/base.py", line 463, in save
    self.save_base(using=using, force_insert=force_insert, force_update=force_update)

  File "django/db/models/base.py", line 551, in save_base
    result = manager._insert([self], fields=fields, return_id=update_pk, using=using, raw=raw)

  File "django/db/models/manager.py", line 203, in _insert
    return insert_query(self.model, objs, fields, **kwargs)

  File "django/db/models/query.py", line 1593, in insert_query
    return query.get_compiler(using=using).execute_sql(return_id)

  File "django/db/models/sql/compiler.py", line 909, in execute_sql
    for sql, params in self.as_sql():

  File "django/db/models/sql/compiler.py", line 872, in as_sql
    for obj in self.query.objs

  File "django/db/models/fields/files.py", line 249, in pre_save
    file.save(file.name, file, save=False)

  File "django/db/models/fields/files.py", line 86, in save
    self.name = self.storage.save(name, content)

  File "django/core/files/storage.py", line 44, in save
    name = self.get_available_name(name)

  File "django/core/files/storage.py", line 70, in get_available_name
    while self.exists(name):

  File "django/core/files/storage.py", line 230, in exists
    return os.path.exists(self.path(name))

  File "python2.7/genericpath.py", line 18, in exists
    os.stat(path)
  • How do you save the image? Please provide the part of the code and maybe an example filename that fails. – Fabian Nov 05 '12 at 12:50
  • @pwalsh: Please post `repr(instance.product.name)` on both OSX and Ubuntu, and the full stack trace / error message received on Ubuntu. – unutbu Nov 05 '12 at 13:26
  • @unutbu: On OS X repr gives u'\u05e8\u05d2\u05e2' but on Ubuntu "u'\u05e8\u05d2\u05e2'"! –  Nov 05 '12 at 15:07
  • I have posted a solution to the UnicodeEncodeError part of this problem here: http://stackoverflow.com/a/31001281/3003438 – lukeaus Jun 23 '15 at 11:20

2 Answers


os.path.supports_unicode_filenames is always False on POSIX systems except Darwin. That's because they don't really care about the encoding of a filename; it's simply a byte sequence. The locale settings specify how to interpret these bytes, which is why you can end up with broken characters in a terminal when the locale setting isn't right.
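To illustrate the "just a byte sequence" point, here is a minimal sketch (the helper name roundtrip_byte_filename is made up for this example). It creates a file whose name is passed as raw UTF-8 bytes and lists the directory back as bytes, so no locale or codec is involved on the Python side. It assumes a POSIX filesystem and a temporary directory with an ASCII path.

```python
import os
import shutil
import tempfile


def roundtrip_byte_filename(name_bytes):
    """Create a file named by raw bytes and return the directory listing as bytes."""
    tmp_dir = tempfile.mkdtemp()
    # Assumption: the temp directory path is plain ASCII.
    tmp_dir_bytes = tmp_dir.encode("ascii")
    try:
        with open(os.path.join(tmp_dir_bytes, name_bytes), "wb") as handle:
            handle.write(b"demo")
        # A byte-string argument makes os.listdir return byte-string names.
        return os.listdir(tmp_dir_bytes)
    finally:
        shutil.rmtree(tmp_dir)
```

Calling it with u'\u05e8\u05d2\u05e2.jpg'.encode('utf-8') should hand back the same bytes that went in; whether a terminal then shows them as Hebrew depends only on how those bytes are interpreted.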

How are you running your web app? If you're running it through a web server (Apache?) using CGI or WSGI, the locale may not be what you see in the shell, so this could be the reason why Python tries to use the ascii codec to encode the pathname.
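One way to check this is to collect the encoding-related settings from inside the server process itself, rather than from a shell. The sketch below is only a diagnostic aid; the function name encoding_report is made up here, and where you log its result (a view, the WSGI entry point) is up to you.

```python
import locale
import os
import sys


def encoding_report():
    """Collect the encoding-related settings seen by the current process."""
    return {
        "LANG": os.environ.get("LANG"),
        "LC_ALL": os.environ.get("LC_ALL"),
        "default_encoding": sys.getdefaultencoding(),
        "filesystem_encoding": sys.getfilesystemencoding(),
        "preferred_encoding": locale.getpreferredencoding(),
    }


if __name__ == "__main__":
    for key, value in sorted(encoding_report().items()):
        print("%s = %r" % (key, value))
```

If LANG comes back as None inside the app while your shell reports en_US.UTF-8, the server process is not inheriting the locale you see when logged in.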

To make it work, you could manually encode the pathname as utf-8 when opening the file.

Edit:
So the failure is in a call to os.stat, which, when called with a unicode string, tries to convert it to a byte string according to the default encoding (sys.getdefaultencoding()), which within a uWSGI environment always seems to be ascii when using Python 2. To fix this, make sure to encode any unicode string to UTF-8 before it is passed on to os.stat.
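A minimal sketch of that idea, assuming UTF-8 is the encoding you want on disk. The helper names to_fs_bytes and path_exists are made up for this example and are not part of Django or the standard library; where to hook them into your storage code is not shown.

```python
import os

TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def to_fs_bytes(path, encoding="utf-8"):
    """Return `path` as a byte string, encoding text paths explicitly."""
    if isinstance(path, TEXT_TYPE):
        return path.encode(encoding)
    return path  # already a byte string, pass through unchanged


def path_exists(path):
    """Like os.path.exists, but the path is encoded explicitly first."""
    return os.path.exists(to_fs_bytes(path))
```

Because the path is already a byte string by the time it reaches os.stat, Python has nothing left to convert implicitly, so the ascii codec never gets involved.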

mata
  • I am running it on uWSGI, behind nginx. I have tested with uWSGI running as root, and running as my user, with the same results. –  Nov 05 '12 at 13:11
  • @pwalsh you still haven't shown the exact line where the error happens, without that everything here is still just speculation. – mata Nov 05 '12 at 13:17
  • I added a stacktrace to the question. –  Nov 05 '12 at 15:30
  • Good find! I can't dig that information up myself on the uWSGI site or anywhere else. Still, when I explicitly encode seo_filename as UTF-8 (e.g., value = IMAGES_PRODUCT_DIR + seo_filename.encode('utf-8') + extension), I get the same error. –  Nov 06 '12 at 07:08
  • Is one of `IMAGES_PRODUCT_DIR` or `extension` a `unicode` object? If so, you also have to encode them. – mata Nov 06 '12 at 09:06
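
As a sketch of the last comment's point: in Python 2, concatenating a byte string with a unicode object triggers an implicit ASCII conversion, so encoding only one component is not enough. The hypothetical helper below (build_encoded_path is not from the original code) encodes every text component before joining, so the result is a byte string throughout. Whether the calling framework later converts it back to text is outside what this shows.

```python
TEXT_TYPE = type(u"")  # `unicode` on Python 2, `str` on Python 3


def build_encoded_path(*parts):
    """Encode each text component to UTF-8 and concatenate everything as bytes."""
    encoded = []
    for part in parts:
        if isinstance(part, TEXT_TYPE):
            part = part.encode("utf-8")
        encoded.append(part)
    return b"".join(encoded)
```

With this, value = build_encoded_path(IMAGES_PRODUCT_DIR, seo_filename, extension) would replace the plain concatenation from the question.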

Thanks to everyone for the help. I still have not solved this issue with uWSGI.

But this was the last straw in "configuring" uWSGI for me: I went back to gunicorn as the app server, and everything works fine. I would like to use uWSGI, as it is an ambitious project, but at the end of the day I am a developer and not a sysadmin, and gunicorn is much easier to get working in the common use cases.