5

My webpages are served by a script that dynamically imports a bunch of files with:

try:
    with open (filename, 'r') as f:
        exec(f.read())
except IOError: pass

(actually, can you suggest a better method of importing a file? I'm sure there is one.)

Sometimes the files contain strings in different languages, like:

# contents of language.ru
title = "Название"

All of these were saved as UTF-8 files. Python has no problem running the script from the command line or serving a page from my MacBook:

    OK: [server command line] python3.0 page.py /index.ru
    OK: http://whitebox.local/index.ru

but it raises an error when serving a page from a server we just moved to:

      157     try:
      158         with open (filename, 'r') as f:
      159             exec(f.read())
      160     except IOError: pass
      161 
      /usr/local/lib/python3.0/io.py in read(self=, n=-1)
      ...
      UnicodeDecodeError: 'ascii' codec can't decode byte 0xe1 in position 627: ordinal not in range(128) 

All the files were copied from my laptop, where Apache served them without problems. What causes the error?

Update: I found out that the default encoding for open() is platform-dependent, so it was UTF-8 on my laptop and ASCII on the server. Is there a per-program way to set it in Python 3? (sys.setdefaultencoding is used in the site module and then deleted from the namespace.)
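For anyone who wants to compare two machines the same way, here is a minimal sketch that prints the encoding open() falls back to in text mode when no encoding argument is given. It assumes a reasonably recent Python 3; the variable name default_text_encoding is only illustrative.

```python
import locale
import sys

# In text mode, open() without encoding= uses the locale's preferred encoding.
default_text_encoding = locale.getpreferredencoding(False)

print("open() default encoding:", default_text_encoding)
print("filesystem encoding:", sys.getfilesystemencoding())
```

Running this on both the laptop and the server should show whether the two environments really differ, for example because the web server process starts with a different locale than an interactive shell.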

Craig McQueen
ilya n.
  • `import` usually works to import files. Any reason it doesn't work for you? – Lennart Regebro Oct 14 '11 at 20:03
  • Have a look at [sys.getfilesystemencoding](https://docs.python.org/3/library/sys.html#sys.getfilesystemencoding). On Linux you should ensure that the locale variable `LC_CTYPE` has a sane value as it defines the meaning of filenames and is used by Python as well. `LC_CTYPE` comes from either the environment variable of the same name or is inferred by `LC_ALL`. Running the `locale` command will tell you the current values. – Bluehorn Sep 25 '17 at 12:53

3 Answers

16

Pass the encoding explicitly: open(filename, 'r', encoding='utf8'). See the Python 3 docs for open.
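As a rough sketch of how that could look for the loader in the question: the helper name load_settings and the choice to execute into a separate dictionary (rather than the caller's scope, as the original script does) are my own assumptions, not something from the question.

```python
import os
import tempfile


def load_settings(filename, encoding="utf-8"):
    """Execute a Python-syntax file and return the names it defines."""
    namespace = {}
    try:
        # Explicit encoding, so the result does not depend on the locale.
        with open(filename, "r", encoding=encoding) as f:
            source = f.read()
    except IOError:
        # Same behaviour as the original snippet: a missing file is ignored.
        return namespace
    exec(compile(source, filename, "exec"), namespace)
    namespace.pop("__builtins__", None)  # added by exec, not part of the file
    return namespace


# Small demonstration using a temporary copy of language.ru.
with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "language.ru")
    with open(path, "w", encoding="utf-8") as f:
        f.write('title = "Название"\n')
    settings = load_settings(path)
    missing = load_settings(os.path.join(tmp, "missing.ru"))

print(settings["title"])
```

Keeping the executed names in their own dictionary makes it easier to see what each file contributed, but it is a design choice; exec(f.read()) with the explicit encoding works just as well if you want the original behaviour.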

Alex Martelli
1

You can use something like:

with open(fname, 'r', encoding="ascii", errors="surrogateescape") as f:
    data = f.read()

# make changes to the string 'data'

with open(fname + '.new', 'w',
           encoding="ascii", errors="surrogateescape") as f:
    f.write(data)

More information is in the Python Unicode documentation.
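To illustrate what surrogateescape does, here is a small in-memory sketch (no files involved; the variable names are just for the example). Undecodable bytes are turned into lone surrogate code points on decoding and turned back into the same bytes on encoding, so the data survives a round trip even though the text is not readable as Russian in between.

```python
# UTF-8 bytes that the ASCII codec cannot decode strictly.
raw = "Название".encode("utf-8")

# Each non-ASCII byte becomes one surrogate code point instead of raising.
text = raw.decode("ascii", errors="surrogateescape")

# Encoding with the same error handler restores the original bytes exactly.
round_tripped = text.encode("ascii", errors="surrogateescape")

print(round_tripped == raw)
```

Note that this only preserves the bytes; it does not give you the real string, so for the question above an explicit encoding='utf8' is still what you want.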

eSadr
1

Use the codecs library. I'm on Python 2.6.6, where the built-in open has no encoding argument, so I use this instead:

import codecs
# Returns a file object that decodes UTF-8 to unicode on read
f = codecs.open('filename', 'r', encoding='UTF-8')
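If the same script has to run on both Python 2.6+ and Python 3, io.open is another option: it accepts the encoding argument on both, and on Python 3 it is the same function as the built-in open. A minimal sketch, with a temporary file standing in for the real one:

```python
# -*- coding: utf-8 -*-
import io
import os
import shutil
import tempfile

tmp = tempfile.mkdtemp()
try:
    path = os.path.join(tmp, "language.ru")

    # io.open takes encoding= on Python 2.6+ as well as on Python 3.
    with io.open(path, "w", encoding="utf-8") as f:
        f.write(u'title = "Название"\n')

    with io.open(path, "r", encoding="utf-8") as f:
        title_line = f.read()
finally:
    shutil.rmtree(tmp)

print(title_line)
```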
Flexo
vieyra