I'm using glob.glob to get a list of files from a directory input. When trying to open said files, Python fights me back with this error:

UnicodeEncodeError: 'charmap' codec can't encode character '\xf8' in position 18: character maps to <undefined>

By defining a string variable first, I can do this:

filePath = r"C:\Users\Jørgen\Tables\\"

Is there some way to get the 'r' encoding for a variable?

EDIT:

import glob

di = r"C:\Users\Jørgen\Tables\\"

def main():
    fileList = getAllFileURLsInDirectory(di)
    print(fileList)

def getAllFileURLsInDirectory(directory):
    return glob.glob(directory + '*.xls*')

There is a lot more code, but this problem stops the process.

DoTheGenes
  • There is *no* `r` encoding. You are defining a raw string literal, saving you having to use too many backslashes. Your file encoding took care of the `ø`, so you defined a *unicode* value. – Martijn Pieters Jul 25 '13 at 11:29
  • `"C:\\Users\\Jørgen\\Tables\\"` *also* works. Your editor saved that as UTF-8, the default encoding Python uses when interpreting your source code. – Martijn Pieters Jul 25 '13 at 11:31
  • Please show us your code that produces that error. – Martijn Pieters Jul 25 '13 at 11:31

2 Answers


Regardless of whether you use a raw string literal or a normal string literal, the Python interpreter must know the source code encoding. It seems you are using some 8-bit encoding, not UTF-8. Therefore you have to add a line like

# -*- coding: cp1252 -*-

at the beginning of the file (substituting whichever encoding your source files actually use). It need not be the first line, but Python only recognizes it on the first or second line (the first should contain #!python3 for a script used on Windows).
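
As a rough way to check what the interpreter will make of such a line, the standard library's tokenize.detect_encoding reads the first two lines of a source file and reports the encoding it finds. The sketch below feeds it two made-up byte strings; declared_encoding is a hypothetical helper name, not something from the question.

```python
import io
import tokenize


def declared_encoding(source_bytes):
    """Hypothetical helper: report the encoding Python would use for this source."""
    encoding, _lines = tokenize.detect_encoding(io.BytesIO(source_bytes).readline)
    return encoding


# Declaration on the second line, after a shebang: it is honoured.
with_declaration = b"#!python3\n# -*- coding: cp1252 -*-\nprint('hi')\n"

# Declaration on the third line: it is ignored and the default applies.
declared_too_late = b"#!python3\nimport glob\n# -*- coding: cp1252 -*-\n"

print(declared_encoding(with_declaration))
print(declared_encoding(declared_too_late))
```

The second call falls back to the default source encoding, because the declaration sits below the two lines Python inspects.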

Anyway, it is usually better not to use non-ASCII characters in file and directory names.

You can also use forward slashes in the path (the same way as on Unix-based systems). Also, have a look at os.path.join when you need to compose paths.
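
A minimal sketch of that suggestion, assuming Python 3: the glob pattern is built with os.path.join instead of string concatenation, so the trailing backslash on the directory is no longer needed. The function name list_spreadsheets and the temporary directory layout are made up for the illustration.

```python
import glob
import os
import tempfile


def list_spreadsheets(directory):
    """Hypothetical helper: return the *.xls* files directly inside directory."""
    pattern = os.path.join(directory, "*.xls*")
    return sorted(glob.glob(pattern))


# Build a throwaway directory with a non-ASCII component to try it out.
with tempfile.TemporaryDirectory() as root:
    tables = os.path.join(root, "Jørgen", "Tables")
    os.makedirs(tables)
    for name in ("a.xls", "b.xlsx", "notes.txt"):
        with open(os.path.join(tables, name), "w"):
            pass
    found = [os.path.basename(p) for p in list_spreadsheets(tables)]

# ascii() keeps the output printable on any console.
print(ascii(found))
```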

Updated

The problem is probably not where you are looking for it. My guess is that the error appears only when you display the resulting list via print. This usually happens because the console by default uses a non-Unicode encoding that cannot represent the character. Run the chcp command without arguments in your cmd window to see the active code page.

You can modify the print call in your main() function to convert the string representation to an ASCII-only one that can always be displayed:

print(ascii(fileList))
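
To see why the error comes from printing rather than from glob, here is a small sketch. It assumes the console uses code page 437, which is only a guess; check the actual value with chcp. The path is a made-up example.

```python
import sys

# Hypothetical path containing the character from the error message.
path = "C:\\Users\\Jørgen\\Tables\\report.xlsx"

# The encoding print() will use when writing to this console.
print(ascii(sys.stdout.encoding))

# cp437 is an assumed console code page; it has no slot for 'ø'.
try:
    path.encode("cp437")
    encodable = True
except UnicodeEncodeError:
    encodable = False

# ascii() escapes every non-ASCII character, so this always prints.
safe = ascii(path)
print(encodable)
print(safe)
```

The string itself is fine here; only the conversion to the console's code page fails, which is why escaping it with ascii() avoids the exception.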
pepr
  • I would prefer to be able to use non ASCII characters like `æ`, `ø` and `å`. The goal is to input a folder path, and manipulate every file of a certain type within that folder. For this to work on other users, I can't guarantee that the path will not contain any "illegal" characters. I did have `#-*- coding: utf-8 -*-` on top, but neither that nor your "coding" suggestion worked. – DoTheGenes Jul 25 '13 at 12:28
  • OK. That is fine if it is not under your control. But you have to guarantee there are no illegal characters anyway. Letters are OK, even the Unicode ones if the system supports Unicode paths. Do you use Python 3 or Python 2? – pepr Jul 25 '13 at 12:32
  • OK. Did the `# -*- coding: ... -*-` line help to remove the error? – pepr Jul 25 '13 at 14:18
  • When I define `di = r"C:\Users\Jørgen\Tables\\"` in the code, the `coding` line works. But when I try to take the folder path as an input everything goes wrong again. – DoTheGenes Jul 25 '13 at 18:45

Please also see:

Convert python filenames to unicode and Listing chinese filenames in directory with python

You can tell Python to explicitly handle strings as unicode -- but you have to maintain that from the first string onward.

In this case, that means passing u'somepath' to os.walk.
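
As a rough illustration under the assumption of Python 3, where every str is already Unicode: os.walk returns names of the same type as the path it is given, so a text path yields text names and a bytes path yields bytes names. The helper walk_names and the file name are invented for the example.

```python
import os
import tempfile


def walk_names(top):
    """Hypothetical helper: collect every file name found under top."""
    names = []
    for _dirpath, _dirnames, filenames in os.walk(top):
        names.extend(filenames)
    return names


with tempfile.TemporaryDirectory() as root:
    with open(os.path.join(root, "tabell_ø.xls"), "w"):
        pass
    # A str path gives back str (Unicode) names.
    text_names = walk_names(root)
    # A bytes path gives back undecoded bytes names.
    bytes_names = walk_names(os.fsencode(root))

print(ascii(text_names))
print(ascii(bytes_names))
```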

Michael Flyger