encoding error when reading excel file

Question

I want to go through the data in my folder, identify the files, and rename them according to a list of rules I have in an Excel spreadsheet. I load the needed libraries, make my directory the working directory, and read in the Excel file (using xlrd). When I try to read the data by column, e.g.:

fname = metadata.col_values(0, start_rowx=1, end_rowx=None)

the values in the list come with a u in front of each item (I guess this means Unicode), for example fname = [u'file1', u'file2'] and so on.

How can I convert fname to a list of ascii strings?

Thanks for the comments and suggestions. I am not sure whether Unicode is the problem, but I suspect it is, because the code cannot find file1, file2, etc. in my folder; I believe the cause is the leading u. — Dimitris, Jul 23 '13 at 14:44
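The rename step the question describes might look roughly like the sketch below. It is an illustration under assumptions, not the poster's code: `rename_from_rules` is a hypothetical helper, and it assumes the old names come from `metadata.col_values(0, start_rowx=1)` while the new names sit in some other column of the sheet (that layout is a guess). Returning the names that were not found makes any mismatch between the spreadsheet and the folder visible, whatever its cause.

```python
import os
import tempfile


def rename_from_rules(folder, old_names, new_names):
    """Rename each file in `folder` listed in old_names to the matching new name.

    Names read by xlrd can be used as-is: on Python 3 they are ordinary str
    objects, and on Python 2 ASCII unicode strings compare equal to the byte
    strings os.listdir() returns. Returns the old names not found in the folder.
    """
    present = set(os.listdir(folder))
    missing = []
    for old, new in zip(old_names, new_names):
        if old in present:
            os.rename(os.path.join(folder, old), os.path.join(folder, new))
        else:
            missing.append(old)
    return missing


# Demo in a throwaway folder: "file3" is listed in the "spreadsheet" but absent.
demo_dir = tempfile.mkdtemp()
for name in ("file1", "file2"):
    with open(os.path.join(demo_dir, name), "w"):
        pass
missing = rename_from_rules(demo_dir, [u"file1", u"file2", u"file3"],
                            [u"new1", u"new2", u"new3"])
print(sorted(os.listdir(demo_dir)), missing)
```

If a name shows up in the returned list even though the file seems to be there, compare the two spellings character by character (extensions, trailing spaces, case) before suspecting the u prefix.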

score 0 · Answer 1 · answered Jul 22 '13 at 14:20

I'm not sure what the big issue with having Unicode filenames is, but assuming all of your characters are valid ASCII, the following should do it. Note that this solution silently drops anything that isn't ASCII, so it's worth thinking about why you're doing this in the first place:

ascii_string = unicode_string.encode("ascii", "ignore")

Specifically, for converting a whole list I would use a list comprehension:

ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]
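On Python 3 the same comprehension returns `bytes` objects, because `str.encode()` produces bytes there. A sketch that decodes back to plain strings, using a made-up non-ASCII name to show what `"ignore"` throws away:

```python
fname = [u"file1", u"file2", u"caf\u00e9.txt"]  # stand-in for the xlrd column

# Python 3: encode() gives bytes, so decode() again to get a plain str.
# "ignore" silently drops every non-ASCII character (here the accented e).
ascii_list = [name.encode("ascii", "ignore").decode("ascii") for name in fname]
print(ascii_list)  # ['file1', 'file2', 'caf.txt']
```

Note that `caf\u00e9.txt` becomes `caf.txt`, a different filename, which is exactly why the answer suggests asking whether the conversion is needed at all.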

Thanks, you are probably right and Unicode is not the problem with the code. I will test and update the code, and I will post the results. — Dimitris, Jul 23 '13 at 14:46

Henry Keiter · Answer 2 · 2013-07-22T14:28:26.233

The u at the front is just a visual marker that shows you, when the string is displayed (for example as an item of a printed list), what the underlying type is. It's like the single quotes around the strings when you print that list: they are there to tell you something about the object being printed (specifically, that it's a string), but they aren't actually part of the object.

In the case of the u, it's saying it's a unicode object. When you use the string internally, that u on the outside doesn't exist, just like the single-quotes. Try opening a file and writing the strings there, and you'll see that the u and the single-quotes don't show up, because they're not actually part of the underlying string objects.

with open(r'C:\test\foo.bar', 'w') as f:
    for item in fname:
        f.write(item)
        f.write('\n')
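The same point can be checked without touching the filesystem. This small sketch (the list is a stand-in for the xlrd output) runs on both Python 2 and Python 3:

```python
fname = [u"file1", u"file2"]

# The u belongs to the literal/repr, not to the stored characters.
print(len(fname[0]))        # 5, not 6: no 'u' character is stored
print(fname[0] == "file1")  # True: it matches the plain string
print("\n".join(fname))     # file1 and file2 on separate lines, no u or quotes
```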

If you really need to print strings without the u at the start, you can convert them to ASCII with u'unicode stuff'.encode('ascii'), but honestly I doubt this is something that actually matters for what you're doing.
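Without a second argument, `.encode('ascii')` uses the default `'strict'` error handler, so it raises `UnicodeEncodeError` on a non-ASCII character instead of silently dropping it the way `"ignore"` does. A sketch (the helper name `to_ascii_strict` is hypothetical):

```python
def to_ascii_strict(name):
    """Return `name` as ASCII bytes, or None if it contains non-ASCII characters."""
    try:
        return name.encode("ascii")  # errors="strict" is the default
    except UnicodeEncodeError:
        return None


print(to_ascii_strict(u"file1"))           # b'file1' on Python 3
print(to_ascii_strict(u"caf\u00e9.txt"))   # None
# "replace" substitutes '?' for each character that cannot be encoded.
print(u"caf\u00e9.txt".encode("ascii", "replace"))  # b'caf?.txt' on Python 3
```

The strict form is the safer default when the result is used as a filename, since it tells you about data loss rather than hiding it.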

You could also just use Python 3, where Unicode is the default and the u isn't normally printed.
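On Python 3 the `u` prefix is still accepted in literals (it was re-allowed in 3.3) but never appears in the output, because every `str` is already Unicode:

```python
fname = [u"file1", u"file2"]  # the u prefix is legal but redundant on Python 3.3+
print(fname)           # ['file1', 'file2']
print(type(fname[0]))  # <class 'str'>
```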

Thanks, I now believe Unicode might not be my problem; I will update the post as soon as I know more. — Dimitris, Jul 23 '13 at 14:45
`f.write(item)` fails if `item` is a Unicode string with characters outside `ascii` (`sys.getdefaultencoding()`). Use `codecs.open()` with explicit character encoding instead. — jfs, Mar 01 '14 at 14:54
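jfs's suggestion can be sketched as follows, swapping in `io.open` (available since Python 2.6 and the same function as the built-in `open` on Python 3) for `codecs.open`; both accept an explicit `encoding`. The temporary path replaces the hard-coded `C:\test\foo.bar` from the answer, and the non-ASCII name is made up for illustration.

```python
import io
import os
import tempfile

fname = [u"file1", u"caf\u00e9.txt"]

path = os.path.join(tempfile.mkdtemp(), "foo.bar")

# With an explicit encoding, non-ASCII items are written as UTF-8 instead of
# failing. Text mode expects unicode on Python 2, hence u"\n" rather than "\n".
with io.open(path, "w", encoding="utf-8") as f:
    for item in fname:
        f.write(item)
        f.write(u"\n")

# Read the file back with the same encoding to check the round trip.
with io.open(path, encoding="utf-8") as f:
    lines = f.read().splitlines()
print(lines)
```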
