I want to go through the files in my folder, identify them, and rename them according to a list of rules I keep in an Excel spreadsheet. I load the needed libraries, make my directory the working directory, and read in the Excel file (using xlrd). When I try to read the data by columns, e.g.:

fname = metadata.col_values(0, start_rowx=1, end_rowx=None)

the list of values comes back with a u in front of each one (Unicode, I guess), such as fname = [u'file1', u'file2'] and so on.

How can I convert fname to a list of ascii strings?

Slater Victoroff
Dimitris
  • what's the big deal if the strings are in unicode? – Brad Jul 22 '13 at 13:59
  • thanks for the comments / suggestions; I am not sure if the unicode is the problem, but I think this is the problem since the code cannot identify file1, file2 etc in my folder --I believe the error was the presence of u – Dimitris Jul 23 '13 at 14:44

2 Answers

I'm not sure what the problem with having Unicode filenames is, but assuming all of your characters are valid ASCII, the following should do it. Note that it silently drops anything that's non-ASCII, so it's worth thinking about why you're doing this in the first place:

ascii_string = unicode_string.encode("ascii", "ignore")

Specifically, for converting a whole list I would use a list comprehension:

ascii_list = [old_string.encode("ascii", "ignore") for old_string in fname]
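To make the effect of "ignore" concrete, here is a small sketch. The sample list is hypothetical and only stands in for the fname list from the question. It also shows the default "strict" behaviour for comparison. Note that on Python 3 the encoded values are bytes objects rather than str.

```python
# Hypothetical sample list standing in for fname from the question
fname = [u"file1", u"file2", u"caf\xe9"]

# "ignore" drops any character that ASCII cannot represent
ascii_list = [name.encode("ascii", "ignore") for name in fname]

# The default "strict" handler raises instead of dropping characters
try:
    u"caf\xe9".encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True
```

If a filename losing characters would make it point at the wrong file, the strict form is probably the safer choice, since it fails loudly.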
Slater Victoroff
  • thanks - you are probably right, and unicode is not the problem with the code; I will test and update the code, and I will post the results – Dimitris Jul 23 '13 at 14:46

The u at the front is just a visual item to show you, when you print the string, what the underlying representation is. It's like the single-quotes around the strings when you print that list--they are there to show you something about the object being printed (specifically, that it's a string), but they aren't actually a part of the object.

In the case of the u, it's saying it's a unicode object. When you use the string internally, that u on the outside doesn't exist, just like the single-quotes. Try opening a file and writing the strings there, and you'll see that the u and the single-quotes don't show up, because they're not actually part of the underlying string objects.

with open(r'C:\test\foo.bar', 'w') as f:
    for item in fname:
        f.write(item)
        f.write('\n')
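The same point can be checked without touching the filesystem. This is a minimal sketch with a hypothetical two-item list in place of the real fname: the u only appears in the repr (and only on Python 2), while the text itself compares equal to a plain ASCII string.

```python
# Hypothetical sample list standing in for fname from the question
fname = [u"file1", u"file2"]

# repr() is what the interpreter shows when printing a list;
# on Python 2 it includes the u prefix, on Python 3 it does not
shown = repr(fname[0])

# The text itself contains no u and no quotes
same_text = fname[0] == "file1"
length = len(fname[0])

# Printing the item (not the list) shows only the characters
print(fname[0])
```

So a comparison against the names found in the folder should not fail just because of the u.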

If you really need to print strings without the u at the start, you can convert them to ASCII with u'unicode stuff'.encode('ascii'), but honestly I doubt this is something that actually matters for what you're doing.

You could also just use Python 3, where Unicode is the default and the u isn't normally printed.

Henry Keiter
  • thanks - I now believe unicode might not be my problem; I will update the post as soon as I know better – Dimitris Jul 23 '13 at 14:45
  • `f.write(item)` fails if `item` is a Unicode string with characters outside `ascii` (`sys.getdefaultencoding()`). Use `codecs.open()` with explicit character encoding instead. – jfs Mar 01 '14 at 14:54
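A sketch of what the last comment suggests, assuming UTF-8 is an acceptable encoding for the output file. It uses io.open, which takes an encoding argument in the same way as the codecs.open mentioned in the comment and behaves the same on Python 2 and 3. The helper name write_names, the sample list, and the temporary path are all hypothetical.

```python
import io
import os
import tempfile


def write_names(names, path, encoding="utf-8"):
    """Write one name per line, encoding the text explicitly."""
    with io.open(path, "w", encoding=encoding) as f:
        for item in names:
            f.write(item)
            # io.open in text mode expects unicode text, hence the u prefix
            f.write(u"\n")


# Hypothetical data: the second name contains a non-ASCII character
sample = [u"file1", u"caf\xe9"]
path = os.path.join(tempfile.mkdtemp(), "names.txt")
write_names(sample, path)
```

With an explicit encoding the write no longer depends on sys.getdefaultencoding(), so non-ASCII names should not raise here.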