I read the artist of a song from its MP3 tag, then create a folder based on that name. The problem I have is when the name contains a special character like 'AC\DC'. So I wrote this code to deal with that.

def replace_all(text):
    print "replace_all"
    dictionary = {'\\':"", '?':"", '/':"", '...':"", ':':"", chr(148):"o"}

    for i, j in dictionary.iteritems():
        text = text.replace(i,j)

    return text

What I am running into now is how to deal with non-English characters, like the umlaut o in Motörhead or Blue Öyster Cult.

As you can see, I tried adding the ASCII-string version of the umlaut o at the end of the dictionary, but that failed with

UnicodeDecodeError:  'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)
ccwhite1

2 Answers

I found this code, though I don't understand it.

import unicodedata

def strip_accents(s):
  return ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))

It enabled me to remove the accent marks from the proposed directory and file names.
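
For anyone else puzzled by that one-liner, here is a rough step-by-step sketch of what it appears to do. The sample name and the variable names (`name`, `decomposed`, `marks`, `stripped`) are only illustrative, and the input is assumed to be a unicode string rather than a byte string.

```python
import unicodedata

# Sample input, written with an escape so the source file stays ASCII.
name = u'Mot\u00f6rhead'  # u'\u00f6' is o with umlaut

# NFD normalization splits each accented letter into a base letter
# followed by a separate combining mark.
decomposed = unicodedata.normalize('NFD', name)

# Combining marks have the Unicode category 'Mn' (Mark, nonspacing).
marks = [c for c in decomposed if unicodedata.category(c) == 'Mn']

# Dropping those marks leaves only the base letters.
stripped = u''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')

print(stripped)  # prints Motorhead
```

Note that this only removes accents. Characters such as '/' or ':' still need separate handling before the name is used as a folder name.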

ccwhite1

I suggest using unicode for both the input text and the characters being replaced. In your example, chr(148) is a one-byte string, not a unicode character.
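
As a minimal sketch of that suggestion (not a drop-in fix), the function from the question could look something like this with unicode strings throughout. The `replacements` name and the choice of `u'\u00f6'` for the umlaut o are assumptions for illustration. The input is assumed to be a unicode string already.

```python
def replace_all(text):
    # Every key and value is a unicode string, matching the unicode input.
    replacements = {
        u'\\': u'',
        u'?': u'',
        u'/': u'',
        u'...': u'',
        u':': u'',
        u'\u00f6': u'o',  # o with umlaut, given as a code point
    }
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

print(replace_all(u'AC/DC'))           # prints ACDC
print(replace_all(u'Mot\u00f6rhead'))  # prints Motorhead
```

Listing every accented letter by hand does not scale well, so this mapping is probably best kept for characters that are invalid in file names.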

Gintautas Miliauskas
  • So how do I take a string that has a unicode character inside of it and force the entire string to be set to unicode? And does doing that then change the non-unicode chars of the string? – ccwhite1 Feb 08 '11 at 15:11
  • You probably have a simple string (byte/binary string) in a specific encoding, such as ISO-8859-1 or UTF-8. You need to decode from that encoding to Python's unicode data type, like this: `utext = text.decode('utf-8')`. – Gintautas Miliauskas Feb 10 '11 at 07:59
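
A small sketch of the decoding step from the comment above. The byte 0xc3 in the error message is consistent with UTF-8, where the umlaut o is encoded as the two bytes 0xc3 0xb6, so UTF-8 is assumed here. The `raw` value is a made-up stand-in for whatever the MP3 tag reader returns.

```python
# Hypothetical byte string as it might come out of the tag, UTF-8 encoded.
raw = b'Mot\xc3\xb6rhead'

# Decoding turns the bytes into a unicode string; the two bytes
# 0xc3 0xb6 become the single character u'\u00f6'.
utext = raw.decode('utf-8')

# Plain ASCII characters come through the decode unchanged.
plain = b'AC/DC'.decode('utf-8')

print(len(raw))    # prints 10
print(len(utext))  # prints 9
```

If the tag turns out to be in another encoding, such as ISO-8859-1, the codec name passed to `decode` would need to change accordingly.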