python: Replacing special characters in a string

Question

I read the artist of a song from its MP3 tag, then create a folder based on that name. The problem I have is when the name contains a special character like 'AC\DC'. So I wrote this code to deal with that.

def replace_all(text):
  print "replace_all"
  dictionary = {'\\':"", '?':"", '/':"", '...':"", ':':"", chr(148):"o"}

  for i, j in dictionary.iteritems():
      text = text.replace(i,j)

  return text

What I am running into now is how to deal with non-English characters, like the o with an umlaut in Motörhead or Blue Öyster Cult.

As you can see, I tried adding the ASCII-string version of the umlaut o at the end of the dictionary, but that failed with

UnicodeDecodeError:  'ascii' codec can't decode byte 0xc3 in position 4: ordinal not in range(128)

See also http://stackoverflow.com/questions/3833791/python-regex-to-convert-non-ascii-characters-in-a-string-to-closest-ascii-equival for discussion of a more general solution. — Mikel, Feb 08 '11 at 11:45

score 3 · Accepted Answer · answered Feb 08 '11 at 22:47

I found this code, though I don't understand it.

import unicodedata

def strip_accents(s):
  return ''.join((c for c in unicodedata.normalize('NFD', s) if unicodedata.category(c) != 'Mn'))

It enabled me to remove the accent marks from the path of proposed dir/filenames.
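For anyone else puzzled by how this works, here is a commented sketch of the same function. It assumes Python 3, where every str is already Unicode text (in Python 2 the input would have to be a unicode object). The sample artist names are only illustrations.

```python
import unicodedata

def strip_accents(s):
    # NFD normalization splits each accented character into its base
    # letter followed by one or more combining marks.
    decomposed = unicodedata.normalize('NFD', s)
    # Combining marks have the Unicode category 'Mn' (Mark, nonspacing).
    # Dropping them leaves only the base letters.
    return ''.join(c for c in decomposed if unicodedata.category(c) != 'Mn')

print(strip_accents('Mot\u00f6rhead'))         # o with umlaut becomes plain o
print(strip_accents('Blue \u00d6yster Cult'))  # O with umlaut becomes plain O
```

Note that this only handles characters that decompose into a base letter plus marks. Characters without such a decomposition pass through unchanged.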

ccwhite1

score 0 · Answer 2 · answered Feb 08 '11 at 12:04

I suggest using Unicode for both the input text and the replacement characters. In your example, chr(148) is a single byte (a byte string in Python 2), not a Unicode character.
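A minimal sketch of what that could look like, assuming Python 3, where string literals are Unicode by default (in Python 2 the literals would need a u prefix and the input would have to be a unicode object). The replacement table mirrors the one in the question, with the o umlaut written as the code point U+00F6 instead of chr(148).

```python
def replace_all(text):
    # Every key and value is a Unicode string, and so is the input,
    # so no implicit ASCII decoding ever takes place.
    replacements = {
        '\\': '',
        '?': '',
        '/': '',
        '...': '',
        ':': '',
        '\u00f6': 'o',  # o with umlaut
    }
    for old, new in replacements.items():
        text = text.replace(old, new)
    return text

print(replace_all('AC\\DC'))
print(replace_all('Mot\u00f6rhead'))
```

The table still has to list every accented character by hand, so it only covers the cases you anticipate.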

Gintautas Miliauskas

So how do I take a string that has a Unicode character inside it and force the entire string to be Unicode? And does doing that then change the non-Unicode characters of the string? – ccwhite1 Feb 08 '11 at 15:11
You probably have a simple string (byte/binary string) in a specific encoding, such as ISO-8859-1 or UTF-8. You need to decode from that encoding to Python's unicode data type, like this: `utext = text.decode('utf-8')`. – Gintautas Miliauskas Feb 10 '11 at 07:59
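To illustrate the decode step, here is a small sketch, assuming Python 3 (bytes in, str out; in Python 2 the same call turns a str into a unicode object). The helper name to_text and the sample byte strings are hypothetical, and the right encoding depends on how the tag was actually stored.

```python
def to_text(raw, encoding='utf-8'):
    # Hypothetical helper: interpret raw bytes in the given encoding
    # and return Unicode text.
    return raw.decode(encoding)

utf8_bytes = b'Mot\xc3\xb6rhead'   # o umlaut stored as two bytes in UTF-8
latin1_bytes = b'Mot\xf6rhead'     # o umlaut stored as one byte in ISO-8859-1

# Both byte strings decode to the same Unicode text.
assert to_text(utf8_bytes) == to_text(latin1_bytes, 'iso-8859-1')

# Plain ASCII characters come out unchanged by the decode.
assert to_text(b'AC/DC') == 'AC/DC'
```

Decoding with the wrong encoding either raises UnicodeDecodeError or silently produces the wrong characters, so it is worth checking which encoding your tag reader returns.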
