1
d = {
    "key": "Impress the playing crowd with these classic "
           "Playing Cards \u00a9 Personalized Coasters.These beautiful"
           " coasters are made from glass, and measure approximately 4\u201d x 4\u201d (inches)"
           ".Great to look at, and lovely to the touch.There are 4 coasters in a set.We have "
           "created this exclusive design for all card lovers.Each coaster is a different suit, "
           "with the underneath.Make your next Bridge, or Teen Patti session uber-personal!"
           "Will look great on the bar, or any tabletop.Gift Designed for: Couples, Him, "
           "HerOccasion:Diwali, Bridge, Anniversary, Birthday"}

I tried the replace function on it, but it didn't work.

s = d['key'].replace('\u00a9','')
ajknzhol
Manjit Kumar
  • 1
    be aware that `\u00a9` is the copyright symbol. Removing that may have legal consequences. – Adam Smith Apr 28 '14 at 18:21
  • 5
    Concur, removing the characters you cannot understand seems like the wrong way to solve this problem. (The actual text has much more severe problems, but I guess that's out of scope for this site.) – tripleee Apr 28 '14 at 18:22
  • Please explain "but didn't work". What happened? What did you expect to happen? – Jace Browning Apr 28 '14 at 18:23
  • @André write that out as an answer, it seems to be what OP is looking for. – Adam Smith Apr 28 '14 at 18:24
  • @jace - I was expecting these characters to be replaced with '' (an empty string). André - let me try your solution. Thanks all anyway. – Manjit Kumar Apr 28 '14 at 18:26
  • possible duplicate of [What is the best way to remove accents in a python unicode string?](http://stackoverflow.com/questions/517923/what-is-the-best-way-to-remove-accents-in-a-python-unicode-string) – jsbueno Apr 28 '14 at 18:33
  • 3
    You _should not_ remove Unicode characters - you have to DEAL with them - please read: http://www.joelonsoftware.com/articles/Unicode.html – jsbueno Apr 28 '14 at 18:34

3 Answers

10

If you want to remove all non-ASCII characters from a string, you can use string.encode("ascii", "ignore").

It tries to encode the string to ASCII. The second argument, "ignore", tells it to skip the characters it can't convert (every non-ASCII character) instead of raising an exception, as it would without that argument. The result contains only the characters that could be converted, so all non-ASCII characters are removed. In Python 3 the result is a bytes object, so call .decode("ascii") on it if you need a str again.

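Example usage (Python 3):

unicodeString = "Héllò StàckOvèrflow"
# encode() returns bytes in Python 3, so decode back to str before printing
print(unicodeString.encode("ascii", "ignore").decode("ascii")) # prints Hll StckOvrflow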

More info: str.encode() and Unicode in the Python documentation.
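
To make the difference between the default error handling and "ignore" concrete, here is a minimal sketch, assuming Python 3 and a shortened version of the text from the question. The variable names are only illustrative.

```python
# Shortened sample of the question's text; \u00a9 is the copyright sign
# and \u201d is a right double quotation mark.
text = "Playing Cards \u00a9 Personalized Coasters, 4\u201d x 4\u201d"

# Default ("strict") error handling raises on the first non-ASCII character.
try:
    text.encode("ascii")
    strict_failed = False
except UnicodeEncodeError:
    strict_failed = True

# "ignore" silently drops every character that ASCII cannot represent.
ascii_bytes = text.encode("ascii", "ignore")  # a bytes object in Python 3
cleaned = ascii_bytes.decode("ascii")         # back to str

print(strict_failed)
print(cleaned)
```

Note that the spaces on either side of the removed copyright sign are kept, so the cleaned text contains a double space at that position.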

3
d['key'].decode('unicode-escape').encode('ascii', 'ignore')

is what you are looking for, assuming Python 2, where the value is a byte string that holds literal \uXXXX escape sequences.

>>> d = {
...     "key": "Impress the playing crowd with these classic "
...            "Playing Cards \u00a9 Personalized Coasters.These beautiful"
...            " coasters are made from glass, and measure approximately 4\u201d x 4\u201d (inches)"
...            ".Great to look at, and lovely to the touch.There are 4 coasters in a set.We have "
...            "created this exclusive design for all card lovers.Each coaster is a different suit, "
...            "with the underneath.Make your next Bridge, or Teen Patti session uber-personal!"
...            "Will look great on the bar, or any tabletop.Gift Designed for: Couples, Him, "
...            "HerOccasion:Diwali, Bridge, Anniversary, Birthday"}
>>> d['key'].decode('unicode-escape').encode('ascii', 'ignore')
'Impress the playing crowd with these classic Playing Cards  Personalized Coasters.These beautiful coasters are made from glass, and measure approximately 4 x 4 (inches).Great to look at, and lovely to the touch.There are 4 coasters in a set.We have created this exclusive design for all card lovers.Each coaster is a different suit, with the underneath.Make your next Bridge, or Teen Patti session uber-personal!Will look great on the bar, or any tabletop.Gift Designed for: Couples, Him, HerOccasion:Diwali, Bridge, Anniversary, Birthday'
>>> 
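
The session above relies on str.decode, which exists only in Python 2. A rough Python 3 equivalent might look like the sketch below. It assumes the text really contains the literal backslash sequences (reproduced here with a raw string) and is otherwise pure ASCII; the variable names are illustrative.

```python
# Assumption: the text holds the six literal characters \u00a9, not the
# copyright sign itself. A raw string reproduces that situation in Python 3.
raw = r"Playing Cards \u00a9 Personalized Coasters, 4\u201d x 4\u201d (inches)"

# Step 1: interpret the escape sequences. The unicode_escape codec decodes
# bytes, so encode to ASCII first; this step raises if raw already contains
# non-ASCII characters.
decoded = raw.encode("ascii").decode("unicode_escape")

# Step 2: drop everything ASCII cannot represent, then go back to str.
cleaned = decoded.encode("ascii", "ignore").decode("ascii")

print(decoded)
print(cleaned)
```

If the string already contains the real characters rather than escape sequences, step 1 is unnecessary and only step 2 applies.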
Vor
0

To remove characters written as Unicode escape sequences, you need to use a unicode string (in Python 2, a literal with the u prefix).

For example,

s = d['key'].replace(u'\u00a9', '')

However, as people have mentioned in comments, removing the copyright symbol might be a very bad idea, though it depends on what you're actually doing with the string.
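
As a sketch of both options, assuming Python 3 (where every str is already a Unicode string) and a shortened sample of the question's text: the first call removes the copyright sign, the second substitutes ASCII stand-ins instead of deleting anything. The substitution table is a hypothetical choice, not something from the question.

```python
text = "Playing Cards \u00a9 Personalized Coasters, 4\u201d x 4\u201d (inches)"

# Option 1: remove one specific character, as in the answer above.
removed = text.replace("\u00a9", "")

# Option 2: map characters to ASCII stand-ins rather than dropping them.
# The choice of "(c)" and a straight quote is an assumption for illustration.
substitutes = str.maketrans({"\u00a9": "(c)", "\u201d": '"'})
substituted = text.translate(substitutes)

print(removed)
print(substituted)
```

Substituting keeps the meaning of the original text (the copyright notice and the inch marks) while still producing ASCII-only output.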

Rob Watts