What is this character: â\u0080\u0099 ?

This should be an apostrophe or a single quote.

How can I convert it (using Ruby) to a simple, single quote ' or display it properly in a web page as a single quote?

Thanks

Billy Dunn
  • The answer is to be found in this post: http://stackoverflow.com/questions/11972203/special-characters-in-r – Bas Matthee Mar 07 '13 at 12:40
  • Thanks. I did read that post, but it's still not clear to me how to display it to the user as an apostrophe or a single quote. Do I have to do a regular expression search and replace, or some kind of character encoding conversion? – Billy Dunn Mar 07 '13 at 12:56

1 Answer

It is a typographically correct apostrophe, more exactly RIGHT SINGLE QUOTATION MARK (U+2019) (’), that has been garbled by an incorrect character encoding conversion or interpretation. It appears to be the UTF-8 encoded form of that character (three bytes, 0xE2 0x80 0x99) incorrectly interpreted as ISO-8859-1 encoded data, so each byte shows up as a separate character.
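Since the question asks about Ruby, here is a minimal sketch of reversing that misinterpretation, assuming the string really is UTF-8 data that was decoded as ISO-8859-1 (and not, say, Windows-1252). The helper name `fix_mojibake` is hypothetical, not part of any library; it uses only the standard `String#encode`, `String#force_encoding`, and `String#valid_encoding?` methods.

```ruby
# Hypothetical helper: undo "UTF-8 bytes read as ISO-8859-1" mojibake.
# Assumption: the input is a UTF-8 Ruby string holding the garbled text.
def fix_mojibake(text)
  # Map each character back to the single byte it came from,
  # then reinterpret those bytes as UTF-8.
  repaired = text.encode("ISO-8859-1").force_encoding("UTF-8")
  # If the bytes are not valid UTF-8, the input was not this kind
  # of mojibake, so return it unchanged.
  repaired.valid_encoding? ? repaired : text
rescue Encoding::UndefinedConversionError
  # The text contains characters outside ISO-8859-1 (for example an
  # already correct U+2019), so leave it as is.
  text
end

garbled = "It\u00E2\u0080\u0099s"
fixed = fix_mojibake(garbled)

puts fixed                     # the text with a proper U+2019 apostrophe
puts fixed.tr("\u2019", "'")   # the same text with a plain ASCII quote
```

If the page is served as UTF-8, the repaired string can be displayed directly; the `tr` call is only needed when a plain ASCII `'` is really wanted. The more robust fix is usually to read the source data with the correct encoding in the first place, so the garbling never happens.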

Jukka K. Korpela
  • Thank you. This post also helped me (especially the answer by phrogz): http://stackoverflow.com/questions/2572396/nokogiri-open-uri-and-unicode-characters – Billy Dunn Mar 07 '13 at 13:42
  • BTW, I fixed a file full of junk like this with the command `iconv -f utf8 -t iso-8859-1 < input`, which resulted in actually valid UTF-8 text (I basically performed the inverse of the initial incorrect munge). Just in case anyone else saw this and was trying to repair some text. – Wyatt Ward Apr 28 '21 at 00:30