I have text on a website where some ö characters are displayed incorrectly instead of as a proper ö.

I extracted the text from the CMS and analysed its hex values:

  • the ö's that are displayed correctly have c3 b6 (UTF-8)
  • the ö's that are displayed incorrectly have 6f cc 88

I couldn't find out what encoding this is. What's a good way to identify the encoding?

Teetrinker

1 Answer

6F is the UTF-8 (ASCII) encoding of "o", nothing spectacular.
CC 88 is the UTF-8 encoding of U+0308, COMBINING DIAERESIS.
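
To check this yourself, you can decode the bytes as UTF-8 and ask for the name of each code point. This is only a small sketch using Python's standard `unicodedata` module; the variable names are mine and the byte string is the one from the question.

```python
import unicodedata

# The bytes from the question, decoded as UTF-8.
raw = bytes.fromhex("6f cc 88")
text = raw.decode("utf-8")

# One entry per code point in the decoded string.
codepoints = [f"U+{ord(ch):04X}" for ch in text]
names = [unicodedata.name(ch) for ch in text]

for cp, name in zip(codepoints, names):
    print(cp, name)
```

If the bytes decode cleanly as UTF-8 and the names make sense together, the data is most likely valid UTF-8 and the problem lies elsewhere.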

You're simply looking at the decomposed form of the o-umlaut. A combining diaeresis character should be rendered, well, combined with the previous character. If your system doesn't do that, it means it doesn't handle Unicode correctly, and/or the font you have chosen is somewhat broken. You may have to normalise your strings into the composed Unicode form (NFC) for your system to handle them correctly.
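
As a rough illustration of that normalisation step, here is a sketch in Python using `unicodedata.normalize`. I'm assuming you can run the text through a script before it goes back into the CMS; the variable names are made up for the example.

```python
import unicodedata

# Decomposed form from the question: "o" followed by U+0308.
decomposed = b"\x6f\xcc\x88".decode("utf-8")

# NFC composes base character + combining mark into one code point where possible.
composed = unicodedata.normalize("NFC", decomposed)

print(len(decomposed), decomposed.encode("utf-8").hex())
print(len(composed), composed.encode("utf-8").hex())
```

After normalisation the string should be a single code point whose UTF-8 bytes match the ö's that already display correctly. Most languages have an equivalent, for example `String.prototype.normalize` in JavaScript.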

deceze
  • Thanks! Well, my system is the browser (current versions of Firefox and IE; I think it worked in Chrome, but I'm not sure anymore). The font in use is a Google Font, so I doubted there was a problem with either the system or the font. // Actually, it seems the font is the problem; I found this answer: http://stackoverflow.com/a/19706263/603569 – Teetrinker Jul 11 '16 at 14:25