I encountered a problem with CKEditor regarding the escaping of umlauts. It is reproducible with the example editor at http://sdk.ckeditor.com/samples/classic.html

When I copy-paste the word Bühnenpräsenz into the editor and click "Source", it is shown as <p>Bühnenpräsenz</p>, though I expect the umlauts to be replaced with HTML entities.

But when I type Bühnenpräsenz instead of pasting it, clicking "Source" shows <p>B&uuml;hnenpr&auml;senz</p>, which is correct.

Can anyone reproduce this behavior, or does anyone know why pasting and typing behave differently?

globalworming
  • Probably related: http://stackoverflow.com/questions/1929812/how-does-cut-and-paste-affect-character-encoding-and-what-can-go-wrong – globalworming Mar 28 '17 at 08:41
  • It seems that the contents of the clipboard are encoded differently from what I type into CKEditor. In my case the pasted "ü" takes 3 bytes, the typed one just 2. – globalworming Mar 30 '17 at 05:33

1 Answer

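OK, the thing is: the "ü" typed on the keyboard is the single code point U+00FC, while the pasted "ü" is a "u" (U+0075) followed by a combining diaeresis (U+0308). Normalizing the pasted text with String.prototype.normalize() (which defaults to NFC) or with unorm.nfkc() (see unorm) solves the issue.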
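To make the difference visible, here is a minimal sketch in plain JavaScript. The helper names `utf8Length` and `toComposed` are made up for illustration, and it assumes an environment that provides `TextEncoder` and `String.prototype.normalize()` (current browsers and Node.js do). It only shows the normalization itself; where you call it in your editor setup is up to you.

```javascript
// Two ways to represent "ü" (both render identically):
const typed = "\u00FC";   // precomposed: U+00FC
const pasted = "u\u0308"; // decomposed: "u" (U+0075) + combining diaeresis (U+0308)

// Hypothetical helper: number of bytes the string takes in UTF-8.
const utf8Length = (s) => new TextEncoder().encode(s).length;

// Hypothetical helper: compose base letter + combining mark into one code point (NFC).
function toComposed(text) {
  return text.normalize("NFC");
}

console.log(typed === pasted);                       // false
console.log(utf8Length(typed), utf8Length(pasted));  // 2 3
console.log(toComposed(pasted) === typed);           // true

// The same applies to the whole word from the question:
const word = "Bu\u0308hnenpra\u0308senz";
console.log(toComposed(word) === "B\u00FChnenpr\u00E4senz"); // true
```

After normalization the pasted text contains the same code points as the typed text, so it should be treated the same way when the umlauts are converted to entities.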

globalworming