2

I am using the code below to copy text from a control. Please note that the text could be in Spanish or English. Later I display it inside a rich text box.

Clipboard.Clear();
MyDocBodyControl.Range.Copy();
html = Convert.ToString(Clipboard.GetData(DataFormats.Html));

But when I display the text in the rich text box, the accented characters do not show properly. If I use any other format, such as Text, I get the proper accented characters. However, I have to use the HTML format because I need to add some styles to the copied text.

Is there any way to show the accented characters properly with the HTML data format?

cuongle
Ppm

2 Answers

1

Have you set the correct encoding (UTF-8, Unicode, ...)? Also have a look at this topic: How to convert a Unicode character to its ASCII equivalent

Community
juFo
1

The DataFormats.Html specification states that the data is encoded in UTF-8. But there is a bug in .NET Framework 4 and lower: it actually reads the UTF-8 bytes as Windows-1252.

You get a lot of wrongly decoded text, which leads to funny/bad characters such as 'Å','‹','Å’','Ž','Å¡','Å“','ž','Ÿ','Â','¡','¢','£','¤','Â¥','¦','§','¨','©'

For example, '€' is wrongly decoded as 'â‚¬' when its UTF-8 bytes are read as Windows-1252.
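To illustrate the mismatch described above, here is a minimal sketch that reverses it by turning the misread string back into bytes with Windows-1252 and decoding those bytes as UTF-8. This is a different technique from a lookup table, and `ClipboardHtmlFix` and `RepairMojibake` are hypothetical names, not part of .NET. It assumes the whole string was misread in this way; any character that does not exist in Windows-1252 would be replaced by '?' during the round trip.

```csharp
using System;
using System.Text;

// Hypothetical helper for illustration; not part of the .NET Framework.
public static class ClipboardHtmlFix
{
    // Reverses "UTF-8 bytes read as Windows-1252" by re-encoding the text
    // to Windows-1252 bytes and decoding those bytes as UTF-8.
    public static string RepairMojibake(string misdecoded)
    {
        // Assumption: code page 1252 is available. On .NET Framework it is
        // built in; on .NET Core / .NET 5+ the code pages provider from
        // System.Text.Encoding.CodePages has to be registered first.
        Encoding windows1252 = Encoding.GetEncoding(1252);

        // Characters outside Windows-1252 become '?' here (lossy).
        byte[] originalBytes = windows1252.GetBytes(misdecoded);

        return Encoding.UTF8.GetString(originalBytes);
    }
}
```

For instance, `ClipboardHtmlFix.RepairMojibake("\u00E2\u201A\u00AC")` (the three characters 'â‚¬') yields `"\u20AC"` ('€'), since those characters map back to the bytes E2 82 AC.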

A full explanation is available on the dedicated page "Debugging Chart Mapping Windows-1252 Characters to UTF-8 Bytes to Latin-1 Characters".

By using the conversion tables, however, you will not lose any UTF-8 characters: you can recover the original, pristine UTF-8 characters from DataFormats.Html. (Note: Ppm's solution defaults to ASCII on failure, so you lose encoding information!)

Also, Chrome adds Apple-converted-* characters that appear as, for example, 'Â ' in a clip, even though they are claimed to have been removed.

Solution: create a translation dictionary, then search and replace.
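A translation dictionary of this kind could be sketched as follows. The class name `MojibakeTable`, the method `Fix`, and the handful of entries are illustrative assumptions; a real table would need every sequence you expect to meet, taken from a full Windows-1252 to UTF-8 mapping chart.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;
using System.Text;

// Hypothetical helper for illustration; the table is deliberately partial.
public static class MojibakeTable
{
    // Key: UTF-8 bytes misread as Windows-1252. Value: intended character.
    private static readonly Dictionary<string, string> Replacements =
        new Dictionary<string, string>
        {
            { "\u00E2\u201A\u00AC", "\u20AC" }, // euro sign (bytes E2 82 AC)
            { "\u00C3\u00A1", "\u00E1" },       // a with acute (C3 A1)
            { "\u00C3\u00A9", "\u00E9" },       // e with acute (C3 A9)
            { "\u00C3\u00AD", "\u00ED" },       // i with acute (C3 AD)
            { "\u00C3\u00B3", "\u00F3" },       // o with acute (C3 B3)
            { "\u00C3\u00BA", "\u00FA" },       // u with acute (C3 BA)
            { "\u00C3\u00B1", "\u00F1" },       // n with tilde (C3 B1)
            { "\u00C2\u00A0", "\u00A0" },       // no-break space (C2 A0)
        };

    // Replaces every known misread sequence with the intended character.
    public static string Fix(string html)
    {
        var result = new StringBuilder(html);

        // Longest keys first, so a short key never splits a longer one.
        foreach (var pair in Replacements.OrderByDescending(p => p.Key.Length))
        {
            result.Replace(pair.Key, pair.Value);
        }

        return result.ToString();
    }
}
```

Sequences missing from the table are left untouched rather than being replaced by '?', so nothing is lost for characters the table does not cover.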

Markus